trulens.core.session

Classes

TruSession

Bases: _WithExperimentalSettings, PydanticSingleton

TruSession is the main class that provides an entry point to TruLens.

TruSession lets you:

Log app prompts and outputs
Log app Metadata
Run and log feedback functions
Run the Streamlit dashboard to view experiment results

By default, all data is logged to "default.sqlite" in the current working directory. Data can instead be logged to any SQLAlchemy-compatible URL given as database_url.
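The default location corresponds to a SQLite URL. A minimal sketch of building such a URL for database_url, assuming the relative-path form matches the default; the sqlite_url helper is hypothetical, and the commented TruSession call assumes database_url is one of the keyword arguments forwarded to DefaultDBConnector.

```python
from pathlib import Path


def sqlite_url(db_file: str) -> str:
    """Build a SQLAlchemy SQLite URL for a database file (hypothetical helper)."""
    # SQLAlchemy SQLite URLs use three slashes before a relative path and
    # four slashes (three plus the leading "/") before an absolute path.
    return "sqlite:///" + str(Path(db_file))


# Roughly equivalent to the default: "default.sqlite" in the working directory.
default_url = sqlite_url("default.sqlite")

# Usage (requires trulens; assumes database_url reaches DefaultDBConnector):
# from trulens.core import TruSession
# session = TruSession(database_url=default_url)
```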

Supported App Types

TruChain: LangChain apps.

TruLlama: LlamaIndex apps.

TruRails: NeMo Guardrails apps.

TruBasicApp: Basic apps defined solely by a function from str to str.

TruApp: Custom apps containing custom structures and methods. Requires annotating the methods to instrument.

TruVirtual: Virtual apps that have no real app to instrument but have a virtual structure, and can log previously captured data as if it were TruLens records.

PARAMETER	DESCRIPTION
`connector`	Database Connector to use. If not provided, a default DefaultDBConnector is created. TYPE: `Optional[DBConnector]` DEFAULT: `None`
`experimental_feature_flags`	Experimental feature flags. TYPE: `Optional[Union[Mapping[Feature, bool], List[Feature]]]` DEFAULT: `None`
`**kwargs`	All other arguments are used to initialize DefaultDBConnector. Mutually exclusive with `connector`. DEFAULT: `{}`

Attributes

RETRY_RUNNING_SECONDS `class-attribute` `instance-attribute`

RETRY_RUNNING_SECONDS: float = 60.0

How long to wait (in seconds) before restarting a feedback function run that has already started.

A feedback function execution that has started may have stalled or failed in a bad way that did not record the failure.

RETRY_FAILED_SECONDS `class-attribute` `instance-attribute`

RETRY_FAILED_SECONDS: float = 5 * 60.0

How long to wait (in seconds) to retry a failed feedback function run.
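The two retry constants can be read as thresholds on how long a run has sat in a given state. A minimal sketch of that decision, for illustration only: should_retry and the status strings "running", "failed", and "done" are hypothetical, not the evaluator's actual implementation.

```python
RETRY_RUNNING_SECONDS: float = 60.0
RETRY_FAILED_SECONDS: float = 5 * 60.0


def should_retry(status: str, seconds_since_update: float) -> bool:
    """Decide whether a feedback run should be retried (illustrative only).

    status is one of the hypothetical strings "running", "failed", "done".
    """
    if status == "running":
        # A run that has been "running" too long may have stalled silently.
        return seconds_since_update >= RETRY_RUNNING_SECONDS
    if status == "failed":
        # Failed runs are retried only after a longer back-off.
        return seconds_since_update >= RETRY_FAILED_SECONDS
    # Completed runs are never retried.
    return False
```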

DEFERRED_NUM_RUNS `class-attribute` `instance-attribute`

DEFERRED_NUM_RUNS: int = 32

Number of futures to wait for when evaluating deferred feedback functions.
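One plausible reading of DEFERRED_NUM_RUNS is a cap on how many evaluation futures are in flight at once. The sketch below shows that bounded-concurrency pattern with concurrent.futures; run_bounded is a hypothetical name, and this is not the library's evaluator code.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from typing import Any, Callable, Iterable, List

DEFERRED_NUM_RUNS: int = 32


def run_bounded(
    tasks: Iterable[Callable[[], Any]], limit: int = DEFERRED_NUM_RUNS
) -> List[Any]:
    """Run tasks with at most `limit` futures in flight (illustrative sketch)."""
    results: List[Any] = []
    pending = set()
    with ThreadPoolExecutor() as pool:
        for task in tasks:
            if len(pending) >= limit:
                # Block until at least one in-flight future finishes.
                done, pending = wait(pending, return_when=FIRST_COMPLETED)
                results.extend(f.result() for f in done)
            pending.add(pool.submit(task))
        # Drain whatever is still running.
        done, _ = wait(pending)
        results.extend(f.result() for f in done)
    return results
```

Results come back in completion order rather than submission order, which is usually acceptable when each result is written to the database independently.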

RECORDS_BATCH_TIMEOUT_IN_SEC `class-attribute` `instance-attribute`

RECORDS_BATCH_TIMEOUT_IN_SEC: int = 10

Time to wait (in seconds) before inserting a batch of records into the database.

GROUND_TRUTHS_BATCH_SIZE `class-attribute` `instance-attribute`

GROUND_TRUTHS_BATCH_SIZE: int = 100

Number of ground truth rows to insert into the database per batch.
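A batch size like GROUND_TRUTHS_BATCH_SIZE typically means the rows are split into consecutive chunks before insertion. A minimal chunking sketch under that assumption; batched is a hypothetical helper here (Python 3.12 ships itertools.batched, which yields tuples instead of lists).

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")

GROUND_TRUTHS_BATCH_SIZE: int = 100


def batched(rows: Iterable[T], size: int = GROUND_TRUTHS_BATCH_SIZE) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` rows."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(rows)
    while True:
        # Take up to `size` items; an empty list means the input is exhausted.
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch
```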

connector `class-attribute` `instance-attribute`

connector: Optional[DBConnector] = Field(None, exclude=True)

Database Connector to use. If not provided, a default is created and used.

experimental_otel_exporter `property`

experimental_otel_exporter: Optional[SpanExporter]

EXPERIMENTAL(otel_tracing): OpenTelemetry SpanExporter to send spans to.

Only works if the trulens.core.experimental.Feature.OTEL_TRACING flag is set. The setter will set and lock the flag as enabled.

Functions

force_flush

force_flush(timeout_millis: int = 300000) -> bool

Force flush the OpenTelemetry exporters.

PARAMETER	DESCRIPTION
`timeout_millis`	The maximum amount of time, in milliseconds, to wait for spans to be processed. TYPE: `int` DEFAULT: `300000`

RETURNS	DESCRIPTION
`bool`	False if the timeout is exceeded, the feature is not enabled, or the provider doesn't exist; True otherwise.

App

App(*args, app: Optional[Any] = None, **kwargs) -> App

Create an App from the given App constructor arguments by guessing which app type they refer to.

This method intentionally prints the type of app being created so that the user knows if the guess is wrong.
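The guess can be pictured as a dispatch on the package that defined the app object's class. The sketch below is a hypothetical heuristic for illustration: the package-to-wrapper mapping, the callable fallback, and the guess_wrapper name are assumptions, not TruSession.App's actual logic.

```python
from typing import Any

# Hypothetical mapping from the top-level package of an app object's class
# to the TruLens wrapper that would instrument it.
_WRAPPERS_BY_PACKAGE = {
    "langchain": "TruChain",
    "langchain_core": "TruChain",
    "llama_index": "TruLlama",
    "nemoguardrails": "TruRails",
}


def guess_wrapper(app: Any) -> str:
    """Guess a wrapper name for `app` and print the guess, as App does."""
    package = type(app).__module__.split(".")[0]
    guess = _WRAPPERS_BY_PACKAGE.get(package)
    if guess is None:
        # Plain callables fall back to TruBasicApp; anything else is treated
        # as a custom app whose methods must be annotated for instrumentation.
        guess = "TruBasicApp" if callable(app) else "TruApp"
    print(f"Guessing app type: {guess}")
    return guess
```

Printing the guess, rather than guessing silently, is what lets a user notice a wrong dispatch and construct the specific wrapper directly.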

Basic

Basic(*args, **kwargs) -> App

Deprecated

Use trulens.core.session.TruSession.App instead.

Custom

Custom(*args, **kwargs) -> App

Deprecated

Use trulens.core.session.TruSession.App instead.

Virtual

Virtual(*args, **kwargs) -> App

Deprecated

Use trulens.core.session.TruSession.App instead.

Chain

Chain(*args, **kwargs) -> App

Deprecated

Use trulens.core.session.TruSession.App instead.

Llama

Llama(*args, **kwargs) -> App

Deprecated

Use trulens.core.session.TruSession.App instead.

Rails

Rails(*args, **kwargs) -> App

Deprecated

Use trulens.core.session.TruSession.App instead.
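Each deprecated constructor (Basic, Custom, Virtual, Chain, Llama, Rails) points to App. A minimal sketch of that alias pattern, assuming each alias warns and forwards its arguments unchanged; deprecated_alias and make_app are hypothetical names.

```python
import functools
import warnings
from typing import Any, Callable


def deprecated_alias(target: Callable[..., Any], new_name: str) -> Callable[..., Any]:
    """Wrap `target` so each call emits a DeprecationWarning naming `new_name`."""

    @functools.wraps(target)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        warnings.warn(
            f"This method is deprecated; use {new_name} instead.",
            DeprecationWarning,
            stacklevel=2,  # Attribute the warning to the caller's line.
        )
        # Forward all arguments unchanged to the replacement.
        return target(*args, **kwargs)

    return wrapper


def make_app(*args: Any, **kwargs: Any) -> tuple:
    """Hypothetical stand-in for TruSession.App."""
    return ("App", args, kwargs)


Chain = deprecated_alias(make_app, "TruSession.App")
```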

find_unused_port

find_unused_port(*args, **kwargs)

Deprecated

Use trulens.dashboard.run.find_unused_port instead.

run_dashboard

run_dashboard(*args, **kwargs)

Deprecated

Use trulens.dashboard.run.run_dashboard instead.

start_dashboard

start_dashboard(*args, **kwargs)

Deprecated

Use trulens.dashboard.run.run_dashboard instead.

stop_dashboard

stop_dashboard(*args, **kwargs)

Deprecated

Use trulens.dashboard.run.stop_dashboard instead.

update_record

update_record(*args, **kwargs)

Deprecated

Use trulens.core.session.TruSession.connector.db.insert_record instead.

reset_database

reset_database()

Reset the database. Clears all tables.

See DB.reset_database.

migrate_database

migrate_database(**kwargs: Dict[str, Any])

Migrates the database.

This should be run whenever there are breaking changes in a database created with an older version of trulens.

PARAMETER	DESCRIPTION
`**kwargs`	Keyword arguments to pass to migrate_database of the current database. TYPE: `Dict[str, Any]` DEFAULT: `{}`

See DB.migrate_database.

add_record

add_record(
    record: Optional[Record] = None, **kwargs: dict
) -> RecordID

Add a record to the database.

PARAMETER	DESCRIPTION
`record`	The record to add. TYPE: `Optional[Record]` DEFAULT: `None`
`**kwargs`	Record fields to add to the given record or a new record if no `record` provided. TYPE: `dict` DEFAULT: `{}`

RETURNS	DESCRIPTION
`RecordID`	Unique record identifier (str).

add_record_nowait

add_record_nowait(record: Record) -> None

Add a record to the queue to be inserted in the next batch.
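A minimal sketch of time-based batching in the spirit of add_record_nowait and RECORDS_BATCH_TIMEOUT_IN_SEC. RecordBatcher is hypothetical: it checks the timeout on each call rather than in a background thread, and it takes an injectable clock so its behavior can be tested deterministically.

```python
import time
from typing import Callable, List

RECORDS_BATCH_TIMEOUT_IN_SEC: int = 10


class RecordBatcher:
    """Buffer records and flush them in one batch once the timeout elapses."""

    def __init__(
        self,
        insert_batch: Callable[[List[dict]], None],
        timeout: float = RECORDS_BATCH_TIMEOUT_IN_SEC,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._insert_batch = insert_batch
        self._timeout = timeout
        self._clock = clock
        self._buffer: List[dict] = []
        self._last_flush = clock()

    def add_nowait(self, record: dict) -> None:
        # Queue the record; the caller does not wait for a database write.
        self._buffer.append(record)
        self.maybe_flush()

    def maybe_flush(self) -> None:
        # Flush only when the timeout has elapsed since the last flush.
        if self._buffer and self._clock() - self._last_flush >= self._timeout:
            self._insert_batch(self._buffer)
            self._buffer = []
            self._last_flush = self._clock()
```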

run_feedback_functions

run_feedback_functions(
    record: Record,
    feedback_functions: Sequence[Feedback],
    app: Optional[AppDefinition] = None,
    wait: bool = True,
) -> Union[
    Iterable[FeedbackResult],
    Iterable[Future[FeedbackResult]],
]

Run a collection of feedback functions and report their results.

PARAMETER	DESCRIPTION
`record`	The record on which to evaluate the feedback functions. TYPE: `Record`
`app`	The app that produced the given record. If not provided, it is looked up from the session's database. TYPE: `Optional[AppDefinition]` DEFAULT: `None`
`feedback_functions`	A collection of feedback functions to evaluate. TYPE: `Sequence[Feedback]`
`wait`	If set (default), will wait for results before returning. TYPE: `bool` DEFAULT: `True`

YIELDS	DESCRIPTION
`Union[Iterable[FeedbackResult], Iterable[Future[FeedbackResult]]]`	One result for each element of `feedback_functions` of FeedbackResult if `wait` is enabled (default) or Future of FeedbackResult if `wait` is disabled.
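The `wait` flag decides whether callers receive finished results or futures. A minimal sketch of that contract with concurrent.futures, assuming a "feedback function" is any callable from a record to a score; run_feedbacks, FeedbackFn, and _POOL are hypothetical stand-ins, not trulens types.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Any, Callable, Iterator, Sequence, Union

# Hypothetical stand-in: any callable that takes a record and returns a score.
FeedbackFn = Callable[[Any], float]

_POOL = ThreadPoolExecutor(max_workers=4)


def run_feedbacks(
    record: Any, feedback_functions: Sequence[FeedbackFn], wait: bool = True
) -> Iterator[Union[float, "Future[float]"]]:
    """Yield one result (or future) per feedback function, in input order."""
    # Submit everything first so the evaluations run concurrently.
    futures = [_POOL.submit(fn, record) for fn in feedback_functions]
    for fut in futures:
        # With wait=True, block on each future; otherwise hand it back.
        yield fut.result() if wait else fut
```

Because this sketch is a generator, nothing is submitted until the caller starts iterating.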

add_app

add_app(app: AppDefinition) -> AppID

Add an app to the database and return its unique id.

PARAMETER	DESCRIPTION
`app`	The app to add to the database. TYPE: `AppDefinition`

RETURNS	DESCRIPTION
`AppID`	A unique app identifier (str).

delete_app

delete_app(app_id: AppID) -> None

Deletes an app from the database based on its app_id.

PARAMETER	DESCRIPTION
`app_id`	The unique identifier of the app to be deleted. TYPE: `AppID`

add_feedback

add_feedback(
    feedback_result_or_future: Optional[
        Union[FeedbackResult, Future[FeedbackResult]]
    ] = None,
    **kwargs: dict
) -> FeedbackResultID

Add a single feedback result or future to the database and return its unique id.

PARAMETER	DESCRIPTION
`feedback_result_or_future`	If a Future is given, the call waits for its result before adding it to the database. If `kwargs` are given along with a FeedbackResult, the `kwargs` are used to update that FeedbackResult; otherwise a new FeedbackResult is created with `kwargs` as its constructor arguments. TYPE: `Optional[Union[FeedbackResult, Future[FeedbackResult]]]` DEFAULT: `None`
`**kwargs`	Fields to add to the given feedback result or to create a new FeedbackResult with. TYPE: `dict` DEFAULT: `{}`

RETURNS	DESCRIPTION
`FeedbackResultID`	A unique result identifier (str).

add_feedbacks

add_feedbacks(
    feedback_results: Iterable[
        Union[FeedbackResult, Future[FeedbackResult]]
    ]
) -> List[FeedbackResultID]

Add multiple feedback results to the database and return their unique ids.

PARAMETER	DESCRIPTION
`feedback_results`	An iterable in which each item is a FeedbackResult or a Future of one. Each given future is waited on. TYPE: `Iterable[Union[FeedbackResult, Future[FeedbackResult]]]`

RETURNS	DESCRIPTION
`List[FeedbackResultID]`	List of unique result identifiers (str), in the same order as the input `feedback_results`.
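The resolution rules for add_feedback and add_feedbacks can be sketched without a database. FeedbackResultStub is a hypothetical dataclass with only a few fields, standing in for FeedbackResult; resolve_feedback and resolve_feedbacks are likewise hypothetical names.

```python
import dataclasses
from concurrent.futures import Future
from typing import Any, Iterable, List, Optional, Union


@dataclasses.dataclass
class FeedbackResultStub:
    """Hypothetical stand-in for FeedbackResult with just a few fields."""

    name: str = ""
    result: Optional[float] = None


ResultLike = Union[FeedbackResultStub, "Future[FeedbackResultStub]"]


def resolve_feedback(result_or_future: Optional[ResultLike] = None, **kwargs: Any) -> FeedbackResultStub:
    """Apply the add_feedback resolution rules, without the database write."""
    if isinstance(result_or_future, Future):
        # Futures are waited on before anything is stored.
        result_or_future = result_or_future.result()
    if result_or_future is None:
        # No result given: build a new one from kwargs.
        return FeedbackResultStub(**kwargs)
    # A result was given: kwargs update a copy of it.
    return dataclasses.replace(result_or_future, **kwargs)


def resolve_feedbacks(items: Iterable[ResultLike]) -> List[FeedbackResultStub]:
    # Output order matches input order, as add_feedbacks documents.
    return [resolve_feedback(item) for item in items]
```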

get_app

get_app(app_id: AppID) -> Optional[JSONized[AppDefinition]]

Look up an app from the database.

This method produces the JSON-ized version of the app. It can be deserialized back into an AppDefinition with model_validate:

Example

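from trulens.core import TruSession
from trulens.core.schema import app as app_schema
session = TruSession()
app_json = session.get_app(app_id="app_hash_85ebbf172d02e733c8183ac035d0cbb2")
# Deserialize the JSON back into an AppDefinition (not an App).
app_def = app_schema.AppDefinition.model_validate(app_json)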

Warning

Do not rely on deserializing into App, as its implementations contain attributes that are not meant to be deserialized.

PARAMETER	DESCRIPTION
`app_id`	The unique identifier (str) of the app to look up. TYPE: `AppID`

RETURNS	DESCRIPTION
`Optional[JSONized[AppDefinition]]`	JSON-ized version of the app.

get_apps

get_apps() -> List[JSONized[AppDefinition]]

Look up all apps from the database.

RETURNS	DESCRIPTION
`List[JSONized[AppDefinition]]`	A list of JSON-ized versions of all apps in the database.

Warning

The same deserialization caveats as get_app apply.

get_records_and_feedback

get_records_and_feedback(
    app_ids: Optional[List[AppID]] = None,
    app_name: Optional[AppName] = None,
    app_version: Optional[AppVersion] = None,
    app_versions: Optional[List[AppVersion]] = None,
    run_name: Optional[RunName] = None,
    record_ids: Optional[List[RecordID]] = None,
    offset: Optional[int] = None,
    limit: Optional[int] = None,
) -> Tuple[DataFrame, List[str]]

Get records, their feedback results, and feedback names.

PARAMETER	DESCRIPTION
`app_ids`	A list of app ids to filter records by. If empty or not given, all apps' records will be returned. TYPE: `Optional[List[AppID]]` DEFAULT: `None`
`app_name`	A name of the app to filter records by. If given, only records for this app will be returned. TYPE: `Optional[AppName]` DEFAULT: `None`
`app_version`	A version of the app to filter records by. If given, only records for this app version will be returned. TYPE: `Optional[AppVersion]` DEFAULT: `None`
`app_versions`	A list of app versions to filter records by. If given, only records for these app versions will be returned. TYPE: `Optional[List[AppVersion]]` DEFAULT: `None`
`run_name`	A run name to filter records by. If given, only records for this run will be returned. TYPE: `Optional[RunName]` DEFAULT: `None`
`record_ids`	An optional list of record ids to filter records by. TYPE: `Optional[List[RecordID]]` DEFAULT: `None`
`offset`	Record row offset. TYPE: `Optional[int]` DEFAULT: `None`
`limit`	Limit on the number of records to return. TYPE: `Optional[int]` DEFAULT: `None`

RETURNS	DESCRIPTION
`Tuple[DataFrame, List[str]]`	Tuple of:
`DataFrame`	DataFrame of records with their feedback results.
`List[str]`	List of feedback names that are columns in the DataFrame.
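The `offset` and `limit` parameters allow reading records page by page. A minimal paging sketch, assuming a hypothetical fetch(offset, limit) that returns a plain list of rows; a real call such as session.get_records_and_feedback(offset=..., limit=...) returns a (DataFrame, feedback names) tuple instead, and paging is only stable if the underlying query orders rows consistently.

```python
from typing import Callable, Iterator, List, Optional

# Hypothetical fetch(offset, limit) returning a list of row dicts.
Fetch = Callable[[Optional[int], Optional[int]], List[dict]]


def iter_pages(fetch: Fetch, page_size: int = 500) -> Iterator[List[dict]]:
    """Yield successive pages of rows until a short or empty page is returned."""
    offset = 0
    while True:
        page = fetch(offset, page_size)
        if page:
            yield page
        if len(page) < page_size:
            # A short page means there is nothing left to read.
            return
        offset += page_size
```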

get_leaderboard

get_leaderboard(
    app_ids: Optional[List[AppID]] = None,
    group_by_metadata_key: Optional[str] = None,
    limit: Optional[int] = None,
    offset: Optional[int] = None,
) -> DataFrame

Get a leaderboard for the given apps.

PARAMETER	DESCRIPTION
`app_ids`	A list of app ids to filter records by. If empty or not given, all apps will be included in leaderboard. TYPE: `Optional[List[AppID]]` DEFAULT: `None`
`group_by_metadata_key`	A key included in record metadata that you want to group results by. TYPE: `Optional[str]` DEFAULT: `None`
`limit`	Limit on the number of records to aggregate to produce the leaderboard. TYPE: `Optional[int]` DEFAULT: `None`
`offset`	Record row offset to select which records to use to aggregate the leaderboard. TYPE: `Optional[int]` DEFAULT: `None`

RETURNS	DESCRIPTION
`DataFrame`	DataFrame of apps with their feedback results aggregated. If group_by_metadata_key is provided, the DataFrame is grouped by the specified key.
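To make the grouping concrete, here is a pure-Python sketch of a leaderboard that averages each feedback per app, optionally split by a metadata value. The row layout (dicts with "app_id", "metadata", and one key per feedback name) and the use of the mean are assumptions for illustration; get_leaderboard itself works on DataFrames.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Optional, Tuple


def leaderboard(
    records: List[dict],
    feedback_names: List[str],
    group_by_metadata_key: Optional[str] = None,
) -> Dict[Tuple[str, ...], Dict[str, float]]:
    """Average each feedback per app (and optionally per metadata value)."""
    scores: Dict[Tuple[str, ...], Dict[str, List[float]]] = defaultdict(
        lambda: defaultdict(list)
    )
    for rec in records:
        key: Tuple[str, ...] = (rec["app_id"],)
        if group_by_metadata_key is not None:
            # Group further by the metadata value, e.g. a prompt variant.
            key += (str(rec.get("metadata", {}).get(group_by_metadata_key)),)
        for name in feedback_names:
            if rec.get(name) is not None:
                scores[key][name].append(rec[name])
    # Collapse each list of scores to its mean.
    return {k: {n: mean(v) for n, v in by_name.items()} for k, by_name in scores.items()}
```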

add_ground_truth_to_dataset

add_ground_truth_to_dataset(
    dataset_name: str,
    ground_truth_df: DataFrame,
    dataset_metadata: Optional[Dict[str, Any]] = None,
)

Create a new dataset if it does not already exist, and add ground truth data to it. If a dataset with the same name already exists, the ground truth data is added to that dataset.

PARAMETER	DESCRIPTION
`dataset_name`	Name of the dataset. TYPE: `str`
`ground_truth_df`	DataFrame containing the ground truth data. TYPE: `DataFrame`
`dataset_metadata`	Additional metadata to add to the dataset. TYPE: `Optional[Dict[str, Any]]` DEFAULT: `None`

get_ground_truth

get_ground_truth(
    dataset_name: Optional[str] = None,
    user_table_name: Optional[str] = None,
    user_schema_mapping: Optional[Dict[str, str]] = None,
    user_schema_name: Optional[str] = None,
) -> DataFrame

Get ground truth data from a dataset. If user_table_name and user_schema_mapping are provided, a virtual dataset is loaded from the user's table using the schema mapping. If dataset_name is provided, ground truth data is loaded from the dataset with that name.

PARAMETER	DESCRIPTION
`dataset_name`	Name of the dataset. TYPE: `Optional[str]` DEFAULT: `None`
`user_table_name`	Name of the user's table to load ground truth data from. TYPE: `Optional[str]` DEFAULT: `None`
`user_schema_mapping`	Mapping of user table columns to internal GroundTruth schema fields. TYPE: `Optional[Dict[str, str]]` DEFAULT: `None`
`user_schema_name`	Name of the user's schema to load ground truth data from. TYPE: `Optional[str]` DEFAULT: `None`
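A user_schema_mapping renames columns from the user's table to GroundTruth fields. A minimal sketch of that renaming on plain dicts, following the direction stated above (user column to GroundTruth field); the row contents, the field names "query" and "expected_response", and apply_schema_mapping are assumptions for illustration.

```python
from typing import Dict, List

# Hypothetical user table rows and mapping (user column -> GroundTruth field).
user_rows: List[Dict[str, str]] = [
    {"question": "What is 2 + 2?", "gold_answer": "4", "notes": "arithmetic"},
]
user_schema_mapping: Dict[str, str] = {
    "question": "query",
    "gold_answer": "expected_response",
}


def apply_schema_mapping(
    rows: List[Dict[str, str]], mapping: Dict[str, str]
) -> List[Dict[str, str]]:
    """Rename user columns to GroundTruth fields, dropping unmapped columns."""
    return [
        {mapping[col]: value for col, value in row.items() if col in mapping}
        for row in rows
    ]
```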

start_evaluator

start_evaluator(
    restart: bool = False,
    fork: bool = False,
    disable_tqdm: bool = False,
    run_location: Optional[FeedbackRunLocation] = None,
    return_when_done: bool = False,
) -> Optional[Union[Process, Thread]]

Start a deferred feedback function evaluation thread or process.

PARAMETER	DESCRIPTION
`restart`	If set, will stop the existing evaluator before starting a new one. TYPE: `bool` DEFAULT: `False`
`fork`	If set, will start the evaluator in a new process instead of a thread. NOT CURRENTLY SUPPORTED. TYPE: `bool` DEFAULT: `False`
`disable_tqdm`	If set, will disable progress bar logging from the evaluator. TYPE: `bool` DEFAULT: `False`
`run_location`	Run only the evaluations corresponding to run_location. TYPE: `Optional[FeedbackRunLocation]` DEFAULT: `None`
`return_when_done`	Instead of running asynchronously, will block until no feedbacks remain. TYPE: `bool` DEFAULT: `False`

RETURNS	DESCRIPTION
`Optional[Union[Process, Thread]]`	If return_when_done is True, then returns None. Otherwise, the started process or thread that is executing the deferred feedback evaluator.

Relevant constants

RETRY_RUNNING_SECONDS

RETRY_FAILED_SECONDS

DEFERRED_NUM_RUNS

MAX_THREADS
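The two modes of start_evaluator (a background thread versus blocking with return_when_done) and the role of stop_evaluator can be sketched with a job queue and a stop event. start_worker and its parameters are hypothetical; this shows the threading pattern, not the deferred evaluator's actual code.

```python
import queue
import threading
from typing import Callable, Optional


def start_worker(
    jobs: "queue.Queue[Callable[[], None]]",
    stop: threading.Event,
    return_when_done: bool = False,
    poll_seconds: float = 0.1,
) -> Optional[threading.Thread]:
    """Run queued jobs in a background thread, or inline until the queue is empty."""

    def loop() -> None:
        while not stop.is_set():
            try:
                job = jobs.get(timeout=poll_seconds)
            except queue.Empty:
                if return_when_done:
                    return  # Nothing left to evaluate.
                continue
            job()
            jobs.task_done()

    if return_when_done:
        # Block in the caller's thread, like return_when_done=True.
        loop()
        return None
    thread = threading.Thread(target=loop, daemon=True)
    thread.start()
    return thread
```

Setting the stop event and joining the returned thread plays the part of stop_evaluator in this sketch.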

stop_evaluator

stop_evaluator()

Stop the deferred feedback evaluation thread.

wait_for_records

wait_for_records(
    record_ids: List[str],
    timeout: float = 10,
    poll_interval: float = 0.5,
) -> None

Wait for specific record_ids to appear in the TruLens session.

PARAMETER	DESCRIPTION
`record_ids`	The record ids to wait for. TYPE: `List[str]`
`timeout`	Maximum time to wait in seconds. TYPE: `float` DEFAULT: `10`
`poll_interval`	How often to poll in seconds. TYPE: `float` DEFAULT: `0.5`
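The timeout and poll_interval parameters describe a standard polling loop. A minimal sketch, assuming a hypothetical exists(record_id) predicate; this sketch raises TimeoutError when the deadline passes, which is a choice of the sketch, since the behavior of wait_for_records on timeout is not documented here.

```python
import time
from typing import Callable, Iterable, Set


def wait_for_ids(
    record_ids: Iterable[str],
    exists: Callable[[str], bool],
    timeout: float = 10,
    poll_interval: float = 0.5,
) -> None:
    """Poll `exists` until every id is present or `timeout` seconds pass."""
    missing: Set[str] = set(record_ids)
    deadline = time.monotonic() + timeout
    while True:
        # Drop ids that have landed since the last poll.
        missing = {rid for rid in missing if not exists(rid)}
        if not missing:
            return
        if time.monotonic() >= deadline:
            # Assumption of this sketch: give up loudly rather than silently.
            raise TimeoutError(f"Records not found: {sorted(missing)}")
        time.sleep(poll_interval)
```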

add_feedback_result

add_feedback_result(
    record: Record,
    feedback_name: str,
    feedback_result: Union[float, int],
    higher_is_better: bool,
) -> None

Add a feedback result for a given record.

PARAMETER	DESCRIPTION
`record`	The Record object to add feedback for. TYPE: `Record`
`feedback_name`	The name of the feedback function. TYPE: `str`
`feedback_result`	The feedback score/result (float or int). TYPE: `Union[float, int]`
`higher_is_better`	Whether higher values are better. TYPE: `bool`

compute_feedbacks_on_events

compute_feedbacks_on_events(
    events: DataFrame,
    feedbacks: List[Feedback],
    raise_error_on_no_feedbacks_computed: bool = False,
) -> None

Compute feedbacks/metrics on events.

PARAMETER	DESCRIPTION
`events`	Events to compute feedbacks on. This can be from multiple records. TYPE: `DataFrame`
`feedbacks`	Feedback functions to compute. TYPE: `List[Feedback]`
`raise_error_on_no_feedbacks_computed`	If set, raise an error if no feedbacks were computed. TYPE: `bool` DEFAULT: `False`

get_events

get_events(
    app_name: Optional[str],
    app_version: Optional[str],
    record_ids: Optional[List[str]] = None,
    start_time: Optional[datetime] = None,
) -> DataFrame

Get events/spans from the database in OTel mode.

PARAMETER	DESCRIPTION
`app_name`	The app name to filter events by. TYPE: `Optional[str]`
`app_version`	The app version to filter events by. TYPE: `Optional[str]`
`record_ids`	The record ids to filter events by. TYPE: `Optional[List[str]]` DEFAULT: `None`
`start_time`	The minimum time to consider events from. TYPE: `Optional[datetime]` DEFAULT: `None`

RETURNS	DESCRIPTION
`DataFrame`	A pandas DataFrame of all relevant events/spans.

experimental_enable_feature

experimental_enable_feature(
    flag: Union[str, Feature]
) -> bool

Enable the given feature flag.

RAISES	DESCRIPTION
`ValueError`	If the flag is already frozen to disabled.

experimental_disable_feature

experimental_disable_feature(
    flag: Union[str, Feature]
) -> bool

Disable the given feature flag.

RAISES	DESCRIPTION
`ValueError`	If the flag is already frozen to enabled.

experimental_feature

experimental_feature(
    flag: Union[str, Feature], *, freeze: bool = False
) -> bool

Determine the value of the given feature flag.

If freeze is set, the flag will be frozen to the value returned.

experimental_set_features

experimental_set_features(
    flags: Optional[
        Union[
            Iterable[Union[str, Feature]],
            Mapping[Union[str, Feature], bool],
        ]
    ],
    freeze: bool = False,
)

Set multiple feature flags.

If freeze is set, the flags will be frozen to the values given.

RAISES	DESCRIPTION
`ValueError`	If any flag is already frozen to a different value than the one given.
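The freeze semantics shared by the experimental_* methods can be modeled with a toy flag store. FeatureFlags is hypothetical, not trulens code; it assumes unknown flags default to disabled and that enable/disable return the new value, neither of which is stated above.

```python
from typing import Dict


class FeatureFlags:
    """Toy flag store with freeze semantics (hypothetical, not trulens code)."""

    def __init__(self) -> None:
        self._values: Dict[str, bool] = {}
        self._frozen: Dict[str, bool] = {}

    def _set(self, flag: str, value: bool) -> bool:
        # A frozen flag cannot change to the opposite value.
        if flag in self._frozen and self._frozen[flag] != value:
            state = "enabled" if self._frozen[flag] else "disabled"
            raise ValueError(f"Feature {flag!r} is frozen to {state}.")
        self._values[flag] = value
        return value

    def enable(self, flag: str) -> bool:
        return self._set(flag, True)

    def disable(self, flag: str) -> bool:
        return self._set(flag, False)

    def get(self, flag: str, *, freeze: bool = False) -> bool:
        value = self._values.get(flag, False)
        if freeze:
            # Lock the flag to whatever value is returned now.
            self._frozen[flag] = value
        return value
```

Re-applying the frozen value is allowed; only an attempt to flip a frozen flag raises.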
