trulens.core.session

Classes

TruSession

Bases: _WithExperimentalSettings, BaseModel

TruSession is the main class that provides an entry point to trulens.

TruSession lets you:

  • Log app prompts and outputs
  • Log app metadata
  • Run and log feedback functions
  • Run the Streamlit dashboard to view experiment results

By default, all data is logged to the file "default.sqlite" in the current working directory. Data can instead be logged to any SQLAlchemy-compatible URL given by database_url.
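To make "SQLAlchemy-compatible URL" concrete, the sketch below builds SQLite URLs of the shape SQLAlchemy expects. The helper name `sqlite_url` and the example paths are hypothetical and are not part of trulens.

```python
def sqlite_url(db_file: str) -> str:
    """Build a SQLAlchemy-style SQLite URL for a database file path.

    Hypothetical helper for illustration only; not part of trulens.
    """
    # SQLAlchemy puts three slashes before the path, so an absolute
    # POSIX path ends up with four slashes in total.
    return f"sqlite:///{db_file}"


relative_url = sqlite_url("default.sqlite")
absolute_url = sqlite_url("/tmp/experiments/default.sqlite")
print(relative_url)
print(absolute_url)
```

A URL of this shape is what one would presumably pass as database_url when constructing the session.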

Supported App Types

TruChain: Langchain apps.

TruLlama: Llama Index apps.

TruRails: NeMo Guardrails apps.

TruBasicApp: Basic apps defined solely using a function from str to str.

TruCustomApp: Custom apps containing custom structures and methods. Requires annotation of methods to instrument.

TruVirtual: Virtual apps that do not have a real app to instrument but have a virtual structure and can log existing captured data as if they were trulens records.

PARAMETER DESCRIPTION
connector

Database Connector to use. If not provided, a default DefaultDBConnector is created.

TYPE: Optional[DBConnector] DEFAULT: None

experimental_feature_flags

Experimental feature flags.

TYPE: Optional[Union[Mapping[Feature, bool], Iterable[Feature]]] DEFAULT: None

**kwargs

All other arguments are used to initialize DefaultDBConnector. Mutually exclusive with connector.

DEFAULT: {}

Attributes
RETRY_RUNNING_SECONDS class-attribute instance-attribute
RETRY_RUNNING_SECONDS: float = 60.0

How long to wait (in seconds) before restarting a feedback function that has already started.

A feedback function execution that has started may have stalled or failed in a bad way that did not record the failure.

See also

start_evaluator

DEFERRED

RETRY_FAILED_SECONDS class-attribute instance-attribute
RETRY_FAILED_SECONDS: float = 5 * 60.0

How long to wait (in seconds) to retry a failed feedback function run.
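One way to read the two retry constants is as age thresholds on a run's last status update. The sketch below is a simplified model under that assumption; the status labels and the function `should_rerun` are hypothetical, and the real evaluator's bookkeeping may differ.

```python
RETRY_RUNNING_SECONDS = 60.0     # default documented above
RETRY_FAILED_SECONDS = 5 * 60.0  # default documented above


def should_rerun(status: str, last_update: float, now: float) -> bool:
    """Decide whether a deferred feedback run is due for another attempt.

    `status` uses hypothetical labels ("running", "failed", "done").
    Timestamps are in seconds.
    """
    age = now - last_update
    if status == "running":
        # A run that started but never finished may have stalled.
        return age > RETRY_RUNNING_SECONDS
    if status == "failed":
        # Failed runs are retried only after a longer cool-down.
        return age > RETRY_FAILED_SECONDS
    return False


print(should_rerun("running", last_update=0.0, now=61.0))
print(should_rerun("failed", last_update=0.0, now=120.0))
```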

DEFERRED_NUM_RUNS class-attribute instance-attribute
DEFERRED_NUM_RUNS: int = 32

Number of futures to wait for when evaluating deferred feedback functions.

RECORDS_BATCH_TIMEOUT_IN_SEC class-attribute instance-attribute
RECORDS_BATCH_TIMEOUT_IN_SEC: int = 10

Time to wait before inserting a batch of records into the database.

GROUND_TRUTHS_BATCH_SIZE class-attribute instance-attribute
GROUND_TRUTHS_BATCH_SIZE: int = 100

Number of ground truths to insert into the database in a single batch.

connector class-attribute instance-attribute
connector: Optional[DBConnector] = Field(None, exclude=True)

Database Connector to use. If not provided, a default is created and used.

experimental_otel_exporter property writable
experimental_otel_exporter: Any

EXPERIMENTAL(otel_tracing): OpenTelemetry SpanExporter to send spans to.

Only works if the trulens.core.experimental.Feature.OTEL_TRACING flag is set. The setter will set and lock the flag as enabled.

Functions
experimental_enable_feature
experimental_enable_feature(
    flag: Union[str, Feature]
) -> bool

Enable the given feature flag.

RAISES DESCRIPTION
ValueError

If the flag is already locked to disabled.

experimental_disable_feature
experimental_disable_feature(
    flag: Union[str, Feature]
) -> bool

Disable the given feature flag.

RAISES DESCRIPTION
ValueError

If the flag is already locked to enabled.

experimental_feature
experimental_feature(
    flag: Union[str, Feature], *, lock: bool = False
) -> bool

Determine the value of the given feature flag.

If lock is set, the flag will be locked to the value returned.

experimental_set_features
experimental_set_features(
    flags: Union[
        Iterable[Union[str, Feature]],
        Mapping[Union[str, Feature], bool],
    ],
    lock: bool = False,
)

Set multiple feature flags.

If lock is set, the flags will be locked to the values given.

RAISES DESCRIPTION
ValueError

If any flag is already locked to a different value than the one given.
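The locking behavior described for the experimental_* methods can be modeled with a small stand-in. The class `FlagStore` below is a toy written for illustration, not the trulens implementation; it assumes unset flags read as disabled and uses plain strings as flag names.

```python
from typing import Dict, Set


class FlagStore:
    """Toy model of lockable feature flags (not the trulens class)."""

    def __init__(self) -> None:
        self._values: Dict[str, bool] = {}
        self._locked: Set[str] = set()

    def get(self, flag: str, *, lock: bool = False) -> bool:
        # Assumption: a flag that was never set reads as disabled.
        value = self._values.get(flag, False)
        if lock:
            # Lock the flag to whatever value was just returned.
            self._values[flag] = value
            self._locked.add(flag)
        return value

    def set(self, flag: str, value: bool, *, lock: bool = False) -> bool:
        # A locked flag may be re-set to the same value, but not changed.
        if flag in self._locked and self._values[flag] != value:
            raise ValueError(f"{flag!r} is locked to {self._values[flag]}")
        self._values[flag] = value
        if lock:
            self._locked.add(flag)
        return value


store = FlagStore()
store.set("otel_tracing", True, lock=True)
try:
    store.set("otel_tracing", False)
except ValueError as err:
    print("refused:", err)
```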

App
App(*args, app: Optional[Any] = None, **kwargs) -> App

Create an App from the given App constructor arguments by guessing which app type they refer to.

This method intentionally prints out the type of app being created to let the user know in case the guess is wrong.

Basic
Basic(*args, **kwargs) -> App

Deprecated

Use trulens.core.session.TruSession.App instead.

Custom
Custom(*args, **kwargs) -> App

Deprecated

Use trulens.core.session.TruSession.App instead.

Virtual
Virtual(*args, **kwargs) -> App

Deprecated

Use trulens.core.session.TruSession.App instead.

Chain
Chain(*args, **kwargs) -> App

Deprecated

Use trulens.core.session.TruSession.App instead.

Llama
Llama(*args, **kwargs) -> App

Deprecated

Use trulens.core.session.TruSession.App instead.

Rails
Rails(*args, **kwargs) -> App

Deprecated

Use trulens.core.session.TruSession.App instead.

find_unused_port
find_unused_port(*args, **kwargs)

Deprecated

Use trulens.dashboard.run.find_unused_port instead.

run_dashboard
run_dashboard(*args, **kwargs)

Deprecated

Use trulens.dashboard.run.run_dashboard instead.

start_dashboard
start_dashboard(*args, **kwargs)

Deprecated

Use trulens.dashboard.run.run_dashboard instead.

stop_dashboard
stop_dashboard(*args, **kwargs)

Deprecated

Use trulens.dashboard.run.stop_dashboard instead.

update_record
update_record(*args, **kwargs)

reset_database
reset_database()

Reset the database. Clears all tables.

See DB.reset_database.

migrate_database
migrate_database(**kwargs: Dict[str, Any])

Migrates the database.

This should be run whenever there are breaking changes in a database created with an older version of trulens.

PARAMETER DESCRIPTION
**kwargs

Keyword arguments to pass to migrate_database of the current database.

TYPE: Dict[str, Any] DEFAULT: {}

See DB.migrate_database.

add_record
add_record(
    record: Optional[Record] = None, **kwargs: dict
) -> RecordID

Add a record to the database.

PARAMETER DESCRIPTION
record

The record to add.

TYPE: Optional[Record] DEFAULT: None

**kwargs

Record fields to add to the given record or a new record if no record provided.

TYPE: dict DEFAULT: {}

RETURNS DESCRIPTION
RecordID

Unique record identifier str.

add_record_nowait
add_record_nowait(record: Record) -> None

Add a record to the queue to be inserted in the next batch.
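The queue-then-batch pattern implied by add_record_nowait and RECORDS_BATCH_TIMEOUT_IN_SEC can be sketched with the standard library. This is a simplified model, not the trulens implementation: `collect_batch` is a hypothetical helper, and the short timeout is chosen only to keep the example fast.

```python
import queue
import time
from typing import List


def collect_batch(pending: "queue.Queue[str]", timeout_s: float) -> List[str]:
    """Gather queued items until the queue stays empty or time runs out."""
    deadline = time.monotonic() + timeout_s
    batch: List[str] = []
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        try:
            # Wait at most until the deadline for the next item.
            batch.append(pending.get(timeout=remaining))
        except queue.Empty:
            break
    return batch


pending: "queue.Queue[str]" = queue.Queue()
for record_id in ["record_1", "record_2", "record_3"]:
    pending.put(record_id)  # stands in for add_record_nowait

batch = collect_batch(pending, timeout_s=0.05)
print(batch)  # the whole batch would then be inserted in one go
```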

run_feedback_functions
run_feedback_functions(
    record: Record,
    feedback_functions: Sequence[Feedback],
    app: Optional[AppDefinition] = None,
    wait: bool = True,
) -> Union[
    Iterable[FeedbackResult],
    Iterable[Future[FeedbackResult]],
]

Run a collection of feedback functions and report their result.

PARAMETER DESCRIPTION
record

The record on which to evaluate the feedback functions.

TYPE: Record

app

The app that produced the given record. If not provided, it is looked up from the database.

TYPE: Optional[AppDefinition] DEFAULT: None

feedback_functions

A collection of feedback functions to evaluate.

TYPE: Sequence[Feedback]

wait

If set (default), will wait for results before returning.

TYPE: bool DEFAULT: True

YIELDS DESCRIPTION
Union[Iterable[FeedbackResult], Iterable[Future[FeedbackResult]]]

One result for each element of feedback_functions: a FeedbackResult if wait is enabled (default), or a Future of FeedbackResult if wait is disabled.
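The effect of wait can be illustrated with plain concurrent.futures. The sketch below is a stand-in, not the trulens code: `run_all`, `text_length`, and `count_a` are hypothetical, records are plain strings, and lists are returned rather than yielded for brevity.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, List, Sequence, Union


def text_length(record: str) -> int:
    return len(record)


def count_a(record: str) -> int:
    return record.count("a")


def run_all(
    record: str,
    feedback_functions: Sequence[Callable[[str], int]],
    wait: bool = True,
) -> Union[List[int], List[Future]]:
    """Run each function on the record; return results or futures."""
    executor = ThreadPoolExecutor(max_workers=4)
    futures = [executor.submit(fn, record) for fn in feedback_functions]
    # Stop accepting new work; tasks already submitted still run.
    executor.shutdown(wait=False)
    if wait:
        # Block until every result is available, preserving input order.
        return [future.result() for future in futures]
    return futures


results = run_all("banana", [text_length, count_a])
futures = run_all("banana", [text_length, count_a], wait=False)
print(results)
print([future.result() for future in futures])
```

With wait disabled, the caller decides when to block, for example by passing the futures on to something that resolves them later.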

add_app
add_app(app: AppDefinition) -> AppID

Add an app to the database and return its unique id.

PARAMETER DESCRIPTION
app

The app to add to the database.

TYPE: AppDefinition

RETURNS DESCRIPTION
AppID

A unique app identifier str.

delete_app
delete_app(app_id: AppID) -> None

Deletes an app from the database based on its app_id.

PARAMETER DESCRIPTION
app_id

The unique identifier of the app to be deleted.

TYPE: AppID

add_feedback
add_feedback(
    feedback_result_or_future: Optional[
        Union[FeedbackResult, Future[FeedbackResult]]
    ] = None,
    **kwargs: dict
) -> FeedbackResultID

Add a single feedback result or future to the database and return its unique id.

PARAMETER DESCRIPTION
feedback_result_or_future

The feedback result or future to add. If a Future is given, the call waits for its result before adding it to the database. If kwargs are given together with a FeedbackResult, the kwargs are used to update that FeedbackResult; if no FeedbackResult is given, a new one is created with kwargs as arguments to its constructor.

TYPE: Optional[Union[FeedbackResult, Future[FeedbackResult]]] DEFAULT: None

**kwargs

Fields to add to the given feedback result or to create a new FeedbackResult with.

TYPE: dict DEFAULT: {}

RETURNS DESCRIPTION
FeedbackResultID

A unique result identifier str.

add_feedbacks
add_feedbacks(
    feedback_results: Iterable[
        Union[FeedbackResult, Future[FeedbackResult]]
    ]
) -> List[FeedbackResultID]

Add multiple feedback results to the database and return their unique ids.

PARAMETER DESCRIPTION
feedback_results

An iterable whose elements are each a FeedbackResult or a Future of one. Each given future will be waited on.

TYPE: Iterable[Union[FeedbackResult, Future[FeedbackResult]]]

RETURNS DESCRIPTION
List[FeedbackResultID]

List of unique result identifiers str in the same order as input feedback_results.
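Accepting a mix of finished results and futures while keeping input order can be sketched as follows. The helper `resolve_in_order` is hypothetical and uses strings in place of FeedbackResult objects; it only illustrates the waiting behavior described above.

```python
from concurrent.futures import Future
from typing import Iterable, List, Union


def resolve_in_order(items: Iterable[Union[str, Future]]) -> List[str]:
    """Replace each future by its result, keeping the input order."""
    resolved: List[str] = []
    for item in items:
        if isinstance(item, Future):
            item = item.result()  # blocks until the future completes
        resolved.append(item)
    return resolved


# A manually completed future stands in for a pending feedback result.
pending = Future()
pending.set_result("result_b")

ordered = resolve_in_order(["result_a", pending, "result_c"])
print(ordered)
```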

get_app
get_app(app_id: AppID) -> Optional[JSONized[AppDefinition]]

Look up an app from the database.

This method produces the JSON-ized version of the app. It can be deserialized back into an AppDefinition with model_validate:

Example

from trulens.core.schema import app as app_schema

# `session` is an existing TruSession instance.
app_json = session.get_app(app_id="app_hash_85ebbf172d02e733c8183ac035d0cbb2")
app_definition = app_schema.AppDefinition.model_validate(app_json)

Warning

Do not rely on deserializing into App as its implementations feature attributes not meant to be deserialized.

PARAMETER DESCRIPTION
app_id

The unique identifier str of the app to look up.

TYPE: AppID

RETURNS DESCRIPTION
Optional[JSONized[AppDefinition]]

JSON-ized version of the app.

get_apps
get_apps() -> List[JSONized[AppDefinition]]

Look up all apps from the database.

RETURNS DESCRIPTION
List[JSONized[AppDefinition]]

A list of JSON-ized versions of all apps in the database.

Warning

The same deserialization caveats apply as for get_app.

get_records_and_feedback
get_records_and_feedback(
    app_ids: Optional[List[AppID]] = None,
    offset: Optional[int] = None,
    limit: Optional[int] = None,
) -> Tuple[DataFrame, List[str]]

Get records, their feedback results, and feedback names.

PARAMETER DESCRIPTION
app_ids

A list of app ids to filter records by. If empty or not given, all apps' records will be returned.

TYPE: Optional[List[AppID]] DEFAULT: None

offset

Record row offset.

TYPE: Optional[int] DEFAULT: None

limit

Limit on the number of records to return.

TYPE: Optional[int] DEFAULT: None

RETURNS DESCRIPTION
DataFrame

DataFrame of records with their feedback results.

List[str]

List of feedback names that are columns in the DataFrame.
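The offset and limit parameters lend themselves to paging through records. The sketch below shows one possible paging loop over a stand-in data source; `iter_pages`, `fake_fetch`, and `ROWS` are hypothetical, and in real use the fetch callable would presumably wrap get_records_and_feedback and work with its DataFrame instead of a list.

```python
from typing import Callable, Dict, Iterator, List

Row = Dict[str, str]


def iter_pages(
    fetch: Callable[[int, int], List[Row]], page_size: int
) -> Iterator[List[Row]]:
    """Yield pages of rows until the source is exhausted."""
    offset = 0
    while True:
        page = fetch(offset, page_size)
        if not page:
            return
        yield page
        if len(page) < page_size:
            return  # a short page means there is nothing left
        offset += len(page)


# Stand-in data source with five records.
ROWS: List[Row] = [{"record_id": f"record_{i}"} for i in range(5)]


def fake_fetch(offset: int, limit: int) -> List[Row]:
    return ROWS[offset : offset + limit]


pages = list(iter_pages(fake_fetch, page_size=2))
print([len(page) for page in pages])
```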

get_leaderboard
get_leaderboard(
    app_ids: Optional[List[AppID]] = None,
    group_by_metadata_key: Optional[str] = None,
    limit: Optional[int] = None,
    offset: Optional[int] = None,
) -> DataFrame

Get a leaderboard for the given apps.

PARAMETER DESCRIPTION
app_ids

A list of app ids to filter records by. If empty or not given, all apps will be included in leaderboard.

TYPE: Optional[List[AppID]] DEFAULT: None

group_by_metadata_key

A key included in record metadata that you want to group results by.

TYPE: Optional[str] DEFAULT: None

limit

Limit on the number of records to aggregate to produce the leaderboard.

TYPE: Optional[int] DEFAULT: None

offset

Record row offset to select which records to use to aggregate the leaderboard.

TYPE: Optional[int] DEFAULT: None

RETURNS DESCRIPTION
DataFrame

DataFrame of apps with their feedback results aggregated.

DataFrame

If group_by_metadata_key is provided, the DataFrame will be grouped by the specified key.

add_ground_truth_to_dataset
add_ground_truth_to_dataset(
    dataset_name: str,
    ground_truth_df: DataFrame,
    dataset_metadata: Optional[Dict[str, Any]] = None,
)

Add ground truth data to a dataset. If no dataset with the given name exists, it is created first; otherwise the ground truth data is added to the existing dataset.

PARAMETER DESCRIPTION
dataset_name

Name of the dataset.

TYPE: str

ground_truth_df

DataFrame containing the ground truth data.

TYPE: DataFrame

dataset_metadata

Additional metadata to add to the dataset.

TYPE: Optional[Dict[str, Any]] DEFAULT: None

get_ground_truth
get_ground_truth(dataset_name: str) -> DataFrame

Get ground truth data from the dataset.

PARAMETER DESCRIPTION
dataset_name

Name of the dataset.

TYPE: str

start_evaluator
start_evaluator(
    restart: bool = False,
    fork: bool = False,
    disable_tqdm: bool = False,
    run_location: Optional[FeedbackRunLocation] = None,
    return_when_done: bool = False,
) -> Optional[Union[Process, Thread]]

Start a deferred feedback function evaluation thread or process.

PARAMETER DESCRIPTION
restart

If set, will stop the existing evaluator before starting a new one.

TYPE: bool DEFAULT: False

fork

If set, will start the evaluator in a new process instead of a thread. NOT CURRENTLY SUPPORTED.

TYPE: bool DEFAULT: False

disable_tqdm

If set, will disable progress bar logging from the evaluator.

TYPE: bool DEFAULT: False

run_location

Run only the evaluations corresponding to run_location.

TYPE: Optional[FeedbackRunLocation] DEFAULT: None

return_when_done

Instead of running asynchronously, will block until no feedbacks remain.

TYPE: bool DEFAULT: False

RETURNS DESCRIPTION
Optional[Union[Process, Thread]]

None if return_when_done is True; otherwise, the started process or thread that is executing the deferred feedback evaluator.

Relevant constants

RETRY_RUNNING_SECONDS

RETRY_FAILED_SECONDS

DEFERRED_NUM_RUNS

MAX_THREADS
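The return contract of start_evaluator (None when blocking, a thread handle otherwise) can be modeled with the standard threading module. This is a simplified stand-in, not the trulens evaluator; `start_worker` and the work callables are hypothetical.

```python
import threading
from typing import Callable, List, Optional


def start_worker(
    work: Callable[[], None], return_when_done: bool = False
) -> Optional[threading.Thread]:
    """Run `work` inline and return None, or start it in a thread."""
    if return_when_done:
        work()  # block until all work is finished
        return None
    thread = threading.Thread(target=work, daemon=True)
    thread.start()
    return thread


log: List[str] = []

# Blocking mode: nothing to hold on to afterwards.
handle = start_worker(lambda: log.append("blocking"), return_when_done=True)
print(handle)

# Background mode: the caller gets the thread and may join it later.
worker = start_worker(lambda: log.append("background"))
worker.join()
print(log)
```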

stop_evaluator
stop_evaluator()

Stop the deferred feedback evaluation thread.