trulens.core.session¶
Classes¶
TruSession
¶
Bases: _WithExperimentalSettings, PydanticSingleton
TruSession is the main class that provides an entry point to TruLens.
TruSession lets you:
- Log app prompts and outputs
- Log app metadata
- Run and log feedback functions
- Run a Streamlit dashboard to view experiment results
By default, all data is logged to "default.sqlite" in the current working directory. Data can instead be logged to a SQLAlchemy-compatible URL given by database_url.
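A minimal sketch of pointing a session at a different database, assuming `database_url` is accepted as a keyword argument and forwarded to the default connector. The helpers `sqlite_url` and `make_session` are hypothetical names used only for illustration, and the TruLens import is deferred so the URL helper can be tried without the package installed.

```python
from pathlib import Path


def sqlite_url(path: str) -> str:
    """Build a SQLAlchemy-style SQLite URL for a file path (hypothetical helper)."""
    # "sqlite:///" followed by a relative path; an absolute POSIX path
    # contributes its own leading slash, giving four slashes in total.
    return f"sqlite:///{Path(path).as_posix()}"


def make_session(database_url: str):
    """Create a session that logs to `database_url` instead of default.sqlite."""
    # Imported lazily: this line needs the trulens package to be installed.
    from trulens.core.session import TruSession

    # Assumption: `database_url` is passed through to the default connector.
    return TruSession(database_url=database_url)


print(sqlite_url("experiments/run1.sqlite"))
```

Calling `make_session(sqlite_url("experiments/run1.sqlite"))` would then be the assumed way to log somewhere other than the working directory.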
Supported App Types
TruChain: LangChain apps.
TruLlama: LlamaIndex apps.
TruRails: NeMo Guardrails apps.
TruBasicApp: Basic apps defined solely using a function from str to str.
TruCustomApp: Custom apps containing custom structures and methods. Requires annotation of methods to instrument.
TruVirtual: Virtual apps that do not have a real app to instrument but have a virtual structure and can log existing captured data as if they were trulens records.
PARAMETER | DESCRIPTION |
---|---|
connector |
Database Connector to use. If not provided, a default DefaultDBConnector is created.
TYPE:
|
experimental_feature_flags |
Experimental feature flags.
TYPE:
|
**kwargs |
All other arguments are used to initialize
DefaultDBConnector.
Mutually exclusive with
DEFAULT:
|
Attributes¶
RETRY_RUNNING_SECONDS
class-attribute
instance-attribute
¶
RETRY_RUNNING_SECONDS: float = 60.0
How long to wait (in seconds) before restarting a feedback function that has already started.
A feedback function execution that has started may have stalled or failed in a way that did not record the failure.
RETRY_FAILED_SECONDS
class-attribute
instance-attribute
¶
RETRY_FAILED_SECONDS: float = 5 * 60.0
How long to wait (in seconds) to retry a failed feedback function run.
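The two retry intervals can be read as a simple timing rule. The sketch below is an illustration of that rule only: the function `should_rerun` and the status labels are hypothetical and are not taken from the TruLens source.

```python
RETRY_RUNNING_SECONDS = 60.0     # value documented above
RETRY_FAILED_SECONDS = 5 * 60.0  # value documented above


def should_rerun(status: str, last_update: float, now: float) -> bool:
    """Decide whether a deferred feedback run is due to be (re)started.

    The status labels ("pending", "running", "failed", "done") are
    illustrative, not the library's own enum.
    """
    elapsed = now - last_update
    if status == "pending":
        return True  # never started, so always eligible
    if status == "running":
        # Started but possibly stalled: retry once it has been quiet too long.
        return elapsed > RETRY_RUNNING_SECONDS
    if status == "failed":
        # Recorded failure: back off longer before retrying.
        return elapsed > RETRY_FAILED_SECONDS
    return False  # finished runs are never restarted


print(should_rerun("running", last_update=0.0, now=90.0))
print(should_rerun("failed", last_update=0.0, now=90.0))
```

Under these assumptions a run that has been "running" for 90 seconds is restarted, while a run that failed 90 seconds ago still waits.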
DEFERRED_NUM_RUNS
class-attribute
instance-attribute
¶
DEFERRED_NUM_RUNS: int = 32
Number of futures to wait for when evaluating deferred feedback functions.
RECORDS_BATCH_TIMEOUT_IN_SEC
class-attribute
instance-attribute
¶
RECORDS_BATCH_TIMEOUT_IN_SEC: int = 10
Time to wait before inserting a batch of records into the database.
GROUND_TRUTHS_BATCH_SIZE
class-attribute
instance-attribute
¶
GROUND_TRUTHS_BATCH_SIZE: int = 100
Number of ground truths to insert into the database per batch.
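Reading `GROUND_TRUTHS_BATCH_SIZE` as a per-insert row count (an interpretation based on its name and integer value), batching can be sketched as below. The helper `batched` is hypothetical and only shows how 250 rows would split into batches.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")

GROUND_TRUTHS_BATCH_SIZE = 100  # value documented above


def batched(
    items: Sequence[T], size: int = GROUND_TRUTHS_BATCH_SIZE
) -> Iterator[List[T]]:
    """Yield consecutive slices of `items`, each with at most `size` elements."""
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


# 250 rows split into two full batches and one partial batch.
sizes = [len(batch) for batch in batched(list(range(250)))]
print(sizes)
```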
connector
class-attribute
instance-attribute
¶
connector: Optional[DBConnector] = Field(None, exclude=True)
Database Connector to use. If not provided, a default is created and used.
experimental_otel_exporter
property
writable
¶
experimental_otel_exporter: Any
EXPERIMENTAL(otel_tracing): OpenTelemetry SpanExporter to send spans to.
Only works if the trulens.core.experimental.Feature.OTEL_TRACING flag is set. The setter will set and lock the flag as enabled.
Functions¶
experimental_enable_feature
¶
Enable the given feature flag.
RAISES | DESCRIPTION |
---|---|
ValueError
|
If the flag is already frozen to disabled. |
experimental_disable_feature
¶
Disable the given feature flag.
RAISES | DESCRIPTION |
---|---|
ValueError
|
If the flag is already frozen to enabled. |
experimental_feature
¶
Determine the value of the given feature flag.
If freeze is set, the flag will be frozen to the value returned.
experimental_set_features
¶
experimental_set_features(
flags: Union[
Iterable[Union[str, Feature]],
Mapping[Union[str, Feature], bool],
],
freeze: bool = False,
)
Set multiple feature flags.
If freeze is set, the flags will be frozen to the values given.
RAISES | DESCRIPTION |
---|---|
ValueError
|
If any flag is already frozen to a different value than |
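The freeze behaviour described for the feature-flag methods can be illustrated with a toy store. This is a sketch of the semantics only: the `FlagStore` class and its method names are hypothetical, it defaults unknown flags to disabled as an assumption, and it is not the TruLens implementation.

```python
from typing import Dict, Set


class FlagStore:
    """Toy feature-flag store with freezing (illustrative, not TruLens code)."""

    def __init__(self) -> None:
        self._values: Dict[str, bool] = {}
        self._frozen: Set[str] = set()

    def set(self, flag: str, value: bool, freeze: bool = False) -> bool:
        """Set a flag, refusing to change a frozen flag to a different value."""
        current = self._values.get(flag, False)
        if flag in self._frozen and current != value:
            raise ValueError(f"Flag {flag!r} is frozen to {current}.")
        self._values[flag] = value
        if freeze:
            self._frozen.add(flag)
        return value

    def get(self, flag: str, freeze: bool = False) -> bool:
        """Read a flag; with freeze=True, lock it at the value returned."""
        value = self._values.get(flag, False)
        if freeze:
            self._values[flag] = value
            self._frozen.add(flag)
        return value


store = FlagStore()
store.set("otel_tracing", True, freeze=True)
try:
    store.set("otel_tracing", False)
except ValueError as err:
    print(err)
```

Setting a frozen flag to the value it already holds succeeds; only a conflicting value raises.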
App
¶
Create an App from the given App constructor arguments by guessing which app type they refer to.
This method intentionally prints the type of app being created so that the user can tell if the guess is wrong.
Virtual
¶
Virtual(*args, **kwargs) -> App
Deprecated
Use trulens.core.session.TruSession.App instead.
find_unused_port
¶
find_unused_port(*args, **kwargs)
Deprecated
Use trulens.dashboard.run.find_unused_port instead.
run_dashboard
¶
run_dashboard(*args, **kwargs)
Deprecated
Use trulens.dashboard.run.run_dashboard instead.
start_dashboard
¶
start_dashboard(*args, **kwargs)
Deprecated
Use trulens.dashboard.run.run_dashboard instead.
stop_dashboard
¶
stop_dashboard(*args, **kwargs)
Deprecated
Use trulens.dashboard.run.stop_dashboard instead.
update_record
¶
update_record(*args, **kwargs)
Deprecated
Use trulens.core.session.TruSession.connector.db.insert_record instead.
migrate_database
¶
Migrates the database.
This should be run whenever there are breaking changes in a database created with an older version of trulens.
PARAMETER | DESCRIPTION |
---|---|
**kwargs |
Keyword arguments to pass to migrate_database of the current database. |
See DB.migrate_database.
add_record
¶
add_record_nowait
¶
add_record_nowait(record: Record) -> None
Add a record to the queue to be inserted in the next batch.
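The queue-then-batch pattern described here can be sketched with the standard library. The function `drain_batch` is hypothetical and the short timeout stands in for `RECORDS_BATCH_TIMEOUT_IN_SEC`; the library's own batching may differ.

```python
import queue
import time
from typing import Any, List


def drain_batch(q: "queue.Queue[Any]", timeout_s: float) -> List[Any]:
    """Collect whatever arrives on `q` within `timeout_s` seconds."""
    deadline = time.monotonic() + timeout_s
    batch: List[Any] = []
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        try:
            batch.append(q.get(timeout=remaining))
        except queue.Empty:
            break  # nothing more arrived before the deadline
    return batch


records: "queue.Queue[str]" = queue.Queue()
for rec in ("rec-1", "rec-2", "rec-3"):
    records.put(rec)  # returns immediately, like a no-wait add

# A background writer would call this in a loop and insert each batch at once.
print(drain_batch(records, timeout_s=0.05))
```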
run_feedback_functions
¶
run_feedback_functions(
record: Record,
feedback_functions: Sequence[Feedback],
app: Optional[AppDefinition] = None,
wait: bool = True,
) -> Union[
Iterable[FeedbackResult],
Iterable[Future[FeedbackResult]],
]
Run a collection of feedback functions and report their result.
PARAMETER | DESCRIPTION |
---|---|
record |
The record on which to evaluate the feedback functions.
TYPE:
|
app |
The app that produced the given record.
If not provided, it is looked up from the database.
TYPE:
|
feedback_functions |
A collection of feedback functions to evaluate. |
wait |
If set (default), will wait for results before returning.
TYPE:
|
YIELDS | DESCRIPTION |
---|---|
Union[Iterable[FeedbackResult], Iterable[Future[FeedbackResult]]]
|
One result for each element of |
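The effect of `wait` on the return type can be illustrated with `concurrent.futures`. This is a sketch under assumptions: `run_all` and the two scoring functions are hypothetical stand-ins, it returns a list rather than yielding, and it does not use TruLens types.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, List, Union


def length_score(text: str) -> float:
    """Hypothetical feedback: longer answers score higher, capped at 1.0."""
    return min(len(text) / 10, 1.0)


def has_question(text: str) -> float:
    """Hypothetical feedback: 1.0 if the text contains a question mark."""
    return 1.0 if "?" in text else 0.0


def run_all(
    record: str,
    functions: List[Callable[[str], float]],
    wait: bool = True,
) -> List[Union[float, "Future[float]"]]:
    """Run every function on `record`; return results or futures per `wait`."""
    pool = ThreadPoolExecutor(max_workers=4)
    futures = [pool.submit(fn, record) for fn in functions]
    pool.shutdown(wait=False)  # already-submitted tasks still run to completion
    if wait:
        return [fut.result() for fut in futures]  # block until all finish
    return futures  # caller resolves each future when needed


print(run_all("hello?", [length_score, has_question]))
pending = run_all("hello?", [length_score, has_question], wait=False)
print([fut.result() for fut in pending])
```

Either way there is one entry per function, in the order the functions were given.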
add_app
¶
add_app(app: AppDefinition) -> AppID
Add an app to the database and return its unique id.
PARAMETER | DESCRIPTION |
---|---|
app |
The app to add to the database.
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
AppID
|
A unique app identifier str. |
delete_app
¶
delete_app(app_id: AppID) -> None
Deletes an app from the database based on its app_id.
PARAMETER | DESCRIPTION |
---|---|
app_id |
The unique identifier of the app to be deleted.
TYPE:
|
add_feedback
¶
add_feedback(
feedback_result_or_future: Optional[
Union[FeedbackResult, Future[FeedbackResult]]
] = None,
**kwargs: dict
) -> FeedbackResultID
Add a single feedback result or future to the database and return its unique id.
PARAMETER | DESCRIPTION |
---|---|
feedback_result_or_future |
If a Future
is given, call will wait for the result before adding it to the
database. If
TYPE:
|
**kwargs |
Fields to add to the given feedback result or to create a new FeedbackResult with.
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
FeedbackResultID
|
A unique result identifier str. |
add_feedbacks
¶
add_feedbacks(
feedback_results: Iterable[
Union[FeedbackResult, Future[FeedbackResult]]
]
) -> List[FeedbackResultID]
Add multiple feedback results to the database and return their unique ids.
PARAMETER | DESCRIPTION |
---|---|
feedback_results |
An iterable whose elements are each a FeedbackResult or a Future of one. Each given future will be waited on.
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
List[FeedbackResultID]
|
List of unique result identifiers str in the same order as input
|
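Accepting a mix of results and futures while preserving input order can be sketched as follows. The helper `resolve_all` is hypothetical and uses plain strings in place of FeedbackResult objects.

```python
from concurrent.futures import Future
from typing import Iterable, List, TypeVar, Union

T = TypeVar("T")


def resolve_all(items: Iterable[Union[T, "Future[T]"]]) -> List[T]:
    """Return plain values in input order, waiting on any futures."""
    resolved: List[T] = []
    for item in items:
        # A future blocks here until its result is available.
        resolved.append(item.result() if isinstance(item, Future) else item)
    return resolved


done: "Future[str]" = Future()
done.set_result("result-b")
print(resolve_all(["result-a", done, "result-c"]))
```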
get_app
¶
get_app(app_id: AppID) -> Optional[JSONized[AppDefinition]]
Look up an app from the database.
This method produces the JSON-ized version of the app. It can be deserialized back into an AppDefinition with model_validate:
Example
from trulens.core.schema import app as app_schema
app_json = session.get_app(app_id="app_hash_85ebbf172d02e733c8183ac035d0cbb2")
app_definition = app_schema.AppDefinition.model_validate(app_json)
Warning
Do not rely on deserializing into App as its implementations feature attributes not meant to be deserialized.
PARAMETER | DESCRIPTION |
---|---|
app_id |
The unique identifier str of the app to look up.
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
Optional[JSONized[AppDefinition]]
|
JSON-ized version of the app. |
get_apps
¶
get_apps() -> List[JSONized[AppDefinition]]
Look up all apps from the database.
RETURNS | DESCRIPTION |
---|---|
List[JSONized[AppDefinition]]
|
A list of JSON-ized versions of all apps in the database. |
Warning
The same deserialization caveats as for get_app apply.
get_records_and_feedback
¶
get_records_and_feedback(
app_ids: Optional[List[AppID]] = None,
offset: Optional[int] = None,
limit: Optional[int] = None,
) -> Tuple[DataFrame, List[str]]
Get records, their feedback results, and feedback names.
PARAMETER | DESCRIPTION |
---|---|
app_ids |
A list of app ids to filter records by. If empty or not given, all apps' records will be returned. |
offset |
Record row offset. |
limit |
Limit on the number of records to return. |
RETURNS | DESCRIPTION |
---|---|
DataFrame
|
DataFrame of records with their feedback results. |
List[str]
|
List of feedback names that are columns in the DataFrame. |
get_leaderboard
¶
get_leaderboard(
app_ids: Optional[List[AppID]] = None,
group_by_metadata_key: Optional[str] = None,
limit: Optional[int] = None,
offset: Optional[int] = None,
) -> DataFrame
Get a leaderboard for the given apps.
PARAMETER | DESCRIPTION |
---|---|
app_ids |
A list of app ids to filter records by. If empty or not given, all apps will be included in leaderboard. |
group_by_metadata_key |
A key included in record metadata that you want to group results by. |
limit |
Limit on the number of records to aggregate to produce the leaderboard. |
offset |
Record row offset to select which records to use to aggregate the leaderboard. |
RETURNS | DESCRIPTION |
---|---|
DataFrame
|
Dataframe of apps with their feedback results aggregated. |
DataFrame
|
If group_by_metadata_key is provided, the dataframe will be grouped by the specified key. |
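As a rough picture of what aggregating feedback per app means, the sketch below averages scores over plain dictionaries. It assumes a simple mean as the aggregate and a hypothetical record layout; the DataFrame returned by `get_leaderboard` may be computed and shaped differently.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Optional


def leaderboard(
    records: List[dict], app_ids: Optional[List[str]] = None
) -> Dict[str, Dict[str, float]]:
    """Average each feedback score per app (illustrative aggregation)."""
    by_app: Dict[str, Dict[str, List[float]]] = defaultdict(
        lambda: defaultdict(list)
    )
    for rec in records:
        # An empty or missing filter means every app is included.
        if app_ids and rec["app_id"] not in app_ids:
            continue
        for name, score in rec["feedback"].items():
            by_app[rec["app_id"]][name].append(score)
    return {
        app: {name: mean(scores) for name, scores in feedback.items()}
        for app, feedback in by_app.items()
    }


rows = [
    {"app_id": "app_a", "feedback": {"relevance": 1.0}},
    {"app_id": "app_a", "feedback": {"relevance": 0.5}},
    {"app_id": "app_b", "feedback": {"relevance": 0.0}},
]
print(leaderboard(rows))
```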
add_ground_truth_to_dataset
¶
add_ground_truth_to_dataset(
dataset_name: str,
ground_truth_df: DataFrame,
dataset_metadata: Optional[Dict[str, Any]] = None,
)
Create a new dataset if one does not already exist, and add ground truth data to it. If a dataset with the same name already exists, the ground truth data is added to that dataset.
PARAMETER | DESCRIPTION |
---|---|
dataset_name |
Name of the dataset.
TYPE:
|
ground_truth_df |
DataFrame containing the ground truth data.
TYPE:
|
dataset_metadata |
Additional metadata to add to the dataset. |
get_ground_truth
¶
Get ground truth data from the dataset.
PARAMETER | DESCRIPTION |
---|---|
dataset_name |
Name of the dataset. |
start_evaluator
¶
start_evaluator(
restart: bool = False,
fork: bool = False,
disable_tqdm: bool = False,
run_location: Optional[FeedbackRunLocation] = None,
return_when_done: bool = False,
) -> Optional[Union[Process, Thread]]
Start a deferred feedback function evaluation thread or process.
PARAMETER | DESCRIPTION |
---|---|
restart |
If set, will stop the existing evaluator before starting a new one.
TYPE:
|
fork |
If set, will start the evaluator in a new process instead of a thread. NOT CURRENTLY SUPPORTED.
TYPE:
|
disable_tqdm |
If set, will disable progress bar logging from the evaluator.
TYPE:
|
run_location |
Run only the evaluations corresponding to run_location.
TYPE:
|
return_when_done |
Instead of running asynchronously, will block until no feedbacks remain.
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
Optional[Union[Process, Thread]]
|
If return_when_done is True, then returns None. Otherwise, the started process or thread that is executing the deferred feedback evaluator. |
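The relationship between `return_when_done` and the return value can be sketched with a plain worker thread. The function `start_worker` and its job list are hypothetical; this illustrates the blocking versus background behaviour described above and is not the evaluator itself.

```python
import threading
from typing import Callable, List, Optional


def start_worker(
    jobs: List[Callable[[], None]], return_when_done: bool = False
) -> Optional[threading.Thread]:
    """Run queued jobs either inline (blocking) or on a background thread."""

    def drain() -> None:
        while jobs:
            jobs.pop(0)()  # take the oldest job and run it

    if return_when_done:
        drain()  # block until no jobs remain
        return None
    thread = threading.Thread(target=drain, daemon=True)
    thread.start()
    return thread  # caller may join() it later


log: List[str] = []
print(start_worker([lambda: log.append("inline")], return_when_done=True))
worker = start_worker([lambda: log.append("background")])
worker.join()
print(log)
```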