trulens.core

TruLens Core LLM Evaluation Library

The trulens-core library includes everything needed to get started.

Classes

Feedback

Bases: FeedbackDefinition

Feedback function container.

Typical usage is to specify a feedback implementation function from a Provider and the mapping of selectors describing how to construct the arguments to the implementation:

Example
from trulens.core import Feedback
from trulens.providers.huggingface import Huggingface
hugs = Huggingface()

# Create a feedback function from a provider:
feedback = Feedback(
    hugs.language_match # the implementation
).on_input_output() # selectors shorthand
Attributes
tru_class_info instance-attribute
tru_class_info: Class

Class information of this pydantic object for use in deserialization.

This unusual key is used so as not to pollute attribute names in whatever class this is mixed into. It should be the same as CLASS_INFO.

implementation class-attribute instance-attribute
implementation: Optional[Union[Function, Method]] = None

Implementation serialization.

aggregator class-attribute instance-attribute
aggregator: Optional[Union[Function, Method]] = None

Aggregator method serialization.

combinations class-attribute instance-attribute

Mode of combining selected values to produce arguments to each feedback function call.

feedback_definition_id instance-attribute
feedback_definition_id: FeedbackDefinitionID = (
    feedback_definition_id
)

ID. If not given, it is uniquely determined from the content.

if_exists class-attribute instance-attribute
if_exists: Optional[Lens] = None

Only execute the feedback function if the following selector names something that exists in a record/app.

This can be used, for example, to evaluate conditionally on the presence of some calls. Feedbacks skipped this way will have a status of FeedbackResultStatus.SKIPPED.
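
The conditional behavior can be pictured with a small standard-library sketch. Plain dictionaries and a list of keys stand in for a Record and a Lens here, and the status strings are placeholders for FeedbackResultStatus values; none of this is the TruLens implementation.

```python
from typing import Any, Callable, List, Optional, Tuple


def run_if_exists(
    record: dict, path: List[str], imp: Callable[[Any], float]
) -> Tuple[str, Optional[float]]:
    """Run imp on the value at path, or skip when the path is absent."""
    node: Any = record
    for key in path:
        if not isinstance(node, dict) or key not in node:
            return "skipped", None  # nothing to evaluate on
        node = node[key]
    return "done", imp(node)


# Hypothetical record layout, used only for this illustration.
record = {"app": {"retriever": {"rets": ["doc a", "doc b"]}}}
print(run_if_exists(record, ["app", "retriever", "rets"], len))
print(run_if_exists(record, ["app", "reranker", "rets"], len))
```

The first call finds the selected value and evaluates it; the second names a component that made no calls, so the evaluation is skipped rather than failing.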

if_missing class-attribute instance-attribute

How to handle missing parameters in feedback function calls.

run_location instance-attribute

Where the feedback evaluation takes place (e.g. locally, at a Snowflake server, etc).

selectors instance-attribute
selectors: Dict[str, Lens]

Selectors; pointers into Records of where to get arguments for imp.

supplied_name class-attribute instance-attribute
supplied_name: Optional[str] = None

An optional name. It only affects displayed tables.

higher_is_better class-attribute instance-attribute
higher_is_better: Optional[bool] = None

Feedback result magnitude interpretation.

imp class-attribute instance-attribute

Implementation callable.

A serialized version is stored at FeedbackDefinition.implementation.

agg class-attribute instance-attribute

Aggregator method for feedback functions that produce more than one result.

A serialized version is stored at FeedbackDefinition.aggregator.

sig property
sig: Signature

Signature of the feedback function implementation.

name property
name: str

Name of the feedback function.

Derived from the name of the function implementing it if no supplied name is provided.

Functions
__rich_repr__
__rich_repr__() -> Result

Requirement for pretty printing using the rich package.

load staticmethod
load(obj, *args, **kwargs)

Deserialize/load this object using the class information in tru_class_info to look up the actual class that will do the deserialization.

model_validate classmethod
model_validate(*args, **kwargs) -> Any

Deserialize a jsonized version of the app into an instance of the class it was serialized from.

Note

This process uses extra information stored in the jsonized object and handled by WithClassInfo.
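
As a rough illustration of class-information-based deserialization, the sketch below stores the module and class name next to the data and uses them to rebuild the object. The dictionary layout under the tru_class_info key and the helper names dump_with_class_info and load are invented for this example; the actual format handled by WithClassInfo may differ.

```python
import importlib
from types import SimpleNamespace
from typing import Any, Dict


def dump_with_class_info(obj: Any) -> Dict[str, Any]:
    """Serialize attributes and record which class they came from."""
    cls = type(obj)
    return {
        "tru_class_info": {"module": cls.__module__, "name": cls.__qualname__},
        **vars(obj),
    }


def load(data: Dict[str, Any]) -> Any:
    """Look up the recorded class and rebuild an instance of it."""
    info = data["tru_class_info"]
    cls = getattr(importlib.import_module(info["module"]), info["name"])
    fields = {k: v for k, v in data.items() if k != "tru_class_info"}
    return cls(**fields)


# SimpleNamespace stands in for a serializable model class.
original = SimpleNamespace(name="language_match", higher_is_better=True)
restored = load(dump_with_class_info(original))
print(type(restored).__name__, restored == original)
```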

on_input_output
on_input_output() -> Feedback

Specifies that the feedback implementation arguments are to be the main app input and output in that order.

Returns a new Feedback object with the specification.

on_default
on_default() -> Feedback

Specifies that one-argument feedbacks should be evaluated on the main app output and two-argument feedbacks should be evaluated on the main input and main output, in that order.

Returns a new Feedback object with this specification.
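
The rule described above can be sketched with inspect.signature. The function default_selectors and the strings "main_input" and "main_output" are illustrative stand-ins for the real selectors, and raising ValueError for other argument counts is an assumption of this sketch.

```python
import inspect
from typing import Callable, Dict


def default_selectors(imp: Callable) -> Dict[str, str]:
    """Map implementation argument names to 'main_input' / 'main_output'."""
    names = list(inspect.signature(imp).parameters)
    if len(names) == 1:
        return {names[0]: "main_output"}
    if len(names) == 2:
        return {names[0]: "main_input", names[1]: "main_output"}
    # Assumption: no default is defined for other argument counts.
    raise ValueError("defaults are only defined for one or two arguments")


def toxicity(text: str) -> float:  # hypothetical one-argument feedback
    return 0.0


def relevance(prompt: str, response: str) -> float:  # hypothetical two-argument feedback
    return 1.0


print(default_selectors(toxicity))
print(default_selectors(relevance))
```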

evaluate_deferred staticmethod
evaluate_deferred(
    session: TruSession,
    limit: Optional[int] = None,
    shuffle: bool = False,
    run_location: Optional[FeedbackRunLocation] = None,
) -> List[Tuple[Series, Future[FeedbackResult]]]

Evaluates feedback functions that were specified to be deferred.

Returns a list of tuples with the DB row containing the Feedback and initial FeedbackResult as well as the Future which will contain the actual result.

PARAMETER DESCRIPTION
limit

The maximum number of evals to start.

TYPE: Optional[int] DEFAULT: None

shuffle

Shuffle the order of the feedbacks to evaluate.

TYPE: bool DEFAULT: False

run_location

Only run feedback functions with this run_location.

TYPE: Optional[FeedbackRunLocation] DEFAULT: None

Constants that govern behavior:

  • TruSession.RETRY_RUNNING_SECONDS: How long to wait before restarting a feedback that was started but never failed (or failed without recording that fact).

  • TruSession.RETRY_FAILED_SECONDS: How long to wait to retry a failed feedback.
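
One way to read these two constants is as age thresholds applied to the last status update of each deferred feedback. The sketch below uses the default values documented for TruSession; the is_due helper and the status strings are hypothetical and only illustrate the idea.

```python
from datetime import datetime, timedelta

RETRY_RUNNING_SECONDS = 60.0     # default documented for TruSession
RETRY_FAILED_SECONDS = 5 * 60.0  # default documented for TruSession


def is_due(status: str, last_update: datetime, now: datetime) -> bool:
    """Decide whether a deferred feedback should be (re)started."""
    age = (now - last_update).total_seconds()
    if status == "none":     # never started
        return True
    if status == "running":  # started, possibly stalled
        return age > RETRY_RUNNING_SECONDS
    if status == "failed":   # failed, retry after a cool-down
        return age > RETRY_FAILED_SECONDS
    return False             # done or skipped


now = datetime(2024, 1, 1, 12, 0, 0)
print(is_due("running", now - timedelta(seconds=90), now))
print(is_due("failed", now - timedelta(seconds=90), now))
```

A run that has been "running" for 90 seconds exceeds the 60-second threshold and is restarted, while a failure that is only 90 seconds old is still inside its 5-minute cool-down.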

aggregate
aggregate(
    func: Optional[AggCallable] = None,
    combinations: Optional[FeedbackCombinations] = None,
) -> Feedback

Specify the aggregation function in case the selectors for this feedback generate more than one value for implementation argument(s). Can also specify the method of producing combinations of values in such cases.

Returns a new Feedback object with the given aggregation function and/or the given combination mode.
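
The role of an aggregation function can be shown without TruLens: the implementation is called once per selected value and the results are reduced to a single number. In this sketch, score_length is a made-up implementation and evaluate is a hypothetical driver; only the idea of passing a reducer such as a mean or a minimum comes from the description above.

```python
from statistics import mean
from typing import Callable, Iterable, List


def score_length(text: str) -> float:
    """Hypothetical feedback implementation: share of a 20-character budget used."""
    return min(len(text), 20) / 20


def evaluate(
    imp: Callable[[str], float],
    selected_values: Iterable[str],
    agg: Callable[[List[float]], float] = mean,
) -> float:
    # Call the implementation once per selected value, then aggregate.
    results = [imp(value) for value in selected_values]
    return agg(results)


chunks = ["short", "a somewhat longer chunk of text"]
print(evaluate(score_length, chunks))           # mean of the two results
print(evaluate(score_length, chunks, agg=min))  # smallest single result
```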

on_prompt
on_prompt(arg: Optional[str] = None) -> Feedback

Create a variant of self that will take in the main app input or "prompt" as input, sending it as an argument arg to implementation.

on_response
on_response(arg: Optional[str] = None) -> Feedback

Create a variant of self that will take in the main app output or "response" as input, sending it as an argument arg to implementation.

on
on(*args, **kwargs) -> Feedback

Create a variant of self with the same implementation but the given selectors. Those provided positionally get their implementation argument name guessed and those provided as kwargs get their name from the kwargs key.
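
How the names of positional selectors are guessed is not specified here. One plausible strategy, sketched below, is to assign them in order to the implementation parameters that were not named through keyword arguments. The bind_selectors helper and the selector strings are hypothetical.

```python
import inspect
from typing import Any, Callable, Dict


def bind_selectors(imp: Callable, *args: Any, **kwargs: Any) -> Dict[str, Any]:
    """Assign positional selectors to the parameters not named in kwargs."""
    params = list(inspect.signature(imp).parameters)
    unknown = set(kwargs) - set(params)
    if unknown:
        raise TypeError(f"unknown argument(s): {sorted(unknown)}")
    free = [name for name in params if name not in kwargs]
    if len(args) > len(free):
        raise TypeError("more positional selectors than free parameters")
    bound = dict(kwargs)
    bound.update(zip(free, args))
    return bound


def groundedness(source: str, statement: str) -> float:  # hypothetical implementation
    return 1.0


# Strings stand in for selector objects in this illustration.
print(bind_selectors(groundedness, "record.context", statement="record.output"))
```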

check_selectors
check_selectors(
    app: Union[AppDefinition, JSON],
    record: Record,
    source_data: Optional[Dict[str, Any]] = None,
    warning: bool = False,
) -> bool

Check that the selectors are valid for the given app and record.

PARAMETER DESCRIPTION
app

The app that produced the record.

TYPE: Union[AppDefinition, JSON]

record

The record that the feedback will run on. This can be a mostly empty record for checking ahead of producing one. The utility method App.dummy_record is built for this purpose.

TYPE: Record

source_data

Additional data to select from when extracting feedback function arguments.

TYPE: Optional[Dict[str, Any]] DEFAULT: None

warning

Issue a warning instead of raising an error if a selector is invalid. As some parts of a Record cannot be known ahead of producing it, it may be necessary not to raise an exception here and only issue a warning.

TYPE: bool DEFAULT: False

RETURNS DESCRIPTION
bool

True if the selectors are valid. False if not (if warning is set).

RAISES DESCRIPTION
ValueError

If a selector is invalid and warning is not set.

run
run(
    app: Optional[Union[AppDefinition, JSON]] = None,
    record: Optional[Record] = None,
    source_data: Optional[Dict] = None,
    **kwargs: Dict[str, Any]
) -> FeedbackResult

Run the feedback function on the given record. The app that produced the record is also required to determine input/output argument names.

PARAMETER DESCRIPTION
app

The app that produced the record. This can be AppDefinition or a jsonized AppDefinition. It will be jsonized if it is not already.

TYPE: Optional[Union[AppDefinition, JSON]] DEFAULT: None

record

The record to evaluate the feedback on.

TYPE: Optional[Record] DEFAULT: None

source_data

Additional data to select from when extracting feedback function arguments.

TYPE: Optional[Dict] DEFAULT: None

**kwargs

Any additional keyword arguments are used to set or override selected feedback function inputs.

TYPE: Dict[str, Any] DEFAULT: {}

RETURNS DESCRIPTION
FeedbackResult

A FeedbackResult object with the result of the feedback function.

extract_selection
extract_selection(
    app: Optional[Union[AppDefinition, JSON]] = None,
    record: Optional[Record] = None,
    source_data: Optional[Dict] = None,
) -> Iterable[Dict[str, Any]]

Given the app that produced the given record, extract from record the values that will be sent as arguments to the implementation as specified by self.selectors. Additional data to select from can be provided in source_data. All args are optional. If a Record is specified, its calls are laid out as app (see layout_calls_as_app).
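
When selectors yield several values per argument, the values have to be combined into one argument dictionary per implementation call. The sketch below shows two ways this could be done, a Cartesian product and a position-wise zip. The combine function and its mode strings are assumptions for illustration and are not the library's API.

```python
from itertools import product
from typing import Any, Dict, Iterator, List


def combine(
    selected: Dict[str, List[Any]], mode: str = "product"
) -> Iterator[Dict[str, Any]]:
    """Yield one keyword-argument dict per implementation call."""
    names = list(selected)
    columns = [selected[name] for name in names]
    # "product" pairs every value with every other; "zip" pairs by position.
    rows = product(*columns) if mode == "product" else zip(*columns)
    for row in rows:
        yield dict(zip(names, row))


selected = {"question": ["q1"], "context": ["c1", "c2"]}
print(list(combine(selected)))
print(list(combine(selected, mode="zip")))
```

With one question and two contexts, the product mode yields two calls, while the zip mode stops at the shorter list and yields one.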

Provider

Bases: WithClassInfo, SerialModel

Base Provider class.

TruLens makes use of Feedback Providers to generate evaluations of large language model applications. These providers act as an access point to different models, most commonly classification models and large language models.

These models are then used to generate feedback on application outputs or intermediate results.

Provider is the base class for all feedback providers. It is an abstract class and should not be instantiated directly. Rather, it should be subclassed and the subclass should implement the methods defined in this class.

There are many feedback providers available in TruLens that grant access to a wide range of proprietary and open-source models.

Providers for classification and other non-LLM models should directly subclass Provider. The feedback functions available for these providers are tied to specific providers, as they rely on provider-specific endpoints to models that are tuned to a particular task.

For example, the Huggingface feedback provider provides access to a number of classification models for specific tasks, such as language detection. These models are then utilized by a feedback function to generate an evaluation score.

Example:

```python
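from trulens.providers.huggingface import Huggingface
huggingface_provider = Huggingface()
# Illustrative inputs; substitute your own prompt and response.
prompt = "Where is the Eiffel Tower?"
response = "The Eiffel Tower is in Paris."
huggingface_provider.language_match(prompt, response)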
```

Providers for LLM models should subclass trulens.feedback.LLMProvider, which itself subclasses Provider. Providers for LLM-generated feedback are more of a plug-and-play variety. This means that the base model of your choice can be combined with feedback-specific prompting to generate feedback.

For example, relevance can be run with any base LLM feedback provider. Once the feedback provider is instantiated with a base model, the relevance function can be called with a prompt and response.

This means that the base model selected is combined with specific prompting for relevance to generate feedback.

Example:

```python
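from trulens.providers.openai import OpenAI
provider = OpenAI(model_engine="gpt-3.5-turbo")
# Illustrative inputs; substitute your own prompt and response.
prompt = "Where is the Eiffel Tower?"
response = "The Eiffel Tower is in Paris."
provider.relevance(prompt, response)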
```
Attributes
tru_class_info instance-attribute
tru_class_info: Class

Class information of this pydantic object for use in deserialization.

This unusual key is used so as not to pollute attribute names in whatever class this is mixed into. It should be the same as CLASS_INFO.

endpoint class-attribute instance-attribute
endpoint: Optional[Endpoint] = None

Endpoint supporting this provider.

Remote API invocations are handled by the endpoint.

Functions
__rich_repr__
__rich_repr__() -> Result

Requirement for pretty printing using the rich package.

load staticmethod
load(obj, *args, **kwargs)

Deserialize/load this object using the class information in tru_class_info to look up the actual class that will do the deserialization.

model_validate classmethod
model_validate(*args, **kwargs) -> Any

Deserialize a jsonized version of the app into an instance of the class it was serialized from.

Note

This process uses extra information stored in the jsonized object and handled by WithClassInfo.

SnowflakeFeedback

Bases: Feedback

Similar to the parent class Feedback except this ensures the feedback is run only on the Snowflake server.

Attributes
tru_class_info instance-attribute
tru_class_info: Class

Class information of this pydantic object for use in deserialization.

This unusual key is used so as not to pollute attribute names in whatever class this is mixed into. It should be the same as CLASS_INFO.

implementation class-attribute instance-attribute
implementation: Optional[Union[Function, Method]] = None

Implementation serialization.

aggregator class-attribute instance-attribute
aggregator: Optional[Union[Function, Method]] = None

Aggregator method serialization.

combinations class-attribute instance-attribute

Mode of combining selected values to produce arguments to each feedback function call.

feedback_definition_id instance-attribute
feedback_definition_id: FeedbackDefinitionID = (
    feedback_definition_id
)

ID. If not given, it is uniquely determined from the content.

if_exists class-attribute instance-attribute
if_exists: Optional[Lens] = None

Only execute the feedback function if the following selector names something that exists in a record/app.

This can be used, for example, to evaluate conditionally on the presence of some calls. Feedbacks skipped this way will have a status of FeedbackResultStatus.SKIPPED.

if_missing class-attribute instance-attribute

How to handle missing parameters in feedback function calls.

selectors instance-attribute
selectors: Dict[str, Lens]

Selectors; pointers into Records of where to get arguments for imp.

supplied_name class-attribute instance-attribute
supplied_name: Optional[str] = None

An optional name. It only affects displayed tables.

higher_is_better class-attribute instance-attribute
higher_is_better: Optional[bool] = None

Feedback result magnitude interpretation.

name property
name: str

Name of the feedback function.

Derived from the name of the function implementing it if no supplied name is provided.

imp class-attribute instance-attribute

Implementation callable.

A serialized version is stored at FeedbackDefinition.implementation.

agg class-attribute instance-attribute

Aggregator method for feedback functions that produce more than one result.

A serialized version is stored at FeedbackDefinition.aggregator.

sig property
sig: Signature

Signature of the feedback function implementation.

Functions
__rich_repr__
__rich_repr__() -> Result

Requirement for pretty printing using the rich package.

load staticmethod
load(obj, *args, **kwargs)

Deserialize/load this object using the class information in tru_class_info to look up the actual class that will do the deserialization.

model_validate classmethod
model_validate(*args, **kwargs) -> Any

Deserialize a jsonized version of the app into an instance of the class it was serialized from.

Note

This process uses extra information stored in the jsonized object and handled by WithClassInfo.

on_input_output
on_input_output() -> Feedback

Specifies that the feedback implementation arguments are to be the main app input and output in that order.

Returns a new Feedback object with the specification.

on_default
on_default() -> Feedback

Specifies that one-argument feedbacks should be evaluated on the main app output and two-argument feedbacks should be evaluated on the main input and main output, in that order.

Returns a new Feedback object with this specification.

evaluate_deferred staticmethod
evaluate_deferred(
    session: TruSession,
    limit: Optional[int] = None,
    shuffle: bool = False,
    run_location: Optional[FeedbackRunLocation] = None,
) -> List[Tuple[Series, Future[FeedbackResult]]]

Evaluates feedback functions that were specified to be deferred.

Returns a list of tuples with the DB row containing the Feedback and initial FeedbackResult as well as the Future which will contain the actual result.

PARAMETER DESCRIPTION
limit

The maximum number of evals to start.

TYPE: Optional[int] DEFAULT: None

shuffle

Shuffle the order of the feedbacks to evaluate.

TYPE: bool DEFAULT: False

run_location

Only run feedback functions with this run_location.

TYPE: Optional[FeedbackRunLocation] DEFAULT: None

Constants that govern behavior:

  • TruSession.RETRY_RUNNING_SECONDS: How long to wait before restarting a feedback that was started but never failed (or failed without recording that fact).

  • TruSession.RETRY_FAILED_SECONDS: How long to wait to retry a failed feedback.

aggregate
aggregate(
    func: Optional[AggCallable] = None,
    combinations: Optional[FeedbackCombinations] = None,
) -> Feedback

Specify the aggregation function in case the selectors for this feedback generate more than one value for implementation argument(s). Can also specify the method of producing combinations of values in such cases.

Returns a new Feedback object with the given aggregation function and/or the given combination mode.

on_prompt
on_prompt(arg: Optional[str] = None) -> Feedback

Create a variant of self that will take in the main app input or "prompt" as input, sending it as an argument arg to implementation.

on_response
on_response(arg: Optional[str] = None) -> Feedback

Create a variant of self that will take in the main app output or "response" as input, sending it as an argument arg to implementation.

on
on(*args, **kwargs) -> Feedback

Create a variant of self with the same implementation but the given selectors. Those provided positionally get their implementation argument name guessed and those provided as kwargs get their name from the kwargs key.

check_selectors
check_selectors(
    app: Union[AppDefinition, JSON],
    record: Record,
    source_data: Optional[Dict[str, Any]] = None,
    warning: bool = False,
) -> bool

Check that the selectors are valid for the given app and record.

PARAMETER DESCRIPTION
app

The app that produced the record.

TYPE: Union[AppDefinition, JSON]

record

The record that the feedback will run on. This can be a mostly empty record for checking ahead of producing one. The utility method App.dummy_record is built for this purpose.

TYPE: Record

source_data

Additional data to select from when extracting feedback function arguments.

TYPE: Optional[Dict[str, Any]] DEFAULT: None

warning

Issue a warning instead of raising an error if a selector is invalid. As some parts of a Record cannot be known ahead of producing it, it may be necessary not to raise an exception here and only issue a warning.

TYPE: bool DEFAULT: False

RETURNS DESCRIPTION
bool

True if the selectors are valid. False if not (if warning is set).

RAISES DESCRIPTION
ValueError

If a selector is invalid and warning is not set.

run
run(
    app: Optional[Union[AppDefinition, JSON]] = None,
    record: Optional[Record] = None,
    source_data: Optional[Dict] = None,
    **kwargs: Dict[str, Any]
) -> FeedbackResult

Run the feedback function on the given record. The app that produced the record is also required to determine input/output argument names.

PARAMETER DESCRIPTION
app

The app that produced the record. This can be AppDefinition or a jsonized AppDefinition. It will be jsonized if it is not already.

TYPE: Optional[Union[AppDefinition, JSON]] DEFAULT: None

record

The record to evaluate the feedback on.

TYPE: Optional[Record] DEFAULT: None

source_data

Additional data to select from when extracting feedback function arguments.

TYPE: Optional[Dict] DEFAULT: None

**kwargs

Any additional keyword arguments are used to set or override selected feedback function inputs.

TYPE: Dict[str, Any] DEFAULT: {}

RETURNS DESCRIPTION
FeedbackResult

A FeedbackResult object with the result of the feedback function.

extract_selection
extract_selection(
    app: Optional[Union[AppDefinition, JSON]] = None,
    record: Optional[Record] = None,
    source_data: Optional[Dict] = None,
) -> Iterable[Dict[str, Any]]

Given the app that produced the given record, extract from record the values that will be sent as arguments to the implementation as specified by self.selectors. Additional data to select from can be provided in source_data. All args are optional. If a Record is specified, its calls are laid out as app (see layout_calls_as_app).

FeedbackMode

Bases: str, Enum

Mode of feedback evaluation.

Specify this using the feedback_mode to App constructors.

Note

This class extends str to allow users to compare its values with their string representations, i.e. in if mode == "none": .... Internal uses should use the enum instances.
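
The effect of extending str can be seen in a small standard-library sketch. The Mode class below is a hypothetical stand-in that mirrors two of the values listed under Attributes; it is not the TruLens class itself.

```python
from enum import Enum


class Mode(str, Enum):
    """Hypothetical stand-in for a str-based enum such as FeedbackMode."""

    NONE = "none"
    DEFERRED = "deferred"


def describe(mode) -> str:
    # Because Mode extends str, comparing against a plain string works.
    if mode == "none":
        return "no evaluation"
    return "evaluation enabled"


print(describe(Mode.NONE))                # enum member compared with a string
print(describe("deferred"))               # plain strings are accepted as well
print(Mode("deferred") is Mode.DEFERRED)  # lookup by value returns the member
```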

Attributes
NONE class-attribute instance-attribute
NONE = 'none'

No evaluation will happen even if feedback functions are specified.

WITH_APP class-attribute instance-attribute
WITH_APP = 'with_app'

Try to run feedback functions immediately and before app returns a record.

WITH_APP_THREAD class-attribute instance-attribute
WITH_APP_THREAD = 'with_app_thread'

Try to run feedback functions in the same process as the app but after it produces a record.

DEFERRED class-attribute instance-attribute
DEFERRED = 'deferred'

Evaluate later via the process started by TruSession.start_deferred_feedback_evaluator.

Select

Utilities for creating selectors using Lens and aliases/shortcuts.

Attributes
Query class-attribute instance-attribute
Query = Lens

Selector type.

Tru class-attribute instance-attribute
Tru: Lens = Query()

Selector for the tru wrapper (TruLlama, TruChain, etc.).

Record class-attribute instance-attribute
Record: Query = __record__

Selector for the record.

App class-attribute instance-attribute
App: Query = __app__

Selector for the app.

RecordInput class-attribute instance-attribute
RecordInput: Query = main_input

Selector for the main app input.

RecordOutput class-attribute instance-attribute
RecordOutput: Query = main_output

Selector for the main app output.

RecordCalls class-attribute instance-attribute
RecordCalls: Query = app

Selector for the calls made by the wrapped app.

Laid out by path into components.

RecordCall class-attribute instance-attribute
RecordCall: Query = calls[-1]

Selector for the first called method (last to return).

RecordArgs class-attribute instance-attribute
RecordArgs: Query = args

Selector for the whole set of inputs/arguments to the first called / last returned method call.

RecordRets class-attribute instance-attribute
RecordRets: Query = rets

Selector for the whole output of the first called / last returned method call.

Functions
path_and_method staticmethod
path_and_method(select: Query) -> Tuple[Query, str]

If select names a method as the last attribute, extract the method name and the selector without the final method name.

dequalify staticmethod
dequalify(select: Query) -> Query

If the given selector qualifies record or app, remove that qualification.

render_for_dashboard staticmethod
render_for_dashboard(query: Query) -> str

Render the given query for use in dashboard to help user specify feedback functions.

TruSession

Bases: BaseModel, SingletonPerName

TruSession is the main class that provides an entry point to TruLens.

TruSession lets you:

  • Log app prompts and outputs
  • Log app metadata
  • Run and log feedback functions
  • Run the Streamlit dashboard to view experiment results

By default, all data is logged to "default.sqlite" in the current working directory. Data can also be logged to a SQLAlchemy-compatible URL given by database_url.

Supported App Types

TruChain: Langchain apps.

TruLlama: Llama Index apps.

TruRails: NeMo Guardrails apps.

TruBasicApp: Basic apps defined solely using a function from str to str.

TruCustomApp: Custom apps containing custom structures and methods. Requires annotation of methods to instrument.

TruVirtual: Virtual apps that do not have a real app to instrument but have a virtual structure and can log existing captured data as if they were trulens records.

PARAMETER DESCRIPTION
connector

Database Connector to use. If not provided, a default DefaultDBConnector is created.

TYPE: Optional[DBConnector] DEFAULT: None

**kwargs

All other arguments are used to initialize DefaultDBConnector. Mutually exclusive with connector.

DEFAULT: {}

Attributes
RETRY_RUNNING_SECONDS class-attribute instance-attribute
RETRY_RUNNING_SECONDS: float = 60.0

How long to wait (in seconds) before restarting a feedback function that has already started.

A feedback function execution that has started may have stalled or failed in a bad way that did not record the failure.

See also

start_evaluator

DEFERRED

RETRY_FAILED_SECONDS class-attribute instance-attribute
RETRY_FAILED_SECONDS: float = 5 * 60.0

How long to wait (in seconds) to retry a failed feedback function run.

DEFERRED_NUM_RUNS class-attribute instance-attribute
DEFERRED_NUM_RUNS: int = 32

Number of futures to wait for when evaluating deferred feedback functions.

RECORDS_BATCH_TIMEOUT_IN_SEC class-attribute instance-attribute
RECORDS_BATCH_TIMEOUT_IN_SEC: int = 10

Time to wait before inserting a batch of records into the database.

GROUND_TRUTHS_BATCH_SIZE class-attribute instance-attribute
GROUND_TRUTHS_BATCH_SIZE: int = 100

Number of ground truths to insert into the database per batch.

connector class-attribute instance-attribute
connector: Optional[DBConnector] = Field(None, exclude=True)

Database Connector to use. If not provided, a default is created and used.

Functions
warning
warning()

Issue warning that this singleton already exists.

delete_singleton_by_name staticmethod
delete_singleton_by_name(
    name: str, cls: Optional[Type[SingletonPerName]] = None
)

Delete the singleton instance with the given name.

This can be used for testing to create another singleton.

PARAMETER DESCRIPTION
name

The name of the singleton instance to delete.

TYPE: str

cls

The class of the singleton instance to delete. If not given, all instances with the given name are deleted.

TYPE: Optional[Type[SingletonPerName]] DEFAULT: None

delete_singleton
delete_singleton()

Delete the singleton instance. Can be used for testing to create another singleton.
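
The singleton-per-name behavior and the effect of deleting an instance can be sketched as follows. SingletonByName is a hypothetical class written for this illustration and says nothing about how SingletonPerName is actually implemented.

```python
from typing import Dict, Optional, Tuple


class SingletonByName:
    """Hypothetical base class: one instance per (class, name) pair."""

    _instances: Dict[Tuple[type, Optional[str]], "SingletonByName"] = {}

    def __new__(cls, name: Optional[str] = None):
        key = (cls, name)
        if key not in SingletonByName._instances:
            SingletonByName._instances[key] = super().__new__(cls)
        return SingletonByName._instances[key]

    def __init__(self, name: Optional[str] = None):
        self.name = name

    def delete_singleton(self) -> None:
        # Forget this instance so that the next construction creates a new one.
        SingletonByName._instances.pop((type(self), self.name), None)


first = SingletonByName("default")
second = SingletonByName("default")
print(first is second)                      # same name, same instance
first.delete_singleton()
print(SingletonByName("default") is first)  # a fresh instance after deletion
```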

App
App(*args, app: Optional[Any] = None, **kwargs) -> App

Create an App from the given App constructor arguments by guessing which app type they refer to.

This method intentionally prints out the type of app being created to let the user know in case the guess is wrong.

Basic
Basic(*args, **kwargs) -> App

Deprecated

Use trulens.core.session.TruSession.App instead.

Custom
Custom(*args, **kwargs) -> App

Deprecated

Use trulens.core.session.TruSession.App instead.

Virtual
Virtual(*args, **kwargs) -> App

Deprecated

Use trulens.core.session.TruSession.App instead.

Chain
Chain(*args, **kwargs) -> App

Deprecated

Use trulens.core.session.TruSession.App instead.

Llama
Llama(*args, **kwargs) -> App

Deprecated

Use trulens.core.session.TruSession.App instead.

Rails
Rails(*args, **kwargs) -> App

Deprecated

Use trulens.core.session.TruSession.App instead.

find_unused_port
find_unused_port(*args, **kwargs)

Deprecated

Use trulens.dashboard.run.find_unused_port instead.

run_dashboard
run_dashboard(*args, **kwargs)

Deprecated

Use trulens.dashboard.run.run_dashboard instead.

start_dashboard
start_dashboard(*args, **kwargs)

Deprecated

Use trulens.dashboard.run.run_dashboard instead.

stop_dashboard
stop_dashboard(*args, **kwargs)

Deprecated

Use trulens.dashboard.run.stop_dashboard instead.

update_record
update_record(*args, **kwargs)
reset_database
reset_database()

Reset the database. Clears all tables.

See DB.reset_database.

migrate_database
migrate_database(**kwargs: Dict[str, Any])

Migrates the database.

This should be run whenever there are breaking changes in a database created with an older version of trulens.

PARAMETER DESCRIPTION
**kwargs

Keyword arguments to pass to migrate_database of the current database.

TYPE: Dict[str, Any] DEFAULT: {}

See DB.migrate_database.

add_record
add_record(
    record: Optional[Record] = None, **kwargs: dict
) -> RecordID

Add a record to the database.

PARAMETER DESCRIPTION
record

The record to add.

TYPE: Optional[Record] DEFAULT: None

**kwargs

Record fields to add to the given record or a new record if no record provided.

TYPE: dict DEFAULT: {}

RETURNS DESCRIPTION
RecordID

Unique record identifier str.

add_record_nowait
add_record_nowait(record: Record) -> None

Add a record to the queue to be inserted in the next batch.

run_feedback_functions
run_feedback_functions(
    record: Record,
    feedback_functions: Sequence[Feedback],
    app: Optional[AppDefinition] = None,
    wait: bool = True,
) -> Union[
    Iterable[FeedbackResult],
    Iterable[Future[FeedbackResult]],
]

Run a collection of feedback functions and report their result.

PARAMETER DESCRIPTION
record

The record on which to evaluate the feedback functions.

TYPE: Record

app

The app that produced the given record. If not provided, it is looked up from the database.

TYPE: Optional[AppDefinition] DEFAULT: None

feedback_functions

A collection of feedback functions to evaluate.

TYPE: Sequence[Feedback]

wait

If set (default), will wait for results before returning.

TYPE: bool DEFAULT: True

YIELDS DESCRIPTION
Union[Iterable[FeedbackResult], Iterable[Future[FeedbackResult]]]

One result for each element of feedback_functions of FeedbackResult if wait is enabled (default) or Future of FeedbackResult if wait is disabled.
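
The difference between the two settings of wait can be illustrated with concurrent.futures. The run_all generator below is a hypothetical stand-in that works on plain callables and a string instead of Feedback objects and a Record.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Iterable, Iterator, Union


def run_all(
    functions: Iterable[Callable[[str], float]],
    record: str,
    wait: bool = True,
) -> Iterator[Union[float, "Future[float]"]]:
    """Yield one result (or Future) per function, in input order."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = [pool.submit(fn, record) for fn in functions]
        for future in futures:
            # With wait=True the caller receives finished values; otherwise Futures.
            yield future.result() if wait else future


# Two toy "feedback functions": text length and number of spaces.
checks = [len, lambda text: text.count(" ")]
print(list(run_all(checks, "hello world")))

pending = list(run_all(checks, "hello world", wait=False))
print([future.result() for future in pending])
```

With wait disabled, the caller decides when to block by calling result() on each Future.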

add_app
add_app(app: AppDefinition) -> AppID

Add an app to the database and return its unique id.

PARAMETER DESCRIPTION
app

The app to add to the database.

TYPE: AppDefinition

RETURNS DESCRIPTION
AppID

A unique app identifier str.

delete_app
delete_app(app_id: AppID) -> None

Deletes an app from the database based on its app_id.

PARAMETER DESCRIPTION
app_id

The unique identifier of the app to be deleted.

TYPE: AppID

add_feedback
add_feedback(
    feedback_result_or_future: Optional[
        Union[FeedbackResult, Future[FeedbackResult]]
    ] = None,
    **kwargs: dict
) -> FeedbackResultID

Add a single feedback result or future to the database and return its unique id.

PARAMETER DESCRIPTION
feedback_result_or_future

If a Future is given, the call will wait for the result before adding it to the database. If kwargs are given and a FeedbackResult is also given, the kwargs will be used to update the FeedbackResult; otherwise, a new one will be created with kwargs as arguments to its constructor.

TYPE: Optional[Union[FeedbackResult, Future[FeedbackResult]]] DEFAULT: None

**kwargs

Fields to add to the given feedback result or to create a new FeedbackResult with.

TYPE: dict DEFAULT: {}

RETURNS DESCRIPTION
FeedbackResultID

A unique result identifier str.

add_feedbacks
add_feedbacks(
    feedback_results: Iterable[
        Union[FeedbackResult, Future[FeedbackResult]]
    ]
) -> List[FeedbackResultID]

Add multiple feedback results to the database and return their unique ids.

PARAMETER DESCRIPTION
feedback_results

An iterable of FeedbackResult objects or Futures of the same. Each given Future will be waited on.

TYPE: Iterable[Union[FeedbackResult, Future[FeedbackResult]]]

RETURNS DESCRIPTION
List[FeedbackResultID]

List of unique result identifiers str in the same order as input feedback_results.

get_app
get_app(app_id: AppID) -> Optional[JSONized[AppDefinition]]

Look up an app from the database.

This method produces the JSON-ized version of the app. It can be deserialized back into an AppDefinition with model_validate:

Example
from trulens.core.schema import app as app_schema

# `session` is assumed to be an existing TruSession instance.
app_json = session.get_app(app_id="app_hash_85ebbf172d02e733c8183ac035d0cbb2")
app_definition = app_schema.AppDefinition.model_validate(app_json)
Warning

Do not rely on deserializing into App as its implementations feature attributes not meant to be deserialized.

PARAMETER DESCRIPTION
app_id

The unique identifier str of the app to look up.

TYPE: AppID

RETURNS DESCRIPTION
Optional[JSONized[AppDefinition]]

JSON-ized version of the app.

get_apps
get_apps() -> List[JSONized[AppDefinition]]

Look up all apps from the database.

RETURNS DESCRIPTION
List[JSONized[AppDefinition]]

A list of JSON-ized versions of all apps in the database.

Warning

The same deserialization caveats apply as for get_app.

get_records_and_feedback
get_records_and_feedback(
    app_ids: Optional[List[AppID]] = None,
    offset: Optional[int] = None,
    limit: Optional[int] = None,
) -> Tuple[DataFrame, List[str]]

Get records, their feedback results, and feedback names.

PARAMETER DESCRIPTION
app_ids

A list of app ids to filter records by. If empty or not given, all apps' records will be returned.

TYPE: Optional[List[AppID]] DEFAULT: None

offset

Record row offset.

TYPE: Optional[int] DEFAULT: None

limit

Limit on the number of records to return.

TYPE: Optional[int] DEFAULT: None

RETURNS DESCRIPTION
DataFrame

DataFrame of records with their feedback results.

List[str]

List of feedback names that are columns in the DataFrame.

get_leaderboard
get_leaderboard(
    app_ids: Optional[List[AppID]] = None,
    group_by_metadata_key: Optional[str] = None,
) -> DataFrame

Get a leaderboard for the given apps.

PARAMETER DESCRIPTION
app_ids

A list of app ids to filter records by. If empty or not given, all apps will be included in leaderboard.

TYPE: Optional[List[AppID]] DEFAULT: None

group_by_metadata_key

A key included in record metadata that you want to group results by.

TYPE: Optional[str] DEFAULT: None

RETURNS DESCRIPTION
DataFrame

Dataframe of apps with their feedback results aggregated.

DataFrame

If group_by_metadata_key is provided, the dataframe will be grouped by the specified key.

add_ground_truth_to_dataset
add_ground_truth_to_dataset(
    dataset_name: str,
    ground_truth_df: DataFrame,
    dataset_metadata: Optional[Dict[str, Any]] = None,
)

Create a new dataset if it does not already exist and add ground truth data to it. If a dataset with the same name already exists, the ground truth data will be added to it.

PARAMETER DESCRIPTION
dataset_name

Name of the dataset.

TYPE: str

ground_truth_df

DataFrame containing the ground truth data.

TYPE: DataFrame

dataset_metadata

Additional metadata to add to the dataset.

TYPE: Optional[Dict[str, Any]] DEFAULT: None

get_ground_truth
get_ground_truth(dataset_name: str) -> DataFrame

Get ground truth data from the dataset.

PARAMETER DESCRIPTION
dataset_name

Name of the dataset.

TYPE: str

start_evaluator
start_evaluator(
    restart: bool = False,
    fork: bool = False,
    disable_tqdm: bool = False,
    run_location: Optional[FeedbackRunLocation] = None,
    return_when_done: bool = False,
) -> Optional[Union[Process, Thread]]

Start a deferred feedback function evaluation thread or process.

PARAMETER DESCRIPTION
restart

If set, will stop the existing evaluator before starting a new one.

TYPE: bool DEFAULT: False

fork

If set, will start the evaluator in a new process instead of a thread. NOT CURRENTLY SUPPORTED.

TYPE: bool DEFAULT: False

disable_tqdm

If set, will disable progress bar logging from the evaluator.

TYPE: bool DEFAULT: False

run_location

Run only the evaluations corresponding to run_location.

TYPE: Optional[FeedbackRunLocation] DEFAULT: None

return_when_done

Instead of running asynchronously, will block until no feedbacks remain.

TYPE: bool DEFAULT: False

RETURNS DESCRIPTION
Optional[Union[Process, Thread]]

If return_when_done is True, then returns None. Otherwise, the started process or thread that is executing the deferred feedback evaluator.

Relevant constants

RETRY_RUNNING_SECONDS

RETRY_FAILED_SECONDS

DEFERRED_NUM_RUNS

MAX_THREADS

stop_evaluator
stop_evaluator()

Stop the deferred feedback evaluation thread.

Functions