Skip to content

trulens.core.feedback

trulens.core.feedback

Classes

Endpoint

Bases: WithClassInfo, SerialModel, InstanceRefMixin

API usage, pacing, and utilities for API endpoints.

Attributes
tru_class_info instance-attribute
tru_class_info: Class

Class information of this pydantic object for use in deserialization.

Using this odd key to not pollute attribute names in whatever class we mix this into. Should be the same as CLASS_INFO.

instrumented_methods class-attribute
instrumented_methods: Dict[
    Any, List[Tuple[Callable, Callable, Type[Endpoint]]]
] = defaultdict(list)

Mapping of classes/module-methods that have been instrumented for cost tracking along with the wrapper methods and the class that instrumented them.

Key is the class or module owning the instrumented method. Tuple value has:

  • original function,

  • wrapped version,

  • endpoint that did the wrapping.

name instance-attribute
name: str

API/endpoint name.

rpm class-attribute instance-attribute

Requests per minute.

retries class-attribute instance-attribute
retries: int = 3

Retries (if performing requests using this class).

post_headers class-attribute instance-attribute
post_headers: Dict[str, str] = Field(
    default_factory=dict, exclude=True
)

Optional post headers for post requests if done by this class.

pace class-attribute instance-attribute
pace: Pace = Field(
    default_factory=lambda: Pace(
        marks_per_second=DEFAULT_RPM / 60.0,
        seconds_per_period=60.0,
    ),
    exclude=True,
)

Pacing instance to maintain a desired rpm.

global_callback class-attribute instance-attribute
global_callback: EndpointCallback = Field(exclude=True)

Track costs not run inside "track_cost" here.

Also note that Endpoints are singletons (one for each unique name argument) hence this global callback will track all requests for the named api even if you try to create multiple endpoints (with the same name).

callback_class class-attribute instance-attribute
callback_class: Type[EndpointCallback] = Field(exclude=True)

Callback class to use for usage tracking.

callback_name class-attribute instance-attribute
callback_name: str = Field(exclude=True)

Name of variable that stores the callback noted above.

Classes
EndpointSetup dataclass

Class for storing supported endpoint information.

See track_all_costs for usage.

Functions
get_instances classmethod
get_instances() -> Generator[InstanceRefMixin]

Get all instances of the class.

delete_instances classmethod
delete_instances()

Delete all instances of the class.

__rich_repr__
__rich_repr__() -> Result

Requirement for pretty printing using the rich package.

load staticmethod
load(obj, *args, **kwargs)

Deserialize/load this object using the class information in tru_class_info to lookup the actual class that will do the deserialization.

model_validate classmethod
model_validate(*args, **kwargs) -> Any

Deserialized a jsonized version of the app into the instance of the class it was serialized from.

Note

This process uses extra information stored in the jsonized object and handled by WithClassInfo.

pace_me
pace_me() -> float

Block until we can make a request to this endpoint to keep pace with maximum rpm. Returns time in seconds since last call to this method returned.

run_in_pace
run_in_pace(
    func: Callable[[A], B], *args, **kwargs
) -> B

Run the given func on the given args and kwargs at pace with the endpoint-specified rpm. Failures will be retried self.retries times.

run_me
run_me(thunk: Thunk[T]) -> T

DEPRECATED: Run the given thunk, returning itse output, on pace with the api. Retries request multiple times if self.retries > 0.

DEPRECATED: Use run_in_pace instead.

print_instrumented classmethod
print_instrumented()

Print out all of the methods that have been instrumented for cost tracking. This is organized by the classes/modules containing them.

track_all_costs staticmethod
track_all_costs(
    __func: CallableMaybeAwaitable[A, T],
    *args,
    with_openai: bool = True,
    with_hugs: bool = True,
    with_litellm: bool = True,
    with_bedrock: bool = True,
    with_cortex: bool = True,
    with_dummy: bool = True,
    **kwargs
) -> Tuple[T, Sequence[EndpointCallback]]

Track costs of all of the apis we can currently track, over the execution of thunk.

track_all_costs_tally staticmethod
track_all_costs_tally(
    __func: CallableMaybeAwaitable[A, T],
    *args,
    with_openai: bool = True,
    with_hugs: bool = True,
    with_litellm: bool = True,
    with_bedrock: bool = True,
    with_cortex: bool = True,
    with_dummy: bool = True,
    **kwargs
) -> Tuple[T, Thunk[Cost]]

Track costs of all of the apis we can currently track, over the execution of thunk.

RETURNS DESCRIPTION
T

Result of evaluating the thunk.

TYPE: T

Thunk[Cost]

Thunk[Cost]: A thunk that returns the total cost of all callbacks that tracked costs. This is a thunk as the costs might change after this method returns in case of Awaitable results.

track_cost
track_cost(
    __func: CallableMaybeAwaitable[..., T], *args, **kwargs
) -> Tuple[T, EndpointCallback]

Tally only the usage performed within the execution of the given thunk.

Returns the thunk's result alongside the EndpointCallback object that includes the usage information.

handle_wrapped_call
handle_wrapped_call(
    func: Callable,
    bindings: BoundArguments,
    response: Any,
    callback: Optional[EndpointCallback],
) -> Any

This gets called with the results of every instrumented method.

This should be implemented by each subclass. Importantly, it must return the response or some wrapping of the response.

PARAMETER DESCRIPTION
func

the wrapped method.

TYPE: Callable

bindings

the inputs to the wrapped method.

TYPE: BoundArguments

response

whatever the wrapped function returned.

TYPE: Any

callback

the callback set up by track_cost if the wrapped method was called and returned within an invocation of track_cost.

TYPE: Optional[EndpointCallback]

wrap_function
wrap_function(func)

Create a wrapper of the given function to perform cost tracking.

EndpointCallback

Bases: SerialModel

Callbacks to be invoked after various API requests and track various metrics like token usage.

Attributes
endpoint class-attribute instance-attribute
endpoint: Endpoint = Field(exclude=True)

The endpoint owning this callback.

cost class-attribute instance-attribute
cost: Cost = Field(default_factory=Cost)

Costs tracked by this callback.

Functions
__rich_repr__
__rich_repr__() -> Result

Requirement for pretty printing using the rich package.

handle
handle(response: Any) -> None

Called after each request.

handle_chunk
handle_chunk(response: Any) -> None

Called after receiving a chunk from a request.

handle_generation
handle_generation(response: Any) -> None

Called after each completion request.

handle_generation_chunk
handle_generation_chunk(response: Any) -> None

Called after receiving a chunk from a completion request.

handle_classification
handle_classification(response: Any) -> None

Called after each classification response.

handle_embedding
handle_embedding(response: Any) -> None

Called after each embedding response.

Feedback

Bases: FeedbackDefinition

Feedback function container.

Typical usage is to specify a feedback implementation function from a Provider and the mapping of selectors describing how to construct the arguments to the implementation:

Example
from trulens.core import Feedback
from trulens.providers.huggingface import Huggingface
hugs = Huggingface()

# Create a feedback function from a provider:
feedback = Feedback(
    hugs.language_match # the implementation
).on_input_output() # selectors shorthand
Attributes
tru_class_info instance-attribute
tru_class_info: Class

Class information of this pydantic object for use in deserialization.

Using this odd key to not pollute attribute names in whatever class we mix this into. Should be the same as CLASS_INFO.

implementation class-attribute instance-attribute
implementation: Optional[Union[Function, Method]] = None

Implementation serialization.

aggregator class-attribute instance-attribute
aggregator: Optional[Union[Function, Method]] = None

Aggregator method serialization.

combinations class-attribute instance-attribute

Mode of combining selected values to produce arguments to each feedback function call.

feedback_definition_id instance-attribute
feedback_definition_id: FeedbackDefinitionID = (
    feedback_definition_id
)

Id, if not given, uniquely determined from content.

if_exists class-attribute instance-attribute
if_exists: Optional[Lens] = None

Only execute the feedback function if the following selector names something that exists in a record/app.

Can use this to evaluate conditionally on presence of some calls, for example. Feedbacks skipped this way will have a status of FeedbackResultStatus.SKIPPED.

if_missing class-attribute instance-attribute

How to handle missing parameters in feedback function calls.

run_location instance-attribute

Where the feedback evaluation takes place (e.g. locally, at a Snowflake server, etc).

selectors instance-attribute
selectors: Dict[str, Lens]

Selectors; pointers into Records of where to get arguments for imp.

supplied_name class-attribute instance-attribute
supplied_name: Optional[str] = None

An optional name. Only will affect displayed tables.

higher_is_better class-attribute instance-attribute
higher_is_better: Optional[bool] = None

Feedback result magnitude interpretation.

imp class-attribute instance-attribute

Implementation callable.

A serialized version is stored at FeedbackDefinition.implementation.

agg class-attribute instance-attribute

Aggregator method for feedback functions that produce more than one result.

A serialized version is stored at FeedbackDefinition.aggregator.

sig property
sig: Signature

Signature of the feedback function implementation.

name property
name: str

Name of the feedback function.

Derived from the name of the function implementing it if no supplied name provided.

Functions
__rich_repr__
__rich_repr__() -> Result

Requirement for pretty printing using the rich package.

load staticmethod
load(obj, *args, **kwargs)

Deserialize/load this object using the class information in tru_class_info to lookup the actual class that will do the deserialization.

model_validate classmethod
model_validate(*args, **kwargs) -> Any

Deserialized a jsonized version of the app into the instance of the class it was serialized from.

Note

This process uses extra information stored in the jsonized object and handled by WithClassInfo.

on_input_output
on_input_output() -> Feedback

Specifies that the feedback implementation arguments are to be the main app input and output in that order.

Returns a new Feedback object with the specification.

on_default
on_default() -> Feedback

Specifies that one argument feedbacks should be evaluated on the main app output and two argument feedbacks should be evaluates on main input and main output in that order.

Returns a new Feedback object with this specification.

evaluate_deferred staticmethod
evaluate_deferred(
    session: TruSession,
    limit: Optional[int] = None,
    shuffle: bool = False,
    run_location: Optional[FeedbackRunLocation] = None,
) -> List[Tuple[Series, Future[FeedbackResult]]]

Evaluates feedback functions that were specified to be deferred.

Returns a list of tuples with the DB row containing the Feedback and initial FeedbackResult as well as the Future which will contain the actual result.

PARAMETER DESCRIPTION
limit

The maximum number of evals to start.

TYPE: Optional[int] DEFAULT: None

shuffle

Shuffle the order of the feedbacks to evaluate.

TYPE: bool DEFAULT: False

run_location

Only run feedback functions with this run_location.

TYPE: Optional[FeedbackRunLocation] DEFAULT: None

Constants that govern behavior:

  • TruSession.RETRY_RUNNING_SECONDS: How long to time before restarting a feedback that was started but never failed (or failed without recording that fact).

  • TruSession.RETRY_FAILED_SECONDS: How long to wait to retry a failed feedback.

aggregate
aggregate(
    func: Optional[AggCallable] = None,
    combinations: Optional[FeedbackCombinations] = None,
) -> Feedback

Specify the aggregation function in case the selectors for this feedback generate more than one value for implementation argument(s). Can also specify the method of producing combinations of values in such cases.

Returns a new Feedback object with the given aggregation function and/or the given combination mode.

on_prompt
on_prompt(arg: Optional[str] = None) -> Feedback

Create a variant of self that will take in the main app input or "prompt" as input, sending it as an argument arg to implementation.

on_response
on_response(arg: Optional[str] = None) -> Feedback

Create a variant of self that will take in the main app output or "response" as input, sending it as an argument arg to implementation.

on
on(*args, **kwargs) -> Feedback

Create a variant of self with the same implementation but the given selectors. Those provided positionally get their implementation argument name guessed and those provided as kwargs get their name from the kwargs key.

check_selectors
check_selectors(
    app: Union[AppDefinition, JSON],
    record: Record,
    source_data: Optional[Dict[str, Any]] = None,
    warning: bool = False,
) -> bool

Check that the selectors are valid for the given app and record.

PARAMETER DESCRIPTION
app

The app that produced the record.

TYPE: Union[AppDefinition, JSON]

record

The record that the feedback will run on. This can be a mostly empty record for checking ahead of producing one. The utility method App.dummy_record is built for this purpose.

TYPE: Record

source_data

Additional data to select from when extracting feedback function arguments.

TYPE: Optional[Dict[str, Any]] DEFAULT: None

warning

Issue a warning instead of raising an error if a selector is invalid. As some parts of a Record cannot be known ahead of producing it, it may be necessary to not raise exception here and only issue a warning.

TYPE: bool DEFAULT: False

RETURNS DESCRIPTION
bool

True if the selectors are valid. False if not (if warning is set).

RAISES DESCRIPTION
ValueError

If a selector is invalid and warning is not set.

run
run(
    app: Optional[Union[AppDefinition, JSON]] = None,
    record: Optional[Record] = None,
    source_data: Optional[Dict] = None,
    **kwargs: Dict[str, Any]
) -> FeedbackResult

Run the feedback function on the given record. The app that produced the record is also required to determine input/output argument names.

PARAMETER DESCRIPTION
app

The app that produced the record. This can be AppDefinition or a jsonized AppDefinition. It will be jsonized if it is not already.

TYPE: Optional[Union[AppDefinition, JSON]] DEFAULT: None

record

The record to evaluate the feedback on.

TYPE: Optional[Record] DEFAULT: None

source_data

Additional data to select from when extracting feedback function arguments.

TYPE: Optional[Dict] DEFAULT: None

**kwargs

Any additional keyword arguments are used to set or override selected feedback function inputs.

TYPE: Dict[str, Any] DEFAULT: {}

RETURNS DESCRIPTION
FeedbackResult

A FeedbackResult object with the result of the feedback function.

extract_selection
extract_selection(
    app: Optional[Union[AppDefinition, JSON]] = None,
    record: Optional[Record] = None,
    source_data: Optional[Dict] = None,
) -> Iterable[Dict[str, Any]]

Given the app that produced the given record, extract from record the values that will be sent as arguments to the implementation as specified by self.selectors. Additional data to select from can be provided in source_data. All args are optional. If a Record is specified, its calls are laid out as app (see layout_calls_as_app).

SkipEval

Bases: Exception

Raised when evaluating a feedback function implementation to skip it so it is not aggregated with other non-skipped results.

PARAMETER DESCRIPTION
reason

Optional reason for why this evaluation was skipped.

TYPE: Optional[str] DEFAULT: None

feedback

The Feedback instance this run corresponds to.

TYPE: Optional[Feedback] DEFAULT: None

ins

The arguments to this run.

TYPE: Optional[Dict[str, Any]] DEFAULT: None

SnowflakeFeedback

Bases: Feedback

Similar to the parent class Feedback except this ensures the feedback is run only on the Snowflake server.

Attributes
tru_class_info instance-attribute
tru_class_info: Class

Class information of this pydantic object for use in deserialization.

Using this odd key to not pollute attribute names in whatever class we mix this into. Should be the same as CLASS_INFO.

implementation class-attribute instance-attribute
implementation: Optional[Union[Function, Method]] = None

Implementation serialization.

aggregator class-attribute instance-attribute
aggregator: Optional[Union[Function, Method]] = None

Aggregator method serialization.

combinations class-attribute instance-attribute

Mode of combining selected values to produce arguments to each feedback function call.

feedback_definition_id instance-attribute
feedback_definition_id: FeedbackDefinitionID = (
    feedback_definition_id
)

Id, if not given, uniquely determined from content.

if_exists class-attribute instance-attribute
if_exists: Optional[Lens] = None

Only execute the feedback function if the following selector names something that exists in a record/app.

Can use this to evaluate conditionally on presence of some calls, for example. Feedbacks skipped this way will have a status of FeedbackResultStatus.SKIPPED.

if_missing class-attribute instance-attribute

How to handle missing parameters in feedback function calls.

selectors instance-attribute
selectors: Dict[str, Lens]

Selectors; pointers into Records of where to get arguments for imp.

supplied_name class-attribute instance-attribute
supplied_name: Optional[str] = None

An optional name. Only will affect displayed tables.

higher_is_better class-attribute instance-attribute
higher_is_better: Optional[bool] = None

Feedback result magnitude interpretation.

name property
name: str

Name of the feedback function.

Derived from the name of the function implementing it if no supplied name provided.

imp class-attribute instance-attribute

Implementation callable.

A serialized version is stored at FeedbackDefinition.implementation.

agg class-attribute instance-attribute

Aggregator method for feedback functions that produce more than one result.

A serialized version is stored at FeedbackDefinition.aggregator.

sig property
sig: Signature

Signature of the feedback function implementation.

Functions
__rich_repr__
__rich_repr__() -> Result

Requirement for pretty printing using the rich package.

load staticmethod
load(obj, *args, **kwargs)

Deserialize/load this object using the class information in tru_class_info to lookup the actual class that will do the deserialization.

model_validate classmethod
model_validate(*args, **kwargs) -> Any

Deserialized a jsonized version of the app into the instance of the class it was serialized from.

Note

This process uses extra information stored in the jsonized object and handled by WithClassInfo.

on_input_output
on_input_output() -> Feedback

Specifies that the feedback implementation arguments are to be the main app input and output in that order.

Returns a new Feedback object with the specification.

on_default
on_default() -> Feedback

Specifies that one argument feedbacks should be evaluated on the main app output and two argument feedbacks should be evaluates on main input and main output in that order.

Returns a new Feedback object with this specification.

evaluate_deferred staticmethod
evaluate_deferred(
    session: TruSession,
    limit: Optional[int] = None,
    shuffle: bool = False,
    run_location: Optional[FeedbackRunLocation] = None,
) -> List[Tuple[Series, Future[FeedbackResult]]]

Evaluates feedback functions that were specified to be deferred.

Returns a list of tuples with the DB row containing the Feedback and initial FeedbackResult as well as the Future which will contain the actual result.

PARAMETER DESCRIPTION
limit

The maximum number of evals to start.

TYPE: Optional[int] DEFAULT: None

shuffle

Shuffle the order of the feedbacks to evaluate.

TYPE: bool DEFAULT: False

run_location

Only run feedback functions with this run_location.

TYPE: Optional[FeedbackRunLocation] DEFAULT: None

Constants that govern behavior:

  • TruSession.RETRY_RUNNING_SECONDS: How long to time before restarting a feedback that was started but never failed (or failed without recording that fact).

  • TruSession.RETRY_FAILED_SECONDS: How long to wait to retry a failed feedback.

aggregate
aggregate(
    func: Optional[AggCallable] = None,
    combinations: Optional[FeedbackCombinations] = None,
) -> Feedback

Specify the aggregation function in case the selectors for this feedback generate more than one value for implementation argument(s). Can also specify the method of producing combinations of values in such cases.

Returns a new Feedback object with the given aggregation function and/or the given combination mode.

on_prompt
on_prompt(arg: Optional[str] = None) -> Feedback

Create a variant of self that will take in the main app input or "prompt" as input, sending it as an argument arg to implementation.

on_response
on_response(arg: Optional[str] = None) -> Feedback

Create a variant of self that will take in the main app output or "response" as input, sending it as an argument arg to implementation.

on
on(*args, **kwargs) -> Feedback

Create a variant of self with the same implementation but the given selectors. Those provided positionally get their implementation argument name guessed and those provided as kwargs get their name from the kwargs key.

check_selectors
check_selectors(
    app: Union[AppDefinition, JSON],
    record: Record,
    source_data: Optional[Dict[str, Any]] = None,
    warning: bool = False,
) -> bool

Check that the selectors are valid for the given app and record.

PARAMETER DESCRIPTION
app

The app that produced the record.

TYPE: Union[AppDefinition, JSON]

record

The record that the feedback will run on. This can be a mostly empty record for checking ahead of producing one. The utility method App.dummy_record is built for this purpose.

TYPE: Record

source_data

Additional data to select from when extracting feedback function arguments.

TYPE: Optional[Dict[str, Any]] DEFAULT: None

warning

Issue a warning instead of raising an error if a selector is invalid. As some parts of a Record cannot be known ahead of producing it, it may be necessary to not raise exception here and only issue a warning.

TYPE: bool DEFAULT: False

RETURNS DESCRIPTION
bool

True if the selectors are valid. False if not (if warning is set).

RAISES DESCRIPTION
ValueError

If a selector is invalid and warning is not set.

run
run(
    app: Optional[Union[AppDefinition, JSON]] = None,
    record: Optional[Record] = None,
    source_data: Optional[Dict] = None,
    **kwargs: Dict[str, Any]
) -> FeedbackResult

Run the feedback function on the given record. The app that produced the record is also required to determine input/output argument names.

PARAMETER DESCRIPTION
app

The app that produced the record. This can be AppDefinition or a jsonized AppDefinition. It will be jsonized if it is not already.

TYPE: Optional[Union[AppDefinition, JSON]] DEFAULT: None

record

The record to evaluate the feedback on.

TYPE: Optional[Record] DEFAULT: None

source_data

Additional data to select from when extracting feedback function arguments.

TYPE: Optional[Dict] DEFAULT: None

**kwargs

Any additional keyword arguments are used to set or override selected feedback function inputs.

TYPE: Dict[str, Any] DEFAULT: {}

RETURNS DESCRIPTION
FeedbackResult

A FeedbackResult object with the result of the feedback function.

extract_selection
extract_selection(
    app: Optional[Union[AppDefinition, JSON]] = None,
    record: Optional[Record] = None,
    source_data: Optional[Dict] = None,
) -> Iterable[Dict[str, Any]]

Given the app that produced the given record, extract from record the values that will be sent as arguments to the implementation as specified by self.selectors. Additional data to select from can be provided in source_data. All args are optional. If a Record is specified, its calls are laid out as app (see layout_calls_as_app).

Provider

Bases: WithClassInfo, SerialModel

Base Provider class.

TruLens makes use of Feedback Providers to generate evaluations of large language model applications. These providers act as an access point to different models, most commonly classification models and large language models.

These models are then used to generate feedback on application outputs or intermediate results.

Provider is the base class for all feedback providers. It is an abstract class and should not be instantiated directly. Rather, it should be subclassed and the subclass should implement the methods defined in this class.

There are many feedback providers available in TruLens that grant access to a wide range of proprietary and open-source models.

Providers for classification and other non-LLM models should directly subclass Provider. The feedback functions available for these providers are tied to specific providers, as they rely on provider-specific endpoints to models that are tuned to a particular task.

For example, the Huggingface feedback provider provides access to a number of classification models for specific tasks, such as language detection. These models are than utilized by a feedback function to generate an evaluation score.

Example
from trulens.providers.huggingface import Huggingface
huggingface_provider = Huggingface()
huggingface_provider.language_match(prompt, response)

Providers for LLM models should subclass trulens.feedback.llm_provider.LLMProvider, which itself subclasses Provider. Providers for LLM-generated feedback are more of a plug-and-play variety. This means that the base model of your choice can be combined with feedback-specific prompting to generate feedback.

For example, relevance can be run with any base LLM feedback provider. Once the feedback provider is instantiated with a base model, the relevance function can be called with a prompt and response.

This means that the base model selected is combined with specific prompting for relevance to generate feedback.

Example
from trulens.providers.openai import OpenAI
provider = OpenAI(model_engine="gpt-3.5-turbo")
provider.relevance(prompt, response)
Attributes
tru_class_info instance-attribute
tru_class_info: Class

Class information of this pydantic object for use in deserialization.

Using this odd key to not pollute attribute names in whatever class we mix this into. Should be the same as CLASS_INFO.

endpoint class-attribute instance-attribute
endpoint: Optional[Endpoint] = None

Endpoint supporting this provider.

Remote API invocations are handled by the endpoint.

Functions
__rich_repr__
__rich_repr__() -> Result

Requirement for pretty printing using the rich package.

load staticmethod
load(obj, *args, **kwargs)

Deserialize/load this object using the class information in tru_class_info to lookup the actual class that will do the deserialization.

model_validate classmethod
model_validate(*args, **kwargs) -> Any

Deserialized a jsonized version of the app into the instance of the class it was serialized from.

Note

This process uses extra information stored in the jsonized object and handled by WithClassInfo.