Selecting Components¶

LLM applications come in all shapes and sizes and with a variety of different control flows. As a result it’s a challenge to consistently evaluate parts of an LLM application trace.

Therefore, we’ve adopted lenses to refer to parts of an LLM stack trace and use them when defining evaluations. For example, the following lens refers to the input named query of the app's retrieve step.

Example

Select.RecordCalls.retrieve.args.query

Such lenses can then be used to define evaluations as follows:

Example

# Context relevance between question and each context chunk.
f_context_relevance = (
    Feedback(provider.context_relevance_with_cot_reasons, name = "Context Relevance")
    .on(Select.RecordCalls.retrieve.args.query)
    .on(Select.RecordCalls.retrieve.rets)
    .aggregate(np.mean)
)
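To make the idea of a lens more concrete, the following is a minimal sketch in plain Python, not the TruLens implementation. It assumes a hypothetical nested dictionary standing in for a recorded trace and a hypothetical helper named resolve that follows a dotted path one key at a time, which is roughly what a selector such as Select.RecordCalls.retrieve.args.query does against a record.

```python
from functools import reduce
from typing import Any, Mapping

# Hypothetical stand-in for a recorded trace; not the real TruLens Record schema.
record = {
    "app": {
        "retrieve": {
            "args": {"query": "What is a lens?"},
            "rets": ["chunk A", "chunk B"],
        }
    }
}


def resolve(data: Mapping[str, Any], dotted_path: str) -> Any:
    """Follow each dot-separated key in turn, stepping into the nested data."""
    return reduce(lambda node, key: node[key], dotted_path.split("."), data)


# A single item: the query argument passed to the retrieve step.
query = resolve(record, "app.retrieve.args.query")

# Multiple items: the list of chunks returned by the retrieve step.
chunks = resolve(record, "app.retrieve.rets")
```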

In most cases, the Select object produces only a single item but can also address multiple items.

For example: Select.RecordCalls.retrieve.args.query refers to only one item.

However, Select.RecordCalls.retrieve.rets refers to multiple items: in this case, the documents returned by the retrieve method. These items can be evaluated separately, as shown above, or collected into a single array with .collect() and evaluated together. Collection is most commonly used for groundedness evaluations.

Example

f_groundedness = (
    Feedback(provider.groundedness_measure_with_cot_reasons, name = "Groundedness")
    .on(Select.RecordCalls.retrieve.rets.collect())
    .on_output()
)
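The difference between per-item evaluation and collected evaluation can be illustrated with a small plain-Python sketch. The scorers score_one and score_many below are hypothetical toy functions invented for this illustration (they are not TruLens feedback providers); the point is only the calling pattern: one call per chunk followed by aggregation, versus one call that receives all chunks at once.

```python
from statistics import mean
from typing import List


def score_one(question: str, context: str) -> float:
    """Hypothetical scorer: fraction of question words found in one context."""
    words = question.lower().split()
    return sum(word in context.lower() for word in words) / len(words)


def score_many(question: str, contexts: List[str]) -> float:
    """Hypothetical scorer that receives the whole list of contexts at once."""
    return score_one(question, " ".join(contexts))


question = "where is paris"
chunks = ["Paris is in France", "Berlin is in Germany"]

# Per-item evaluation: one score per chunk, then aggregated (compare .aggregate(np.mean)).
per_chunk = [score_one(question, chunk) for chunk in chunks]
aggregated = mean(per_chunk)

# Collected evaluation: a single call sees every chunk together (compare .collect()).
collected = score_many(question, chunks)
```

In this sketch the per-item path produces two scores that are averaged, while the collected path produces a single score, so no aggregation step is needed.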

Selectors can also access multiple calls to the same component. This is increasingly common in agentic applications, where an agent may call a retrieve method several times to complete its task.

For example, the following selector returns only the context documents from the first invocation of retrieve.

Example

context = Select.RecordCalls.retrieve.rets.rets[:]

Alternatively, adding [:] after the method name retrieve returns context documents from all invocations of retrieve.

Example

context_all_calls = Select.RecordCalls.retrieve[:].rets.rets[:]
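As a rough analogy in plain Python, assume a hypothetical list named retrieve_calls that holds one entry per invocation of retrieve. This is only a sketch of the selection behavior described above, not the actual record layout used by TruLens.

```python
# Hypothetical trace: retrieve was invoked twice, each call returning a list of chunks.
retrieve_calls = [
    {"args": {"query": "q1"}, "rets": ["a1", "a2"]},
    {"args": {"query": "q2"}, "rets": ["b1"]},
]

# No index on the method: only the first invocation is addressed.
first_call_chunks = retrieve_calls[0]["rets"][:]

# [:] on the method: every invocation contributes its returned chunks.
all_call_chunks = [
    chunk for call in retrieve_calls[:] for chunk in call["rets"][:]
]
```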

Understanding the structure of your app¶

Because LLM apps have a wide variation in their structure, the feedback selector construction can also vary widely. To construct the feedback selector, you must first understand the structure of your application.

In Python, you can access the JSON-like structure by recording a call with the with_record method and then calling layout_calls_as_app on the resulting record.

Example

response = my_llm_app(query)

from trulens.apps.langchain import TruChain
tru_recorder = TruChain(
    my_llm_app,
    app_name='ChatApplication',
    app_version="Chain1",
)

response, tru_record = tru_recorder.with_record(my_llm_app, query)
json_like = tru_record.layout_calls_as_app()

If a selector looks like the following:

Example

Select.Record.app.combine_documents_chain._call

The value it refers to can be accessed in the JSON-like structure via:

Example

json_like['app']['combine_documents_chain']['_call']
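When the structure is unfamiliar, it can help to list every path it contains. The following is a minimal sketch using a hypothetical helper named iter_paths and a hypothetical nested dictionary standing in for the result of layout_calls_as_app(); the keys under _call are invented for illustration. Because the helper only checks for Mapping, it should also work on dict subclasses.

```python
from collections.abc import Mapping
from typing import Any, Iterator


def iter_paths(node: Any, prefix: str = "") -> Iterator[str]:
    """Yield a dotted path for every leaf reachable through nested mappings."""
    if isinstance(node, Mapping):
        for key, child in node.items():
            child_prefix = f"{prefix}.{key}" if prefix else str(key)
            yield from iter_paths(child, child_prefix)
    else:
        yield prefix


# Hypothetical structure; the keys under _call are placeholders.
json_like = {
    "app": {
        "combine_documents_chain": {
            "_call": {"args": {"inputs": "..."}, "rets": "..."},
        }
    }
}

paths = list(iter_paths(json_like))
for path in paths:
    print(path)
```

Each printed path corresponds to a sequence of keys that could be followed in the JSON-like structure, which mirrors how the selector is built.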

The application structure can also be viewed in the TruLens user interface. You can view this structure on the Evaluations page by scrolling down to the Timeline.

The top-level record also contains these helper accessors:

RecordInput = Record.main_input -- points to the main input part of a Record. This is the first argument to the root method of an app (for LangChain Chains this is the __call__ method).
RecordOutput = Record.main_output -- points to the main output part of a Record. This is the output of the root method of an app (i.e. __call__ for LangChain Chains).
RecordCalls = Record.app -- points to the root of the app-structured mirror of calls in a record. See App-organized Calls Section above.

Multiple Inputs Per Argument¶

As in the f_context_relevance example, a selector for a single argument may point to more than one aspect of a record/app. Such selectors are specified using slices or lists in key/index positions. In that case, the feedback function is evaluated multiple times, its outputs are collected, and they are finally aggregated into a main feedback result.

The values selected for each argument of the feedback implementation are collected, and the feedback definition is evaluated on every combination of argument-to-value mappings. This may produce a large number of evaluations if more than one argument names multiple values. In the dashboard, all individual invocations of a feedback implementation are shown alongside the final aggregate result.
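The combination rule can be sketched with itertools.product. The function toy_feedback and the sample values below are hypothetical and exist only to show how the number of evaluations grows when two arguments each select several values; this is not the TruLens evaluation loop itself.

```python
from itertools import product
from statistics import mean


def toy_feedback(question: str, context: str) -> float:
    """Hypothetical feedback: 1.0 if the question term occurs in the context."""
    return 1.0 if question in context else 0.0


# Two arguments, each selecting multiple values.
questions = ["paris", "berlin"]
contexts = ["paris is in france", "berlin", "paris metro"]

# Every combination of argument values is evaluated: 2 x 3 = 6 calls.
combinations = list(product(questions, contexts))
scores = [toy_feedback(q, c) for q, c in combinations]

# The individual scores are then aggregated into one main result.
result = mean(scores)
```

With m values for one argument and n for another, m x n evaluations are produced, which is why selecting multiple values on several arguments can become expensive.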

App/Record Organization (What can be selected)¶

The top level JSON attributes are defined by the class structures.

For a Record:

trulens.core.schema.Record ¶

Bases: SerialModel, Hashable

The record of a single main method call.

Note

This class will be renamed to Trace in the future.

Attributes¶

record_id `instance-attribute` ¶

record_id: RecordID = record_id

Unique identifier for this record.

app_id `instance-attribute` ¶

app_id: AppID

The app that produced this record.

cost `class-attribute` `instance-attribute` ¶

cost: Optional[Cost] = None

Costs associated with the record.

perf `class-attribute` `instance-attribute` ¶

perf: Optional[Perf] = None

Performance information.

ts `class-attribute` `instance-attribute` ¶

ts: datetime = Field(default_factory=now)

Timestamp of last update.

This is usually set whenever a record is changed in any way.

tags `class-attribute` `instance-attribute` ¶

tags: Optional[str] = ''

Tags for the record.

meta `class-attribute` `instance-attribute` ¶

meta: Optional[JSON] = None

Metadata for the record.

main_input `class-attribute` `instance-attribute` ¶

main_input: Optional[JSON] = None

The app's main input.

main_output `class-attribute` `instance-attribute` ¶

main_output: Optional[JSON] = None

The app's main output if there was no error.

main_error `class-attribute` `instance-attribute` ¶

main_error: Optional[JSON] = None

The app's main error if there was an error.

calls `class-attribute` `instance-attribute` ¶

calls: List[RecordAppCall] = []

The collection of calls recorded.

Note that these can be converted into a json structure with the same paths as the app that generated this record via layout_calls_as_app.

Invariant: calls are ordered by .perf.end_time.

feedback_and_future_results `class-attribute` `instance-attribute` ¶

feedback_and_future_results: Optional[
    List[Tuple[FeedbackDefinition, Future[FeedbackResult]]]
] = Field(None, exclude=True)

Map of feedbacks to the futures of their results.

These are only filled in for records that were just produced. They are not filled in when a record is read from the database, nor when using FeedbackMode.DEFERRED.

feedback_results `class-attribute` `instance-attribute` ¶

feedback_results: Optional[List[Future[FeedbackResult]]] = (
    Field(None, exclude=True)
)

Only the futures part of the above for backwards compatibility.

feedback_results_as_completed `property` ¶

feedback_results_as_completed: Iterable[FeedbackResult]

Generate feedback results as they are completed.

Wraps feedback_results in as_completed.
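A minimal sketch of this pattern follows, assuming that as_completed refers to concurrent.futures.as_completed from the standard library. The functions slow_feedback and results_as_completed are hypothetical stand-ins, not TruLens code.

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor, as_completed
from typing import Iterator, List


def slow_feedback(name: str, delay: float) -> str:
    """Hypothetical feedback function that takes `delay` seconds to finish."""
    time.sleep(delay)
    return name


def results_as_completed(futures: List[Future]) -> Iterator[str]:
    """Yield each result as soon as its future finishes, not in submission order."""
    for future in as_completed(futures):
        yield future.result()


with ThreadPoolExecutor(max_workers=2) as pool:
    futures = [
        pool.submit(slow_feedback, "slow", 0.2),
        pool.submit(slow_feedback, "fast", 0.01),
    ]
    finished = list(results_as_completed(futures))
```

Results arrive in completion order, so a quick feedback function is typically yielded before a slower one that was submitted earlier.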

Functions¶

__rich_repr__ ¶

__rich_repr__() -> Result

Requirement for pretty printing using the rich package.

wait_for_feedback_results ¶

wait_for_feedback_results(
    feedback_timeout: Optional[float] = None,
) -> Dict[FeedbackDefinition, FeedbackResult]

Wait for feedback results to finish.

PARAMETER	DESCRIPTION
`feedback_timeout`	Timeout in seconds for each feedback function. If not given, will use the default timeout `trulens.core.utils.threading.TP.DEBUG_TIMEOUT`. TYPE: `Optional[float]` DEFAULT: `None`

RETURNS	DESCRIPTION
`Dict[FeedbackDefinition, FeedbackResult]`	A mapping of feedback functions to their results.

get ¶

get(path: Lens) -> Optional[T]

Get a value from the record using a path.

PARAMETER	DESCRIPTION
`path`	Path to the value. TYPE: `Lens`

layout_calls_as_app ¶

layout_calls_as_app() -> Munch

Layout the calls in this record into the structure that follows that of the app that created this record.

This uses the paths stored in each RecordAppCall which are paths into the app.

Note: We cannot create a validated AppDefinition class (or subclass) object here, as the layout of records differs in these ways:

Records do not include anything that is not an instrumented method and hence have most of the structure of an app missing.
Records have RecordAppCall as their leaves where method definitions would be in the AppDefinition structure.

For an App:

trulens.core.schema.AppDefinition ¶

Bases: WithClassInfo, SerialModel

Holds the serialized fields of an app, whereas App contains the non-serialized fields.

Attributes¶

tru_class_info `instance-attribute` ¶

tru_class_info: Class

Class information of this pydantic object for use in deserialization.

Using this odd key to not pollute attribute names in whatever class we mix this into. Should be the same as CLASS_INFO.

app_id `class-attribute` `instance-attribute` ¶

app_id: AppID = Field(frozen=True)

Unique identifier for this app.

Computed deterministically from app_name and app_version. Leaving it here for it to be dumped when serializing. Also making it read-only as it should not be changed after creation.

app_name `instance-attribute` ¶

app_name: AppName

Name for this app. Default is "default_app".

app_version `instance-attribute` ¶

app_version: AppVersion

Version tag for this app. Default is "base".

tags `instance-attribute` ¶

tags: Tags = tags

Tags for the app.

metadata `instance-attribute` ¶

metadata: Metadata

Metadata for the app.

feedback_definitions `class-attribute` `instance-attribute` ¶

feedback_definitions: Sequence[FeedbackDefinitionID] = []

Feedback functions to evaluate on each record.

feedback_mode `class-attribute` `instance-attribute` ¶

feedback_mode: FeedbackMode = WITH_APP_THREAD

How to evaluate feedback functions upon producing a record.

record_ingest_mode `instance-attribute` ¶

record_ingest_mode: RecordIngestMode = record_ingest_mode

Mode of records ingestion.

root_class `instance-attribute` ¶

root_class: Class

Class of the main instrumented object.

Ideally this would be a ClassVar but since we want to check this without instantiating the subclass of AppDefinition that would define it, we cannot use ClassVar.

root_callable `class-attribute` ¶

root_callable: FunctionOrMethod

App's main method.

This is to be filled in by subclass.

app `instance-attribute` ¶

app: JSONized[AppDefinition]

Wrapped app in jsonized form.

initial_app_loader_dump `class-attribute` `instance-attribute` ¶

initial_app_loader_dump: Optional[SerialBytes] = None

Serialization of a function that loads an app.

Dump is of the initial app state before any invocations. This can be used to create a new session.

Warning

Experimental work in progress.

app_extra_json `instance-attribute` ¶

app_extra_json: JSON

Info to store about the app and to display in dashboard.

This can be used even if app itself cannot be serialized. app_extra_json, then, can stand in place for whatever data the user might want to keep track of about the app.

Functions¶

__rich_repr__ ¶

__rich_repr__() -> Result

Requirement for pretty printing using the rich package.

load `staticmethod` ¶

load(obj, *args, **kwargs)

Deserialize/load this object using the class information in tru_class_info to lookup the actual class that will do the deserialization.

model_validate `classmethod` ¶

model_validate(*args, **kwargs) -> Any

Deserialize a jsonized version of the app into an instance of the class it was serialized from.

Note

This process uses extra information stored in the jsonized object and handled by WithClassInfo.

continue_session `staticmethod` ¶

continue_session(
    app_definition_json: JSON, app: Any
) -> AppDefinition

Instantiate the given app with the given state app_definition_json.

Warning

This is an experimental feature with ongoing work.

PARAMETER	DESCRIPTION
`app_definition_json`	The json serialized app. TYPE: `JSON`
`app`	The app to continue the session with. TYPE: `Any`

RETURNS	DESCRIPTION
`AppDefinition`	A new `AppDefinition` instance with the given `app` and the given `app_definition_json` state.

new_session `staticmethod` ¶

new_session(
    app_definition_json: JSON,
    initial_app_loader: Optional[Callable] = None,
) -> AppDefinition

Create an app instance at the start of a session.

Warning

This is an experimental feature with ongoing work.

Create a copy of the json serialized app with the enclosed app being initialized to its initial state before any records are produced (i.e. blank memory).

_submit_feedback_functions `staticmethod` ¶

_submit_feedback_functions(
    record: Record,
    feedback_functions: Sequence[Feedback],
    connector: DBConnector,
    app: Optional[AppDefinition] = None,
    on_done: Optional[
        Callable[
            [Union[FeedbackResult, Future[FeedbackResult]]],
            None,
        ]
    ] = None,
) -> List[Tuple[Feedback, Future[FeedbackResult]]]

Schedules to run the given feedback functions.

PARAMETER	DESCRIPTION
`record`	The record on which to evaluate the feedback functions. TYPE: `Record`
`feedback_functions`	A collection of feedback functions to evaluate. TYPE: `Sequence[Feedback]`
`connector`	The database connector to use. TYPE: `DBConnector`
`app`	The app that produced the given record. If not provided, it is looked up from the database of this `TruSession` instance TYPE: `Optional[AppDefinition]` DEFAULT: `None`
`on_done`	A callback to call when each feedback function is done. TYPE: `Optional[Callable[[Union[FeedbackResult, Future[FeedbackResult]]], None]]` DEFAULT: `None`

RETURNS	DESCRIPTION
`List[Tuple[Feedback, Future[FeedbackResult]]]`	A list of tuples where the first item in each tuple is the feedback function and the second is the future of the feedback result.

get_loadable_apps `staticmethod` ¶

get_loadable_apps()

Gets a list of all of the loadable apps.

Warning

This is an experimental feature with ongoing work.

These are the apps that have initial_app_loader_dump set.

select_inputs `classmethod` ¶

select_inputs() -> Lens

Get the path to the main app's call inputs.

select_outputs `classmethod` ¶

select_outputs() -> Lens

Get the path to the main app's call outputs.

For your app, you can inspect the JSON-like structure by using the dict method:

Example

json_like = ... # your app, extending App
print(json_like.dict())

Calls made by App Components¶

When evaluating a feedback function, Records are augmented with app/component calls. For example, if the instrumented app contains a component combine_docs_chain then app.combine_docs_chain will contain calls to methods of this component. app.combine_docs_chain._call will contain a RecordAppCall (see schema.py) with information about the inputs/outputs/metadata regarding the _call call to that component. Selecting this information is the reason behind the Select.RecordCalls alias.

You can inspect the components making up your app via the App method print_instrumented.
