Selecting Components

LLM applications come in all shapes and sizes and with a variety of different control flows. As a result it’s a challenge to consistently evaluate parts of an LLM application trace.

Therefore, we’ve adapted the use of lenses to refer to parts of an LLM stack trace and use them when defining evaluations. For example, the following lens refers to the query argument passed to the retrieve step of the app.

Example

Select.RecordCalls.retrieve.args.query

Such lenses can then be used to define evaluations as follows:

Example

import numpy as np
from trulens.core import Feedback, Select

# `provider` is assumed to be an already-configured feedback provider.
# Context relevance between question and each context chunk.
f_context_relevance = (
    Feedback(provider.context_relevance_with_cot_reasons, name="Context Relevance")
    .on(Select.RecordCalls.retrieve.args.query)
    .on(Select.RecordCalls.retrieve.rets)
    .aggregate(np.mean)
)

In most cases, a Select object refers to a single item, but it can also address multiple items.

For example: Select.RecordCalls.retrieve.args.query refers to only one item.

However, Select.RecordCalls.retrieve.rets refers to multiple items: in this case, the documents returned by the retrieve method. These items can be evaluated separately, as shown above, or collected into an array with .collect() and evaluated together. Collecting is most commonly used for groundedness evaluations.

Example

f_groundedness = (
    Feedback(provider.groundedness_measure_with_cot_reasons, name = "Groundedness")
    .on(Select.RecordCalls.retrieve.rets.collect())
    .on_output()
)
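The difference between evaluating items separately and collecting them can be sketched in plain Python, without TruLens. This is only a toy model: toy_relevance, toy_groundedness, and the sample strings are hypothetical stand-ins for a provider's feedback functions and for the values a selector picks out of a record.

```python
from statistics import mean

# Hypothetical stand-ins for the values a selector would pick out of a record.
query = "capital of France"
chunks = ["Paris is the capital of France.", "Bananas are yellow."]
answer = "Paris is the capital."


def toy_relevance(question: str, chunk: str) -> float:
    """Fraction of question words that appear in a single chunk."""
    words = question.lower().split()
    return sum(w in chunk.lower() for w in words) / len(words)


def toy_groundedness(sources: list, statement: str) -> float:
    """Fraction of statement words found anywhere in the collected sources."""
    text = " ".join(sources).lower()
    words = statement.lower().rstrip(".").split()
    return sum(w in text for w in words) / len(words)


# Without collect(): one evaluation per chunk, followed by aggregation.
per_chunk_scores = [toy_relevance(query, c) for c in chunks]
aggregated = mean(per_chunk_scores)

# With collect(): the whole list is passed to a single evaluation.
collected_score = toy_groundedness(chunks, answer)
```

In the first case the feedback function runs once per chunk and needs an aggregation step. In the second it runs once and sees every chunk at the same time.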

Selectors can also access multiple calls to the same component. This is increasingly common in agentic applications. For example, an agent could call a retrieve method several times to complete its task.

For example, the following selector returns only the context documents returned by the first invocation of retrieve.

Example

context = Select.RecordCalls.retrieve.rets.rets[:]

Alternatively, adding [:] after the method name retrieve returns context documents from all invocations of retrieve.

Example

context_all_calls = Select.RecordCalls.retrieve[:].rets.rets[:]
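As a rough mental model of the two selectors above, the recorded invocations can be pictured as a list of calls, each with its own return values. The sketch below uses plain Python lists and dictionaries with hypothetical data. It mirrors the behavior described in the text and is not how TruLens stores or resolves calls internally.

```python
# Toy stand-in for the recorded calls to a `retrieve` method (hypothetical data).
retrieve_calls = [
    {"args": {"query": "q1"}, "rets": ["doc A", "doc B"]},
    {"args": {"query": "q2"}, "rets": ["doc C"]},
]

# Analogue of selecting from the first invocation only.
context = retrieve_calls[0]["rets"][:]

# Analogue of adding [:] after the method name: every invocation contributes.
context_all_calls = [doc for call in retrieve_calls for doc in call["rets"][:]]
```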

See also other Select shortcuts.

Understanding the structure of your app

Because LLM apps have a wide variation in their structure, the feedback selector construction can also vary widely. To construct the feedback selector, you must first understand the structure of your application.

In Python, you can obtain the JSON-like structure by recording a call with the with_record method and then calling layout_calls_as_app on the resulting record.

Example

response = my_llm_app(query)

from trulens.apps.langchain import TruChain
tru_recorder = TruChain(
    my_llm_app,
    app_name='ChatApplication',
    app_version="Chain1",
)

response, tru_record = tru_recorder.with_record(my_llm_app, query)
json_like = tru_record.layout_calls_as_app()

If a selector looks like the following:

Example

Select.Record.app.combine_documents_chain._call

The same value can be accessed in the JSON-like structure via:

Example

json_like['app']['combine_documents_chain']['_call']
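The correspondence between a selector path and nested key access can be sketched with a small helper. Here follow_path and the sample json_like dictionary are hypothetical, and a plain dict stands in for the structure returned by layout_calls_as_app.

```python
from functools import reduce


def follow_path(json_like, path):
    """Walk a nested mapping one key at a time."""
    return reduce(lambda node, key: node[key], path, json_like)


# Hypothetical layout, shaped like the example selector above.
json_like = {"app": {"combine_documents_chain": {"_call": {"rets": "summary"}}}}

call = follow_path(json_like, ["app", "combine_documents_chain", "_call"])
```

Each attribute in the selector after Select.Record becomes one key lookup, so call is the same object as json_like['app']['combine_documents_chain']['_call'].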

The application structure can also be viewed in the TruLens user interface. You can view this structure on the Evaluations page by scrolling down to the Timeline.

The top-level record also contains these helper accessors:

  • RecordInput = Record.main_input -- points to the main input part of a Record. This is the first argument to the root method of an app (for LangChain Chains this is the __call__ method).

  • RecordOutput = Record.main_output -- points to the main output part of a Record. This is the output of the root method of an app (i.e. __call__ for LangChain Chains).

  • RecordCalls = Record.app -- points to the root of the app-structured mirror of calls in a record. See the Calls made by App Components section below.

Multiple Inputs Per Argument

As in the f_context_relevance example, a selector for a single argument may point to more than one aspect of a record/app. Such selectors are specified using slices or lists in key/index positions. In that case, the feedback function is evaluated multiple times, its outputs are collected, and they are finally aggregated into a main feedback result.

The values selected for each argument of the feedback implementation are collected, and the feedback definition is evaluated on every combination of argument-to-value mappings. This may produce a large number of evaluations if more than one argument names multiple values. In the dashboard, all individual invocations of a feedback implementation are shown alongside the final aggregate result.
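The combination behavior described above can be sketched with itertools.product. The selected values and toy_feedback are hypothetical, and the mean is just one possible aggregator. The point is only that two multi-valued arguments multiply the number of evaluations.

```python
from itertools import product
from statistics import mean

# Hypothetical values selected for each feedback argument.
selected = {
    "question": ["q1", "q2"],
    "context": ["c1", "c2", "c3"],
}


def toy_feedback(question: str, context: str) -> float:
    """Hypothetical scoring function: 1.0 when the numeric suffixes match."""
    return 1.0 if question[1:] == context[1:] else 0.0


# Every combination of argument-to-value mapping gets its own evaluation.
names = list(selected)
combinations = [dict(zip(names, values)) for values in product(*selected.values())]
scores = [toy_feedback(**kwargs) for kwargs in combinations]

# The individual scores are then aggregated into one result.
result = mean(scores)
```

With two questions and three contexts, the feedback function is evaluated six times before aggregation.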

App/Record Organization (What can be selected)

The top level JSON attributes are defined by the class structures.

For a Record:

trulens.core.schema.Record

Bases: SerialModel, Hashable

The record of a single main method call.

Note

This class will be renamed to Trace in the future.

Attributes

record_id instance-attribute

record_id: RecordID = record_id

Unique identifier for this record.

app_id instance-attribute

app_id: AppID

The app that produced this record.

cost class-attribute instance-attribute

cost: Optional[Cost] = None

Costs associated with the record.

perf class-attribute instance-attribute

perf: Optional[Perf] = None

Performance information.

ts class-attribute instance-attribute

ts: datetime = Field(default_factory=now)

Timestamp of last update.

This is usually set whenever a record is changed in any way.

tags class-attribute instance-attribute

tags: Optional[str] = ''

Tags for the record.

meta class-attribute instance-attribute

meta: Optional[JSON] = None

Metadata for the record.

main_input class-attribute instance-attribute

main_input: Optional[JSON] = None

The app's main input.

main_output class-attribute instance-attribute

main_output: Optional[JSON] = None

The app's main output if there was no error.

main_error class-attribute instance-attribute

main_error: Optional[JSON] = None

The app's main error if there was an error.

calls class-attribute instance-attribute

calls: List[RecordAppCall] = []

The collection of calls recorded.

Note that these can be converted into a json structure with the same paths as the app that generated this record via layout_calls_as_app.

Invariant: calls are ordered by .perf.end_time.

experimental_otel_spans class-attribute instance-attribute

experimental_otel_spans: List[Any] = []

EXPERIMENTAL(otel-tracing): OTEL spans representation of this record.

This will be filled in only if the otel-tracing experimental feature is enabled.

feedback_and_future_results class-attribute instance-attribute

feedback_and_future_results: Optional[
    List[Tuple[FeedbackDefinition, Future[FeedbackResult]]]
] = Field(None, exclude=True)

Map of feedbacks to the futures for their results.

These are only filled for records that were just produced. They are not filled in when a record is read from the database, nor when using FeedbackMode.DEFERRED.

feedback_results class-attribute instance-attribute

feedback_results: Optional[List[Future[FeedbackResult]]] = (
    Field(None, exclude=True)
)

Only the futures part of the above for backwards compatibility.

feedback_results_as_completed property

feedback_results_as_completed: Iterable[FeedbackResult]

Generate feedback results as they are completed.

Wraps feedback_results in as_completed.
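For readers unfamiliar with as_completed, here is a minimal standard-library sketch of the pattern. The slow_feedback function and its delays are hypothetical. The sketch only shows that results are yielded in completion order, not submission order.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def slow_feedback(name: str, delay: float) -> str:
    """Hypothetical feedback function that takes `delay` seconds to finish."""
    time.sleep(delay)
    return name


with ThreadPoolExecutor(max_workers=2) as pool:
    futures = [
        pool.submit(slow_feedback, "slow", 0.3),
        pool.submit(slow_feedback, "fast", 0.05),
    ]
    # Futures are yielded as they finish, regardless of submission order.
    completion_order = [future.result() for future in as_completed(futures)]
```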

Functions

__rich_repr__

__rich_repr__() -> Result

Requirement for pretty printing using the rich package.

wait_for_feedback_results

wait_for_feedback_results(
    feedback_timeout: Optional[float] = None,
) -> Dict[FeedbackDefinition, FeedbackResult]

Wait for feedback results to finish.

PARAMETER DESCRIPTION
feedback_timeout

Timeout in seconds for each feedback function. If not given, will use the default timeout trulens.core.utils.threading.TP.DEBUG_TIMEOUT.

TYPE: Optional[float] DEFAULT: None

RETURNS DESCRIPTION
Dict[FeedbackDefinition, FeedbackResult]

A mapping of feedback functions to their results.
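The idea of waiting on each feedback with a per-feedback timeout can be sketched with standard-library futures. The wait_for_all helper, the string keys, and the scores are hypothetical. This is an illustration of the pattern, not the TruLens implementation.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Optional


def wait_for_all(
    pending: dict, feedback_timeout: Optional[float] = None
) -> Dict[str, float]:
    """Block on each future in turn; result() raises a timeout error if one is too slow."""
    return {
        name: future.result(timeout=feedback_timeout)
        for name, future in pending.items()
    }


with ThreadPoolExecutor() as pool:
    # Hypothetical feedback names mapped to the futures computing their scores.
    pending = {
        "relevance": pool.submit(lambda: 0.8),
        "groundedness": pool.submit(lambda: 0.6),
    }
    results = wait_for_all(pending, feedback_timeout=5.0)
```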

get

get(path: Lens) -> Optional[T]

Get a value from the record using a path.

PARAMETER DESCRIPTION
path

Path to the value.

TYPE: Lens

layout_calls_as_app

layout_calls_as_app() -> Munch

Layout the calls in this record into the structure that follows that of the app that created this record.

This uses the paths stored in each RecordAppCall which are paths into the app.

Note: We cannot create a validated AppDefinition class (or subclass) object here as the layout of records differ in these ways:

  • Records do not include anything that is not an instrumented method, and hence have most of the structure of an app missing.

  • Records have RecordAppCall objects as their leaves where method definitions would be in the AppDefinition structure.
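The layout step can be pictured as nesting a flat list of calls under the app path each call carries. The sketch below uses plain dictionaries with hypothetical component names, and keeping calls that share a path in a list is a choice made for this toy. The real method works on RecordAppCall objects and returns a Munch.

```python
# Hypothetical flat call list; each entry carries a path into the app.
calls = [
    {"path": ["app", "retriever", "retrieve"], "rets": ["doc A"]},
    {"path": ["app", "llm", "generate"], "rets": "answer 1"},
    {"path": ["app", "retriever", "retrieve"], "rets": ["doc B"]},
]


def layout_calls(calls):
    """Nest calls under their paths; calls sharing a path are kept in a list."""
    root = {}
    for call in calls:
        node = root
        *parents, leaf = call["path"]
        for key in parents:
            # Create intermediate levels on demand.
            node = node.setdefault(key, {})
        node.setdefault(leaf, []).append(call)
    return root


laid_out = layout_calls(calls)
```

Only paths that were actually called appear in laid_out, which matches the note that records lack the uninstrumented parts of the app structure.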

For an App:

trulens.core.schema.AppDefinition

Bases: WithClassInfo, SerialModel

Holds the serialized fields of an app, whereas App contains the non-serialized fields.

Attributes

tru_class_info instance-attribute

tru_class_info: Class

Class information of this pydantic object for use in deserialization.

Using this odd key to not pollute attribute names in whatever class we mix this into. Should be the same as CLASS_INFO.

app_id class-attribute instance-attribute

app_id: AppID = Field(frozen=True)

Unique identifier for this app.

Computed deterministically from app_name and app_version. Leaving it here for it to be dumped when serializing. Also making it read-only as it should not be changed after creation.
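What "computed deterministically" means in practice can be illustrated with a hash of the name and version. The toy_app_id function and its output format are assumptions made for illustration. The source does not describe the actual scheme TruLens uses.

```python
import hashlib


def toy_app_id(app_name: str, app_version: str) -> str:
    """Hypothetical scheme: hash the name/version pair. Not TruLens's actual format."""
    payload = f"{app_name}\x00{app_version}".encode("utf-8")
    digest = hashlib.sha256(payload).hexdigest()
    return f"app_{digest[:16]}"


first = toy_app_id("ChatApplication", "Chain1")
again = toy_app_id("ChatApplication", "Chain1")
other = toy_app_id("ChatApplication", "Chain2")
```

The same name and version always produce the same identifier, while changing either one produces a different identifier.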

app_name instance-attribute

app_name: AppName

Name for this app. Default is "default_app".

app_version instance-attribute

app_version: AppVersion

Version tag for this app. Default is "base".

tags instance-attribute

tags: Tags = tags

Tags for the app.

metadata instance-attribute

metadata: Metadata

Metadata for the app.

feedback_definitions class-attribute instance-attribute

feedback_definitions: Sequence[FeedbackDefinitionID] = []

Feedback functions to evaluate on each record.

feedback_mode class-attribute instance-attribute

feedback_mode: FeedbackMode = WITH_APP_THREAD

How to evaluate feedback functions upon producing a record.

record_ingest_mode instance-attribute

record_ingest_mode: RecordIngestMode = record_ingest_mode

Mode of records ingestion.

root_class instance-attribute

root_class: Class

Class of the main instrumented object.

Ideally this would be a ClassVar but since we want to check this without instantiating the subclass of AppDefinition that would define it, we cannot use ClassVar.

root_callable class-attribute

root_callable: FunctionOrMethod

App's main method.

This is to be filled in by subclass.

app instance-attribute

Wrapped app in jsonized form.

initial_app_loader_dump class-attribute instance-attribute

initial_app_loader_dump: Optional[SerialBytes] = None

Serialization of a function that loads an app.

Dump is of the initial app state before any invocations. This can be used to create a new session.

Warning

Experimental work in progress.

app_extra_json instance-attribute

app_extra_json: JSON

Info to store about the app and to display in dashboard.

This can be used even if app itself cannot be serialized. app_extra_json, then, can stand in place for whatever data the user might want to keep track of about the app.

Functions

__rich_repr__

__rich_repr__() -> Result

Requirement for pretty printing using the rich package.

load staticmethod

load(obj, *args, **kwargs)

Deserialize/load this object using the class information in tru_class_info to lookup the actual class that will do the deserialization.

model_validate classmethod

model_validate(*args, **kwargs) -> Any

Deserialize a jsonized version of the app into an instance of the class it was serialized from.

Note

This process uses extra information stored in the jsonized object and handled by WithClassInfo.

continue_session staticmethod

continue_session(
    app_definition_json: JSON, app: Any
) -> AppDefinition

Instantiate the given app with the given state app_definition_json.

Warning

This is an experimental feature with ongoing work.

PARAMETER DESCRIPTION
app_definition_json

The json serialized app.

TYPE: JSON

app

The app to continue the session with.

TYPE: Any

RETURNS DESCRIPTION
AppDefinition

A new AppDefinition instance with the given app and the given app_definition_json state.

new_session staticmethod

new_session(
    app_definition_json: JSON,
    initial_app_loader: Optional[Callable] = None,
) -> AppDefinition

Create an app instance at the start of a session.

Warning

This is an experimental feature with ongoing work.

Create a copy of the json serialized app with the enclosed app being initialized to its initial state before any records are produced (i.e. blank memory).

get_loadable_apps staticmethod

get_loadable_apps()

Gets a list of all of the loadable apps.

Warning

This is an experimental feature with ongoing work.

These are the apps that have initial_app_loader_dump set.

select_inputs classmethod

select_inputs() -> Lens

Get the path to the main app's call inputs.

select_outputs classmethod

select_outputs() -> Lens

Get the path to the main app's call outputs.

For your app, you can inspect the JSON-like structure by using the dict method:

Example

json_like = ... # your app, extending App
print(json_like.dict())

Calls made by App Components

When evaluating a feedback function, Records are augmented with app/component calls. For example, if the instrumented app contains a component combine_docs_chain then app.combine_docs_chain will contain calls to methods of this component. app.combine_docs_chain._call will contain a RecordAppCall (see schema.py) with information about the inputs/outputs/metadata regarding the _call call to that component. Selecting this information is the reason behind the Select.RecordCalls alias.

You can inspect the components making up your app via the App method print_instrumented.