Selecting Components
LLM applications come in all shapes and sizes, with a variety of control flows. As a result, it's a challenge to consistently evaluate parts of an LLM application trace.
Therefore, we've adapted the use of lenses to refer to parts of an LLM stack trace and use them when defining evaluations. For example, the following lens refers to the input called query of the app's retrieve step.
Example
Select.RecordCalls.retrieve.args.query
Such lenses can then be used to define evaluations as follows:
Example
import numpy as np
from trulens.core import Feedback, Select

# `provider` is assumed to be an already-constructed feedback provider.
# Context relevance between question and each context chunk.
f_context_relevance = (
    Feedback(provider.context_relevance_with_cot_reasons, name="Context Relevance")
    .on(Select.RecordCalls.retrieve.args.query)
    .on(Select.RecordCalls.retrieve.rets)
    .aggregate(np.mean)
)
In most cases, a Select object refers to a single item, but it can also address multiple items.
For example, Select.RecordCalls.retrieve.args.query refers to only one item, whereas Select.RecordCalls.retrieve.rets refers to multiple items: in this case, the documents returned by the retrieve method. These items can be evaluated separately, as shown above, or collected into an array for evaluation with .collect(). Collecting is most commonly used for groundedness evaluations.
Example
f_groundedness = (
    Feedback(provider.groundedness_measure_with_cot_reasons, name="Groundedness")
    .on(Select.RecordCalls.retrieve.rets.collect())
    .on_output()
)
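The difference between evaluating items separately and collecting them first can be illustrated without TruLens at all. The following is a minimal sketch, assuming a hypothetical toy scoring function (toy_overlap) and made-up chunks; it is not how any TruLens provider scores groundedness.

```python
from statistics import mean

# Hypothetical retrieved chunks and answer; stand-ins for real record data.
chunks = ["Paris is the capital of France.", "France is in Europe."]
answer = "Paris is the capital of France."


def toy_overlap(source: str, statement: str) -> float:
    """Toy score: fraction of statement words that also appear in source."""
    source_words = set(source.lower().split())
    statement_words = statement.lower().split()
    return sum(w in source_words for w in statement_words) / len(statement_words)


# Without collecting: one evaluation per chunk, then an aggregate (here, the mean).
per_chunk_scores = [toy_overlap(chunk, answer) for chunk in chunks]
separate_result = mean(per_chunk_scores)

# With collecting: the chunks are passed together and evaluated once.
collected_result = toy_overlap(" ".join(chunks), answer)
```

In this toy case the collected evaluation sees all of the evidence at once, which is why collecting tends to suit groundedness-style checks, while per-chunk evaluation suits relevance-style checks.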
Selectors can also access multiple calls to the same component, which is increasingly common in agentic applications. For example, an agent could make multiple calls to a retrieve method to complete its task.
The following selector returns only the context documents returned by the first invocation of retrieve.
Example
context = Select.RecordCalls.retrieve.rets.rets[:]
Alternatively, adding [:] after the method name retrieve returns context documents from all invocations of retrieve.
Example
context_all_calls = Select.RecordCalls.retrieve[:].rets.rets[:]
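As a rough mental model, the two selectors above behave like indexing into a list of recorded calls. The sketch below uses plain Python data, with a hypothetical retrieve_calls list standing in for the record; flattening the results of all calls into one list is an assumption made here for simplicity.

```python
# Hypothetical stand-in for the recorded calls to one method, in call order.
retrieve_calls = [
    {"args": {"query": "q1"}, "rets": ["doc A", "doc B"]},
    {"args": {"query": "q2"}, "rets": ["doc C"]},
]

# Documents from the first invocation only.
first_call_docs = retrieve_calls[0]["rets"][:]

# Documents from every invocation, in call order (flattened for simplicity).
all_call_docs = [doc for call in retrieve_calls for doc in call["rets"]]
```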
See also other Select shortcuts.
Understanding the structure of your app
Because LLM apps vary widely in structure, feedback selectors vary widely too. To construct a feedback selector, you must first understand the structure of your application.
In Python, you can access the JSON structure by recording a call with the with_record method and then calling layout_calls_as_app on the resulting record.
Example
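from trulens.apps.langchain import TruChain

# Call the app directly, without recording.
response = my_llm_app(query)

# Wrap the app in a recorder.
tru_recorder = TruChain(
    my_llm_app,
    app_name="ChatApplication",
    app_version="Chain1",
)

# Call the app again, this time capturing the record alongside the response.
response, tru_record = tru_recorder.with_record(my_llm_app, query)
json_like = tru_record.layout_calls_as_app()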
If a selector looks like the following:
Example
Select.Record.app.combine_documents_chain._call
then the same value can be accessed in the JSON-like structure via:
Example
json_like['app']['combine_documents_chain']['_call']
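The correspondence between a dotted selector path and nested key access can be sketched in a few lines of plain Python. The toy_json_like data and the resolve helper below are hypothetical illustrations, not part of TruLens; real records hold richer call objects.

```python
from functools import reduce
from typing import Any, Mapping

# Hypothetical JSON-like layout standing in for a laid-out record.
toy_json_like = {
    "app": {
        "combine_documents_chain": {
            "_call": {
                "args": {"question": "What is a lens?"},
                "rets": "A path into a record.",
            }
        }
    }
}


def resolve(root: Mapping[str, Any], dotted_path: str) -> Any:
    """Walk a nested mapping one key at a time along a dotted path."""
    return reduce(lambda node, key: node[key], dotted_path.split("."), root)


call = resolve(toy_json_like, "app.combine_documents_chain._call")
```

Each dot in the path corresponds to one level of key lookup, so extending the path (for example with .rets) simply descends one level further.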
The application structure can also be viewed in the TruLens user interface. You can view this structure on the Evaluations page by scrolling down to the Timeline.
The top-level record also contains these helper accessors:
- RecordInput = Record.main_input -- points to the main input part of a Record. This is the first argument to the root method of an app (for LangChain Chains, this is the __call__ method).
- RecordOutput = Record.main_output -- points to the main output part of a Record. This is the output of the root method of an app (i.e. __call__ for LangChain Chains).
- RecordCalls = Record.app -- points to the root of the app-structured mirror of calls in a record. See App-organized Calls Section above.
Multiple Inputs Per Argument
As in the f_context_relevance example, a selector for a single argument may point to more than one aspect of a record/app. These are specified using slices or lists in key/index positions. In that case, the feedback function is evaluated multiple times, its outputs collected, and finally aggregated into a main feedback result.
The values for each argument of a feedback implementation are collected, and every combination of argument-to-value mapping is evaluated with the feedback definition. This may produce a large number of evaluations if more than one argument names multiple values. In the dashboard, all individual invocations of a feedback implementation are shown alongside the final aggregate result.
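The combination rule described above amounts to a Cartesian product over the values selected for each argument. The following is a minimal sketch, assuming a hypothetical toy_relevance function and made-up values; it only illustrates how the number of evaluations grows and is not the TruLens implementation.

```python
from itertools import product
from statistics import mean

# Hypothetical values selected for each feedback argument.
selected = {
    "question": ["What is the capital of France?", "Where is Berlin?"],
    "context": [
        "Paris is the capital.",
        "Berlin is in Germany.",
        "France is in Europe.",
    ],
}


def toy_relevance(question: str, context: str) -> float:
    """Toy score: fraction of question words that also appear in the context."""
    q_words = set(question.lower().strip("?").split())
    c_words = set(context.lower().strip(".").split())
    return len(q_words & c_words) / len(q_words)


# Every combination of one value per argument is evaluated.
names = list(selected)
combinations = [dict(zip(names, values)) for values in product(*selected.values())]
scores = [toy_relevance(**kwargs) for kwargs in combinations]

# The individual scores are then aggregated into a single result.
aggregate = mean(scores)
```

With two questions and three contexts, six evaluations are run, which shows how quickly the count grows once more than one argument selects multiple values.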
App/Record Organization (What can be selected)
The top-level JSON attributes are defined by the class structures.
For a Record:
trulens.core.schema.Record
Bases: SerialModel, Hashable
The record of a single main method call.
Note
This class will be renamed to Trace in the future.
Attributes
cost (class-attribute, instance-attribute)
Costs associated with the record.
ts (class-attribute, instance-attribute)
Timestamp of last update.
This is usually set whenever a record is changed in any way.
main_input (class-attribute, instance-attribute)
The app's main input.
main_output (class-attribute, instance-attribute)
The app's main output if there was no error.
main_error (class-attribute, instance-attribute)
The app's main error if there was an error.
calls (class-attribute, instance-attribute)
calls: List[RecordAppCall] = []
The collection of calls recorded.
Note that these can be converted into a JSON structure with the same paths as the app that generated this record via layout_calls_as_app.
Invariant: calls are ordered by .perf.end_time.
experimental_otel_spans (class-attribute, instance-attribute)
EXPERIMENTAL(otel-tracing): OTEL spans representation of this record.
This will be filled in only if the otel-tracing experimental feature is enabled.
feedback_and_future_results (class-attribute, instance-attribute)
feedback_and_future_results: Optional[
    List[Tuple[FeedbackDefinition, Future[FeedbackResult]]]
] = Field(None, exclude=True)
Map of feedbacks to the futures of their results.
These are only filled for records that were just produced. They are not filled in when a record is read from the database, nor when using FeedbackMode.DEFERRED.
feedback_results (class-attribute, instance-attribute)
feedback_results: Optional[List[Future[FeedbackResult]]] = (
    Field(None, exclude=True)
)
Only the futures part of the above for backwards compatibility.
feedback_results_as_completed (property)
feedback_results_as_completed: Iterable[FeedbackResult]
Generate feedback results as they are completed.
Wraps feedback_results in as_completed.
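The "as completed" pattern mentioned above can be sketched with the standard library alone. In the example below, slow_feedback and results_as_completed are hypothetical stand-ins; the sketch shows the general technique of yielding results in completion order, not the code behind this property.

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor, as_completed
from typing import Iterable, List


def slow_feedback(name: str, delay: float) -> str:
    """Hypothetical stand-in for a feedback function that takes time to run."""
    time.sleep(delay)
    return name


def results_as_completed(futures: List[Future]) -> Iterable[str]:
    """Yield each result as soon as its future finishes."""
    for future in as_completed(futures):
        yield future.result()


with ThreadPoolExecutor(max_workers=2) as pool:
    futures = [
        pool.submit(slow_feedback, "slow", 0.2),
        pool.submit(slow_feedback, "fast", 0.01),
    ]
    # Results arrive in completion order, not submission order.
    completion_order = list(results_as_completed(futures))
```

Consuming results this way lets a caller react to quick feedback functions without waiting for the slowest one.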
Functions
wait_for_feedback_results
wait_for_feedback_results(
    feedback_timeout: Optional[float] = None,
) -> Dict[FeedbackDefinition, FeedbackResult]
Wait for feedback results to finish.
| PARAMETER | DESCRIPTION |
|---|---|
| feedback_timeout | Timeout in seconds for each feedback function. If not given, will use the default timeout. |

| RETURNS | DESCRIPTION |
|---|---|
| Dict[FeedbackDefinition, FeedbackResult] | A mapping of feedback functions to their results. |
get
Get a value from the record using a path.
| PARAMETER | DESCRIPTION |
|---|---|
| path | Path to the value. TYPE: |
layout_calls_as_app
layout_calls_as_app() -> Munch
Lay out the calls in this record into a structure that follows that of the app that created this record.
This uses the paths stored in each RecordAppCall, which are paths into the app.
Note: We cannot create a validated AppDefinition class (or subclass) object here, as the layout of records differs in these ways:
- Records do not include anything that is not an instrumented method and hence have most of the structure of an app missing.
- Records have RecordAppCall as their leaves where method definitions would be in the AppDefinition structure.
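The idea of laying out a flat list of calls by their app paths can be sketched as follows. The flat_calls data and the layout_by_path helper are hypothetical and use plain dictionaries; the real method returns a Munch and works on RecordAppCall objects.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical flat list of recorded calls: (path into the app, call data).
flat_calls: List[Tuple[Tuple[str, ...], Dict[str, Any]]] = [
    (("app", "retriever", "retrieve"), {"args": {"query": "q"}, "rets": ["doc"]}),
    (("app", "llm", "generate"), {"args": {"prompt": "p"}, "rets": "answer"}),
    (("app", "retriever", "retrieve"), {"args": {"query": "q2"}, "rets": []}),
]


def layout_by_path(
    calls: List[Tuple[Tuple[str, ...], Dict[str, Any]]],
) -> Dict[str, Any]:
    """Nest calls under their paths; each leaf lists the calls made to one method."""
    root: Dict[str, Any] = {}
    for path, call in calls:
        node = root
        for key in path[:-1]:
            node = node.setdefault(key, {})
        node.setdefault(path[-1], []).append(call)
    return root


laid_out = layout_by_path(flat_calls)
```

Only paths that were actually called appear in the result, which mirrors the point above that records omit any part of the app that is not an instrumented method.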
For an App:
trulens.core.schema.AppDefinition
Bases: WithClassInfo, SerialModel
Holds the serialized fields of an app, whereas App contains the non-serialized fields.
Attributes
tru_class_info (instance-attribute)
tru_class_info: Class
Class information of this pydantic object for use in deserialization.
Using this odd key to not pollute attribute names in whatever class we mix this into. Should be the same as CLASS_INFO.
app_id (class-attribute, instance-attribute)
Unique identifier for this app.
Computed deterministically from app_name and app_version. Leaving it here for it to be dumped when serializing. Also making it read-only as it should not be changed after creation.
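To illustrate what "computed deterministically" means in practice, here is a minimal sketch of one possible scheme: hashing a canonical encoding of the name and version. The toy_app_id function and its hashing choices are assumptions made purely for illustration; the source does not say how TruLens derives app_id.

```python
import hashlib
import json


def toy_app_id(app_name: str, app_version: str) -> str:
    """Hypothetical ID scheme: hash a canonical encoding of name and version."""
    payload = json.dumps(
        {"app_name": app_name, "app_version": app_version}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:32]


# The same inputs always give the same identifier.
id_first = toy_app_id("ChatApplication", "Chain1")
id_again = toy_app_id("ChatApplication", "Chain1")

# A different version gives a different identifier.
id_other = toy_app_id("ChatApplication", "Chain2")
```

A deterministic identifier of this kind means that re-creating a recorder with the same name and version refers to the same app, while changing the version produces a distinct one.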
app_version (instance-attribute)
app_version: AppVersion
Version tag for this app. Default is "base".
feedback_definitions (class-attribute, instance-attribute)
feedback_definitions: Sequence[FeedbackDefinitionID] = []
Feedback functions to evaluate on each record.
feedback_mode (class-attribute, instance-attribute)
feedback_mode: FeedbackMode = WITH_APP_THREAD
How to evaluate feedback functions upon producing a record.
record_ingest_mode (instance-attribute)
record_ingest_mode: RecordIngestMode = record_ingest_mode
Mode of records ingestion.
root_class (instance-attribute)
root_class: Class
Class of the main instrumented object.
Ideally this would be a ClassVar but since we want to check this without instantiating the subclass of AppDefinition that would define it, we cannot use ClassVar.
root_callable (class-attribute)
root_callable: FunctionOrMethod
App's main method.
This is to be filled in by subclass.
initial_app_loader_dump (class-attribute, instance-attribute)
initial_app_loader_dump: Optional[SerialBytes] = None
Serialization of a function that loads an app.
Dump is of the initial app state before any invocations. This can be used to create a new session.
Warning
Experimental work in progress.
app_extra_json (instance-attribute)
app_extra_json: JSON
Info to store about the app and to display in dashboard.
This can be used even if the app itself cannot be serialized. app_extra_json, then, can stand in place for whatever data the user might want to keep track of about the app.
Functions
load (staticmethod)
load(obj, *args, **kwargs)
Deserialize/load this object using the class information in tru_class_info to look up the actual class that will do the deserialization.
model_validate (classmethod)
model_validate(*args, **kwargs) -> Any
Deserialize a jsonized version of the app into an instance of the class it was serialized from.
Note
This process uses extra information stored in the jsonized object and handled by WithClassInfo.
continue_session (staticmethod)
continue_session(
    app_definition_json: JSON, app: Any
) -> AppDefinition
Instantiate the given app with the given state app_definition_json.
Warning
This is an experimental feature with ongoing work.
| PARAMETER | DESCRIPTION |
|---|---|
| app_definition_json | The json serialized app. TYPE: |
| app | The app to continue the session with. TYPE: |

| RETURNS | DESCRIPTION |
|---|---|
| AppDefinition | A new |
new_session (staticmethod)
new_session(
    app_definition_json: JSON,
    initial_app_loader: Optional[Callable] = None,
) -> AppDefinition
Create an app instance at the start of a session.
Warning
This is an experimental feature with ongoing work.
Create a copy of the json serialized app with the enclosed app being initialized to its initial state before any records are produced (i.e. blank memory).
get_loadable_apps (staticmethod)
get_loadable_apps()
Gets a list of all of the loadable apps.
Warning
This is an experimental feature with ongoing work.
These are the apps that have initial_app_loader_dump set.
For your app, you can inspect the JSON-like structure by using the dict method:
Example
json_like = ... # your app, extending App
print(json_like.dict())
Calls made by App Components
When evaluating a feedback function, Records are augmented with app/component calls. For example, if the instrumented app contains a component combine_docs_chain, then app.combine_docs_chain will contain calls to methods of this component. app.combine_docs_chain._call will contain a RecordAppCall (see schema.py) with information about the inputs/outputs/metadata regarding the _call call to that component. Selecting this information is the reason behind the Select.RecordCalls alias.
You can inspect the components making up your app via the App method print_instrumented.