📓 TruLens with Outside Logs
If your application was run (and logged) outside of TruLens, TruVirtual can be used to ingest and evaluate the logs.
The first step to loading your app logs into TruLens is creating a virtual app. This virtual app can be a plain dictionary or use our VirtualApp class to store any information you would like. You can refer to these values for evaluating feedback.
# !pip install trulens trulens-providers-openai openai
import os
os.environ["OPENAI_API_KEY"] = "sk-..."
from trulens.apps.virtual import VirtualApp
from trulens.core import Select
virtual_app = dict(
llm=dict(modelname="some llm component model name"),
template="information about the template I used in my app",
debug="all of these fields are completely optional",
)
virtual_app = VirtualApp(virtual_app) # can start with the prior dictionary
virtual_app[Select.RecordCalls.llm.maxtokens] = 1024
When setting up the virtual app, you should also include any components that you would like to evaluate. This can be done using the Select class. Using selectors here lets you reuse the same setup you use to define feedback functions. Below you can see how to set up a virtual app with a retriever component, which will be used later in the example for feedback evaluation.
retriever = Select.RecordCalls.retriever
synthesizer = Select.RecordCalls.synthesizer
virtual_app[retriever] = "retriever"
virtual_app[synthesizer] = "synthesizer"
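With the virtual app in place, we can construct virtual records for each interaction that was logged by the outside app. Alongside the main input and output, a VirtualRecord takes a calls dictionary mapping the selectors defined above to the args and rets recorded for each component call.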
import datetime
from trulens.apps.virtual import VirtualRecord
# The selector for a presumed context retrieval component's call to
# `get_context`. The names are arbitrary but may be useful for readability on
# your end.
context_call = retriever.get_context
generation = synthesizer.generate
rec1 = VirtualRecord(
main_input="Where is Germany?",
main_output="Germany is in Europe",
calls={
context_call: dict(
args=["Where is Germany?"],
rets=["Germany is a country located in Europe."],
),
generation: dict(
args=[
"""
We have provided the below context: \n
---------------------\n
Germany is a country located in Europe.
---------------------\n
Given this information, please answer the question:
Where is Germany?
"""
],
rets=["Germany is a country located in Europe."],
),
},
)
# set usage and cost information for a record with the cost attribute
rec1.cost.n_tokens = 234
rec1.cost.cost = 0.05
# set start and end times with the perf attribute
start_time = datetime.datetime(
2024, 6, 12, 10, 30, 0
) # June 12th, 2024 at 10:30:00 AM
end_time = datetime.datetime(
2024, 6, 12, 10, 31, 30
) # June 12th, 2024 at 10:31:30 AM
rec1.perf.start_time = start_time
rec1.perf.end_time = end_time
rec2 = VirtualRecord(
main_input="Where is Germany?",
main_output="Poland is in Europe",
calls={
context_call: dict(
args=["Where is Germany?"],
rets=["Poland is a country located in Europe."],
),
generation: dict(
args=[
"""
We have provided the below context: \n
---------------------\n
Poland is a country located in Europe.
---------------------\n
Given this information, please answer the question:
Where is Germany?
"""
],
rets=["Poland is a country located in Europe."],
),
},
)
data = [rec1, rec2]
Now that we've constructed the virtual records, we can build our feedback functions. This is done just the same as usual, except the context selector will instead refer to the context_call we added to the virtual records. Note that rec2 deliberately captures a bad interaction, with irrelevant retrieved context and an incorrect answer, giving the feedback functions something to flag.
from trulens.core import Feedback
from trulens.providers.openai import OpenAI
# Initialize provider class
provider = OpenAI()
# Select context to be used in feedback. We select the return values of the
# virtual `get_context` call in the virtual `retriever` component. Names are
# arbitrary except for `rets`.
context = context_call.rets[:]
# Question/statement relevance between question and each context chunk.
f_context_relevance = (
Feedback(provider.context_relevance_with_cot_reasons).on_input().on(context)
)
# Define a groundedness feedback function
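# Note: `context.collect()` gathers all retrieved context chunks into a single
# list, so groundedness is measured against the full context at once rather
# than once per chunk.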
f_groundedness = (
Feedback(
provider.groundedness_measure_with_cot_reasons, name="Groundedness"
)
.on(context.collect())
.on_output()
)
# Question/answer relevance between overall question and answer.
f_qa_relevance = Feedback(
provider.relevance_with_cot_reasons, name="Answer Relevance"
).on_input_output()
Set up the virtual recorder
Here, we'll use deferred mode. This way you can see the records in the dashboard before we've run evaluations.
from trulens.apps.virtual import TruVirtual
virtual_recorder = TruVirtual(
app_name="a virtual app",
app=virtual_app,
feedbacks=[f_context_relevance, f_groundedness, f_qa_relevance],
feedback_mode="deferred", # optional
)
for record in data:
virtual_recorder.add_record(record)
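Because the recorder is in deferred mode, each record is logged as soon as it is added, while the feedback evaluations are queued until the evaluator is started.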
from trulens.core import TruSession
from trulens.dashboard import run_dashboard
session = TruSession()
run_dashboard(session)
Then, you can start the evaluator at a time of your choosing.
session.start_evaluator()
# session.stop_evaluator() # stop if needed
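Once the evaluator has processed the queued feedback, you can inspect the results programmatically as well as in the dashboard. As a minimal sketch, assuming the standard TruSession retrieval methods:
# Retrieve the logged records together with any computed feedback results.
# Returns a DataFrame of records and the list of feedback column names.
records_df, feedback_cols = session.get_records_and_feedback()
print(records_df.head())
print(feedback_cols)
# View aggregate feedback scores per app version.
print(session.get_leaderboard())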