📓 Add Dataframe Quickstart
If your application was run (and logged) outside of TruLens, TruVirtual can be used to ingest and evaluate the logs.
This notebook walks through how to quickly log a dataframe of prompts, responses and contexts (optional) to TruLens as traces, and how to run evaluations with the trace data.
# !pip install trulens trulens-providers-openai openai
import os
os.environ["OPENAI_API_KEY"] = "sk-..."
Create or load a dataframe
The dataframe should minimally include columns named query and response. You can also include a column named contexts if you wish to evaluate retrieval systems or RAGs.
import pandas as pd
data = {
    "query": ["Where is Germany?", "What is the capital of France?"],
    "response": ["Germany is in Europe", "The capital of France is Paris"],
    "contexts": [
        ["Germany is a country located in Europe."],
        [
            "France is a country in Europe and its capital is Paris.",
            "Germany is a country located in Europe",
        ],
    ],
}
df = pd.DataFrame(data)
df.head()
Create a virtual app for tracking purposes.
This can be initialized simply, or you can track application metadata by passing a dict to VirtualApp(). For simplicity, we'll leave it empty here.
from trulens.apps.virtual import VirtualApp
virtual_app = VirtualApp()
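If you do want to track application metadata, you can pass a dict when constructing the app instead. The sketch below assumes the dict-style initialization shown in the TruVirtual documentation; the keys and values are purely illustrative.
# Optional: attach application metadata for tracking (keys here are illustrative, not required fields).
virtual_app_with_metadata = VirtualApp({
    "llm": {"model": "gpt-4o"},
    "template": "Description of the prompt template used by the original app.",
})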
Next, let's define feedback functions.
The add_dataframe method we plan to use will load the prompt, context, and response into virtual records. We should define our feedback functions to access this data in the structure in which it will be stored. We can do so as follows:
- prompt: selected using .on_input()
- response: selected using .on_output()
- context: selected using VirtualApp.select_context()
from trulens.core import Feedback
from trulens.providers.openai import OpenAI
# Initialize provider class
provider = OpenAI()
# Select context to be used in feedback.
context = VirtualApp.select_context()
# Question/statement relevance between question and each context chunk.
f_context_relevance = (
    Feedback(
        provider.context_relevance_with_cot_reasons, name="Context Relevance"
    )
    .on_input()
    .on(context)
)
# Define a groundedness feedback function
f_groundedness = (
    Feedback(
        provider.groundedness_measure_with_cot_reasons, name="Groundedness"
    )
    .on(context.collect())
    .on_output()
)
# Question/answer relevance between overall question and answer.
f_qa_relevance = Feedback(
    provider.relevance_with_cot_reasons, name="Answer Relevance"
).on_input_output()
Start a TruLens logging session
from trulens.core import TruSession
from trulens.dashboard import run_dashboard
session = TruSession()
run_dashboard(session)
Register the virtual app
We can now register our virtual app, including any feedback functions we'd like to use for evaluation.
from trulens.apps.virtual import TruVirtual
virtual_recorder = TruVirtual(
    app_name="RAG",
    app_version="simple",
    app=virtual_app,
    feedbacks=[f_context_relevance, f_groundedness, f_qa_relevance],
)
Add the dataframe to TruLens
We can then add the dataframe to TruLens using the virtual recorder's add_dataframe method. Doing so will immediately log the traces and kick off the computation of evaluations. After some time, the evaluation results will be accessible both from the SDK (e.g., session.get_leaderboard) and in the TruLens dashboard.
If you wish to skip evaluations and only log traces, you can simply skip the sections of this notebook where feedback functions are defined and exclude them from the construction of the virtual_recorder, as sketched below.
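A minimal log-only recorder might look like the following; the app_version label is illustrative.
# Omitting the `feedbacks` argument logs traces without running any evaluations.
logging_only_recorder = TruVirtual(
    app_name="RAG",
    app_version="logs_only",
    app=virtual_app,
)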
virtual_records = virtual_recorder.add_dataframe(df)
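Once the feedback computations have finished, you can pull aggregate results from the SDK. The call below is a sketch assuming the standard TruSession leaderboard API.
# View aggregate evaluation results for this app (also visible in the dashboard).
session.get_leaderboard(app_ids=[virtual_recorder.app_id])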