📓 TruLens Quickstart
In this quickstart you will create a RAG from scratch, trace the execution and get feedback on an LLM response.
For evaluation, we will leverage the "hallucination triad" of groundedness, context relevance and answer relevance.
# !pip install trulens trulens-providers-openai chromadb openai
import os
if "OPENAI_API_KEY" not in os.environ:
    os.environ["OPENAI_API_KEY"] = "sk-proj-..."
os.environ["TRULENS_OTEL_TRACING"] = "1"
Get Data
In this case, we'll just initialize some simple text in the notebook.
uw_info = """
The University of Washington, founded in 1861 in Seattle, is a public research university
with over 45,000 students across three campuses in Seattle, Tacoma, and Bothell.
As the flagship institution of the six public universities in Washington state,
UW encompasses over 500 buildings and 20 million square feet of space,
including one of the largest library systems in the world.
"""
wsu_info = """
Washington State University, commonly known as WSU, founded in 1890, is a public research university in Pullman, Washington.
With multiple campuses across the state, it is the state's second largest institution of higher education.
WSU is known for its programs in veterinary medicine, agriculture, engineering, architecture, and pharmacy.
"""
seattle_info = """
Seattle, a city on Puget Sound in the Pacific Northwest, is surrounded by water, mountains and evergreen forests, and contains thousands of acres of parkland.
It's home to a large tech industry, with Microsoft and Amazon headquartered in its metropolitan area.
The futuristic Space Needle, a legacy of the 1962 World's Fair, is its most iconic landmark.
"""
starbucks_info = """
Starbucks Corporation is an American multinational chain of coffeehouses and roastery reserves headquartered in Seattle, Washington.
As the world's largest coffeehouse chain, Starbucks is seen to be the main representation of the United States' second wave of coffee culture.
"""
newzealand_info = """
New Zealand is an island country located in the southwestern Pacific Ocean. It comprises two main landmasses—the North Island and the South Island—and over 700 smaller islands.
The country is known for its stunning landscapes, ranging from lush forests and mountains to beaches and lakes. New Zealand has a rich cultural heritage, with influences from
both the indigenous Māori people and European settlers. The capital city is Wellington, while the largest city is Auckland. New Zealand is also famous for its adventure tourism,
including activities like bungee jumping, skiing, and hiking.
"""
Create Vector Store
Create a chromadb vector store in memory.
import chromadb
from chromadb.utils.embedding_functions import OpenAIEmbeddingFunction
embedding_function = OpenAIEmbeddingFunction(
    api_key=os.environ.get("OPENAI_API_KEY"),
    model_name="text-embedding-3-small",
)
chroma_client = chromadb.Client()
vector_store = chroma_client.get_or_create_collection(
    name="Washington", embedding_function=embedding_function
)
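To make the role of the vector store concrete, here is a minimal sketch of similarity search over embeddings. The three-dimensional vectors are made up for illustration (real embeddings have many more dimensions), the helper names are hypothetical, and cosine similarity is an assumption chosen for simplicity; the distance metric Chroma is configured with may differ.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, store, k=2):
    """Return the ids of the k stored vectors most similar to query_vec."""
    ranked = sorted(
        store,
        key=lambda doc_id: cosine_similarity(query_vec, store[doc_id]),
        reverse=True,
    )
    return ranked[:k]


# Toy embeddings (assumed values, not produced by any real model).
toy_store = {
    "starbucks_info": [0.9, 0.1, 0.0],
    "seattle_info": [0.6, 0.4, 0.1],
    "newzealand_info": [0.0, 0.2, 0.9],
}
toy_query = [1.0, 0.0, 0.0]

nearest = top_k(toy_query, toy_store, k=2)
```

The vector store performs the same kind of ranking, with the embedding function turning each document and each query into a vector first.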
Populate the vector store.
vector_store.add("uw_info", documents=uw_info)
vector_store.add("wsu_info", documents=wsu_info)
vector_store.add("seattle_info", documents=seattle_info)
vector_store.add("starbucks_info", documents=starbucks_info)
vector_store.add("newzealand_info", documents=newzealand_info)
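The five separate calls above could also be collapsed into one batched call. The sketch below only prepares the `ids` and `documents` lists; `build_batch` is a hypothetical helper, the placeholder strings stand in for the real texts, and stripping the surrounding newlines is an optional choice rather than something the quickstart requires.

```python
def build_batch(named_docs):
    """Split a {doc_id: text} mapping into parallel id and document lists."""
    ids = list(named_docs)
    documents = [text.strip() for text in named_docs.values()]
    return ids, documents


# Placeholder texts; in the notebook these would be uw_info, wsu_info, and so on.
example_docs = {
    "uw_info": "\nplaceholder text about UW\n",
    "wsu_info": "\nplaceholder text about WSU\n",
}

ids, documents = build_batch(example_docs)

# With the collection created above, a single call would then be:
# vector_store.add(ids=ids, documents=documents)
```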
Build RAG from scratch
Build a custom RAG from scratch, and add TruLens custom instrumentation.
from trulens.core import TruSession
session = TruSession()
session.reset_database()
from openai import OpenAI
from trulens.core.otel.instrument import instrument
from trulens.otel.semconv.trace import SpanAttributes
oai_client = OpenAI()
class RAG:
    def __init__(self, model_name: str = "gpt-4.1-mini"):
        self.model_name = model_name

    @instrument(
        span_type=SpanAttributes.SpanType.RETRIEVAL,
        attributes={
            SpanAttributes.RETRIEVAL.QUERY_TEXT: "query",
            SpanAttributes.RETRIEVAL.RETRIEVED_CONTEXTS: "return",
        },
    )
    def retrieve(self, query: str) -> list:
        """
        Retrieve relevant text from vector store.
        """
        results = vector_store.query(query_texts=query, n_results=4)
        # Flatten the list of lists into a single list
        return [doc for sublist in results["documents"] for doc in sublist]

    @instrument(span_type=SpanAttributes.SpanType.GENERATION)
    def generate_completion(self, query: str, context_str: list) -> str:
        """
        Generate answer from context.
        """
        if len(context_str) == 0:
            return "Sorry, I couldn't find an answer to your question."
        completion = (
            oai_client.chat.completions.create(
                model=self.model_name,
                temperature=0,
                messages=[
                    {
                        "role": "user",
                        "content": f"We have provided context information below. \n"
                        f"---------------------\n"
                        f"{context_str}"
                        f"\n---------------------\n"
                        f"First, say hello and that you're happy to help. \n"
                        f"\n---------------------\n"
                        f"Then, given this information, please answer the question: {query}",
                    }
                ],
            )
            .choices[0]
            .message.content
        )
        if completion:
            return completion
        else:
            return "Did not find an answer."

    @instrument(
        span_type=SpanAttributes.SpanType.RECORD_ROOT,
        attributes={
            SpanAttributes.RECORD_ROOT.INPUT: "query",
            SpanAttributes.RECORD_ROOT.OUTPUT: "return",
        },
    )
    def query(self, query: str) -> str:
        context_str = self.retrieve(query=query)
        completion = self.generate_completion(
            query=query, context_str=context_str
        )
        return completion
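The flattening step in `retrieve` exists because the query result holds one inner list of documents per query text. The sketch below mimics that shape with a hand-written dictionary (only the `documents` key is modeled) and also shows one possible way to join the chunks into a single string before prompting. `flatten_documents` and `format_context` are hypothetical helpers, not part of the RAG class above.

```python
def flatten_documents(results):
    """Flatten a list of per-query document lists into one flat list."""
    return [doc for sublist in results.get("documents") or [] for doc in sublist]


def format_context(chunks):
    """Join chunks with blank lines instead of interpolating the raw list."""
    return "\n\n".join(chunk.strip() for chunk in chunks)


# Hand-written stand-in for a query result with a single query text.
mock_results = {
    "documents": [
        ["Starbucks is headquartered in Seattle.", "Seattle is on Puget Sound."]
    ]
}

chunks = flatten_documents(mock_results)
context_block = format_context(chunks)
```

Interpolating a Python list into an f-string, as `generate_completion` does, inserts the list's printed form (brackets and quotes included) into the prompt. Joining the chunks first is an alternative if cleaner prompt text is preferred.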
Feedback functions
import numpy as np
from trulens.core import Feedback
from trulens.providers.openai import OpenAI
provider = OpenAI(model_engine="gpt-4.1-mini")
# Define a groundedness feedback function
f_groundedness = (
    Feedback(
        provider.groundedness_measure_with_cot_reasons, name="Groundedness"
    )
    .on_context(collect_list=True)
    .on_output()
)
# Question/answer relevance between overall question and answer.
f_answer_relevance = (
    Feedback(provider.relevance_with_cot_reasons, name="Answer Relevance")
    .on_input()
    .on_output()
)
# Context relevance between question and each context chunk.
f_context_relevance = (
    Feedback(
        provider.context_relevance_with_cot_reasons, name="Context Relevance"
    )
    .on_input()
    .on_context(collect_list=False)
    .aggregate(np.mean)  # choose a different aggregation method if you wish
)
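Because context relevance is scored once per context chunk, the per-chunk scores have to be reduced to a single number, which is what the aggregation method does. The sketch below uses made-up scores and the standard library's `fmean` as a stand-in for `np.mean`, and compares the mean with a stricter alternative, the minimum.

```python
from statistics import fmean

# Hypothetical relevance scores for four retrieved chunks (not real output).
chunk_scores = [1.0, 0.5, 0.0, 0.5]

# Mean: rewards retrieval that is relevant on average.
mean_score = fmean(chunk_scores)

# Minimum: penalizes the record if any single chunk is irrelevant.
worst_score = min(chunk_scores)
```

Which aggregation fits best depends on whether a single irrelevant chunk should drag down the whole record or only dilute it.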
Construct the app
Wrap the custom RAG with TruApp and pass in the list of feedback functions to use for evaluation.
from trulens.apps.app import TruApp
rag = RAG(model_name="gpt-4.1-mini")
tru_rag = TruApp(
    rag,
    app_name="OTEL-RAG",
    app_version="4.1-mini",
    feedbacks=[f_groundedness, f_answer_relevance, f_context_relevance],
)
Run the app
Use tru_rag as a context manager for the custom RAG-from-scratch app.
with tru_rag as recording:
    rag.query(
        "What wave of coffee culture is Starbucks seen to represent in the United States?"
    )
    rag.query(
        "What wave of coffee culture is Starbucks seen to represent in New Zealand?"
    )
    rag.query("Does Washington State have Starbucks on campus?")
Check results
We can view results in the leaderboard.
session.get_leaderboard()
from trulens.dashboard import run_dashboard
run_dashboard(session)
Use guardrails
Beyond informing iteration, feedback results can also be used directly as guardrails at inference time. Here we use the context relevance score as a guardrail to filter out irrelevant context before it is passed to the LLM. This both reduces hallucination and improves efficiency.
To do so, we'll rebuild our RAG using the @context_filter decorator on the method we want to filter, and pass in the feedback function and threshold to use for guardrailing.
from trulens.core.guardrails.base import context_filter
guardrail_provider = OpenAI(model_engine="gpt-4.1-mini")
# note: feedback function used for guardrail must only return a score, not also reasons
f_context_relevance_score = Feedback(
    guardrail_provider.context_relevance, name="Context Relevance"
)
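class FilteredRAG(RAG):
    # Same span type and attributes as RAG.retrieve, so the filtered
    # contexts are still recorded as the retrieval output.
    @instrument(
        span_type=SpanAttributes.SpanType.RETRIEVAL,
        attributes={
            SpanAttributes.RETRIEVAL.QUERY_TEXT: "query",
            SpanAttributes.RETRIEVAL.RETRIEVED_CONTEXTS: "return",
        },
    )
    @context_filter(
        feedback=f_context_relevance_score,
        threshold=0.75,
        keyword_for_prompt="query",
    )
    def retrieve(self, query: str) -> list:
        """
        Retrieve relevant text from vector store.
        """
        results = vector_store.query(query_texts=query, n_results=4)
        if "documents" in results and results["documents"]:
            return [doc for sublist in results["documents"] for doc in sublist]
        else:
            return []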
filtered_rag = FilteredRAG()
filtered_tru_rag = TruApp(
    filtered_rag,
    app_name="RAG",
    app_version="filtered",
    feedbacks=[f_groundedness, f_answer_relevance, f_context_relevance],
)
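Conceptually, a threshold guardrail scores each retrieved chunk against the query and keeps only the chunks that clear the threshold. The sketch below illustrates that idea under stated assumptions: `overlap_score` is a crude word-overlap stand-in for the LLM-based context relevance score, `filter_contexts` is a hypothetical helper rather than the library's implementation, and the `>=` comparison is a choice made here for illustration.

```python
def overlap_score(query, context):
    """Fraction of query words that also appear in the context (toy scorer)."""
    query_words = set(query.lower().split())
    context_words = set(context.lower().split())
    if not query_words:
        return 0.0
    return len(query_words & context_words) / len(query_words)


def filter_contexts(query, contexts, score_fn, threshold):
    """Keep only the contexts whose score meets the threshold."""
    return [c for c in contexts if score_fn(query, c) >= threshold]


contexts = [
    "starbucks is a coffee chain",
    "new zealand is an island country",
]

kept = filter_contexts(
    "starbucks coffee", contexts, overlap_score, threshold=0.75
)
```

With a filter like this in front of generation, a question with no relevant context can leave the model with an empty context list, which is the case `generate_completion` handles with its fallback message.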
Record and invoke the app as normal
with filtered_tru_rag as recording:
    filtered_rag.query(
        query="What wave of coffee culture is Starbucks seen to represent in the United States?"
    )
    filtered_rag.query(
        "What wave of coffee culture is Starbucks seen to represent in New Zealand?"
    )
    filtered_rag.query("Does Washington State have Starbucks on campus?")
Check results
session.get_leaderboard()