Local vs Remote Huggingface Feedback Functions
In this quickstart, you will create a RAG from scratch and compare local vs. remote Huggingface feedback functions for groundedness.
In [ ]:
# !pip install trulens trulens-providers-huggingface chromadb openai torch transformers sentencepiece
In [ ]:
import os
os.environ["OPENAI_API_KEY"] = "sk-..."
Get Data
In this case, we'll just initialize some simple text in the notebook.
In [ ]:
uw_info = """
The University of Washington, founded in 1861 in Seattle, is a public research university
with over 45,000 students across three campuses in Seattle, Tacoma, and Bothell.
As the flagship institution of the six public universities in Washington state,
UW encompasses over 500 buildings and 20 million square feet of space,
including one of the largest library systems in the world.
"""
wsu_info = """
Washington State University, commonly known as WSU, founded in 1890, is a public research university in Pullman, Washington.
With multiple campuses across the state, it is the state's second largest institution of higher education.
WSU is known for its programs in veterinary medicine, agriculture, engineering, architecture, and pharmacy.
"""
seattle_info = """
Seattle, a city on Puget Sound in the Pacific Northwest, is surrounded by water, mountains and evergreen forests, and contains thousands of acres of parkland.
It's home to a large tech industry, with Microsoft and Amazon headquartered in its metropolitan area.
The futuristic Space Needle, a legacy of the 1962 World's Fair, is its most iconic landmark.
"""
starbucks_info = """
Starbucks Corporation is an American multinational chain of coffeehouses and roastery reserves headquartered in Seattle, Washington.
As the world's largest coffeehouse chain, Starbucks is seen to be the main representation of the United States' second wave of coffee culture.
"""
uw_info = """
The University of Washington, founded in 1861 in Seattle, is a public research university
with over 45,000 students across three campuses in Seattle, Tacoma, and Bothell.
As the flagship institution of the six public universities in Washington state,
UW encompasses over 500 buildings and 20 million square feet of space,
including one of the largest library systems in the world.
"""
wsu_info = """
Washington State University, commonly known as WSU, founded in 1890, is a public research university in Pullman, Washington.
With multiple campuses across the state, it is the state's second largest institution of higher education.
WSU is known for its programs in veterinary medicine, agriculture, engineering, architecture, and pharmacy.
"""
seattle_info = """
Seattle, a city on Puget Sound in the Pacific Northwest, is surrounded by water, mountains and evergreen forests, and contains thousands of acres of parkland.
It's home to a large tech industry, with Microsoft and Amazon headquartered in its metropolitan area.
The futuristic Space Needle, a legacy of the 1962 World's Fair, is its most iconic landmark.
"""
starbucks_info = """
Starbucks Corporation is an American multinational chain of coffeehouses and roastery reserves headquartered in Seattle, Washington.
As the world's largest coffeehouse chain, Starbucks is seen to be the main representation of the United States' second wave of coffee culture.
"""
Create Vector Store
Create an in-memory chromadb vector store.
In [ ]:
import chromadb
from chromadb.utils.embedding_functions import OpenAIEmbeddingFunction

embedding_function = OpenAIEmbeddingFunction(
    api_key=os.environ.get("OPENAI_API_KEY"),
    model_name="text-embedding-ada-002",
)

chroma_client = chromadb.Client()
vector_store = chroma_client.get_or_create_collection(
    name="Washington", embedding_function=embedding_function
)
Populate the vector store.
In [ ]:
vector_store.add("uw_info", documents=uw_info)
vector_store.add("wsu_info", documents=wsu_info)
vector_store.add("seattle_info", documents=seattle_info)
vector_store.add("starbucks_info", documents=starbucks_info)
vector_store.add("uw_info", documents=uw_info)
vector_store.add("wsu_info", documents=wsu_info)
vector_store.add("seattle_info", documents=seattle_info)
vector_store.add("starbucks_info", documents=starbucks_info)
Build RAG from scratch
Build a custom RAG from scratch and add TruLens custom instrumentation.
In [ ]:
from trulens.core import TruSession
from trulens.apps.custom import instrument
session = TruSession()
session.reset_database()
In [ ]:
from openai import OpenAI

oai_client = OpenAI()


class RAG_from_scratch:
    @instrument
    def retrieve(self, query: str) -> list:
        """
        Retrieve relevant text from vector store.
        """
        results = vector_store.query(query_texts=query, n_results=4)
        # Flatten the list of lists into a single list
        return [doc for sublist in results["documents"] for doc in sublist]

    @instrument
    def generate_completion(self, query: str, context_str: list) -> str:
        """
        Generate answer from context.
        """
        completion = (
            oai_client.chat.completions.create(
                model="gpt-3.5-turbo",
                temperature=0,
                messages=[
                    {
                        "role": "user",
                        "content": f"We have provided context information below. \n"
                        f"---------------------\n"
                        f"{context_str}"
                        f"\n---------------------\n"
                        f"Given this information, please answer the question: {query}",
                    }
                ],
            )
            .choices[0]
            .message.content
        )
        return completion

    @instrument
    def query(self, query: str) -> str:
        context_str = self.retrieve(query)
        completion = self.generate_completion(query, context_str)
        return completion


rag = RAG_from_scratch()
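Before adding TruLens recording to the mix, you can smoke-test the app directly. A minimal sketch (the question is illustrative, and the call hits the OpenAI API):

In [ ]:
# Quick smoke test of the uninstrumented app.
print(rag.query("Which Washington university is known for veterinary medicine?"))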
Set up feedback functions
Here we'll define the same groundedness feedback function twice: once with HuggingfaceLocal, which runs the NLI model on your own machine, and once with Huggingface, which calls Hugging Face's hosted inference API. Giving them distinct names lets us compare their scores side by side.
In [ ]:
from trulens.core import Feedback
from trulens.core import Select
from trulens.providers.huggingface import Huggingface
from trulens.providers.huggingface import HuggingfaceLocal

# Define a local Huggingface groundedness feedback function
local_provider = HuggingfaceLocal()
f_local_groundedness = (
    Feedback(
        local_provider.groundedness_measure_with_nli,
        name="[Local] Groundedness",
    )
    .on(Select.RecordCalls.retrieve.rets.collect())
    .on_output()
)

# Define a remote Huggingface groundedness feedback function
remote_provider = Huggingface()
f_remote_groundedness = (
    Feedback(
        remote_provider.groundedness_measure_with_nli,
        name="[Remote] Groundedness",
    )
    .on(Select.RecordCalls.retrieve.rets.collect())
    .on_output()
)
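Both providers expose the same method, so you can also score a (source, statement) pair directly, outside of any app. A minimal sketch with made-up example strings; groundedness_measure_with_nli returns a score in [0, 1] plus per-claim reasons, and the first local call downloads the NLI model:

In [ ]:
# Standalone groundedness check with hypothetical example strings.
source = "The University of Washington was founded in 1861 in Seattle."
score, reasons = local_provider.groundedness_measure_with_nli(
    source, "UW was founded in 1861."
)
print(score, reasons)  # a supported claim should score near 1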
Construct the app
Wrap the custom RAG with TruCustomApp and pass the list of feedback functions for evaluation.
In [ ]:
from trulens.apps.custom import TruCustomApp

tru_rag = TruCustomApp(
    rag,
    app_name="RAG",
    app_version="v1",
    feedbacks=[f_local_groundedness, f_remote_groundedness],
)
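To verify which methods were picked up by the @instrument decorator, TruLens apps provide a print_instrumented helper:

In [ ]:
# Inspect which components and methods TruLens instrumented.
tru_rag.print_instrumented()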
Run the app
Use tru_rag as a context manager for the custom RAG-from-scratch app.
In [ ]:
with tru_rag as recording:
    rag.query("When was the University of Washington founded?")
Check results
Inspect the feedback results for the most recent record, then compare the local and remote scores in the leaderboard.
In [ ]:
from trulens.dashboard.display import get_feedback_result
last_record = recording.records[-1]
get_feedback_result(last_record, "[Local] Groundedness")
In [ ]:
get_feedback_result(last_record, "[Remote] Groundedness")
In [ ]:
session.get_leaderboard()
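To browse records and compare the local and remote groundedness scores interactively, you can also launch the TruLens dashboard (this starts a local Streamlit app):

In [ ]:
from trulens.dashboard import run_dashboard

run_dashboard(session)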