📓 Context Filters¶
In this example, you will learn how to use context filters and experiment with different model sizes and deployment options for the guardrail: large and small state-of-the-art models from OpenAI, fast small models running on Groq, and a locally deployed model using Ollama.
# !pip install trulens trulens-providers-openai trulens-providers-litellm chromadb openai groq ollama
import os
os.environ["OPENAI_API_KEY"] = "sk-proj-..."
os.environ["GROQ_API_KEY"] = "gsk_..."
Get Data¶
In this case, we'll just initialize some simple text in the notebook.
context_chunk_1 = (
    "The automotive supplier's production process involves several stages: raw material procurement, component manufacturing, assembly, and quality control. "
    "Raw materials are sourced from certified suppliers and undergo rigorous testing. "
    "Component manufacturing includes precision machining and automated assembly lines. "
    "The final assembly integrates all components, followed by stringent quality control checks using advanced inspection technologies."
)
context_chunk_2 = (
    "Our just-in-time (JIT) inventory system minimizes inventory costs while ensuring components are available exactly when needed. "
    "This system relies on real-time inventory tracking and close coordination with suppliers. "
    "Disruptions in the supply chain, such as delays in raw material delivery, can significantly impact production schedules and increase costs."
)
context_chunk_3 = (
    "The global supply chain requires navigating various trade policies, tariffs, and geopolitical events. "
    "We collaborate with logistics partners to ensure timely and cost-effective delivery of components. "
    "Our supply chain team continuously monitors global events, such as trade disputes and natural disasters, to mitigate potential disruptions."
)
context_chunk_4 = (
    "Sustainability is a core value at our company. "
    "We source materials responsibly, minimize waste, and improve energy efficiency. "
    "Our initiatives include using recycled materials, implementing energy-efficient manufacturing processes, and developing eco-friendly products. "
    "We track our environmental impact through annual audits of indicators including material sourcing and waste production."
)
context_chunk_5 = (
    "Technology is crucial in our operations. "
    "We use advanced automation, artificial intelligence, and data analytics to optimize production processes, improve product quality, and reduce costs. "
    "Blockchain technology is being explored to enhance transparency and traceability in our supply chain, ensuring authenticity and reducing fraud."
)
context_chunk_6 = (
    "The COVID-19 pandemic highlighted the importance of supply chain resilience. "
    "Measures implemented include diversifying our supplier base, increasing inventory levels of critical components, and investing in digital supply chain solutions. "
    "These steps help us quickly adapt to disruptions and maintain continuous production."
)
context_chunk_7 = (
    "Strong supplier relationships are essential to our success. "
    "We collaborate closely with suppliers to ensure a steady flow of high-quality components. "
    "Supplier performance is regularly evaluated on the KPIs: on-time delivery rate, quality, and cost. "
    "The KPIs are evaluated on a weekly, monthly and quarterly basis. "
    "Effective communication and collaboration are key to maintaining these relationships."
)
context_chunk_8 = (
    "Cybersecurity is a top priority for our company. "
    "As operations become more connected and reliant on digital technologies, the risk of cyberattacks increases. "
    "We have implemented robust cybersecurity measures, including firewalls, encryption, and continuous monitoring, to protect our systems and data from potential threats."
)
Create Vector Store¶
Create a chromadb vector store in memory.
import chromadb
from chromadb.utils.embedding_functions import OpenAIEmbeddingFunction
embedding_function = OpenAIEmbeddingFunction(
    api_key=os.environ.get("OPENAI_API_KEY"),
    model_name="text-embedding-ada-002",
)
chroma_client = chromadb.Client()
vector_store = chroma_client.get_or_create_collection(
    name="Architecture", embedding_function=embedding_function
)
Populate the vector store.
vector_store.add("context_1", documents=context_chunk_1)
vector_store.add("context_2", documents=context_chunk_2)
vector_store.add("context_3", documents=context_chunk_3)
vector_store.add("context_4", documents=context_chunk_4)
vector_store.add("context_5", documents=context_chunk_5)
vector_store.add("context_6", documents=context_chunk_6)
vector_store.add("context_7", documents=context_chunk_7)
vector_store.add("context_8", documents=context_chunk_8)
Build RAG from scratch¶
Build a custom RAG from scratch, and add TruLens custom instrumentation.
from openai import OpenAI
oai_client = OpenAI()
class RAG:
    def retrieve(self, query: str) -> list:
        """
        Retrieve relevant text from vector store.
        """
        results = vector_store.query(query_texts=query, n_results=5)
        # Flatten the list of lists into a single list
        return [doc for sublist in results["documents"] for doc in sublist]

    def generate_completion(self, query: str, context_str: list) -> str:
        """
        Generate answer from context.
        """
        if len(context_str) == 0:
            return "Sorry, I couldn't find an answer to your question."
        completion = (
            oai_client.chat.completions.create(
                model="gpt-4o-mini",
                temperature=0,
                messages=[
                    {
                        "role": "user",
                        "content": f"We have provided context information below. \n"
                        f"---------------------\n"
                        f"{context_str}"
                        f"\n---------------------\n"
                        f"Then, given all of this information, please answer the question: {query}",
                    }
                ],
            )
            .choices[0]
            .message.content
        )
        if completion:
            return completion
        else:
            return "Did not find an answer."

    def query(self, query: str) -> str:
        context_str = self.retrieve(query=query)
        completion = self.generate_completion(
            query=query, context_str=context_str
        )
        return completion
rag = RAG()
Run the app¶
from IPython.display import display
response = rag.query("How often are environmental KPIs assessed?")
display(response)
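To see why filtering helps, you can inspect what the unfiltered retriever passes to the LLM; with n_results=5, it returns five chunks regardless of how relevant each one is to the question:
retrieved = rag.retrieve("How often are environmental KPIs assessed?")
print(f"Retrieved {len(retrieved)} chunks")
for chunk in retrieved:
    # Print a short preview of each retrieved chunk.
    print("-", chunk[:80], "...")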
Use guardrails¶
Beyond informing iteration on your app, feedback results can also be used directly as guardrails at inference time. In particular, here we show how to use the context relevance score as a guardrail to filter out irrelevant context before it is passed to the LLM. This both reduces hallucination and improves efficiency.
To do so, we'll rebuild our RAG using the @context_filter decorator on the method we want to filter, passing in the feedback function and threshold to use for guardrailing.
from trulens.core.guardrails.base import context_filter
from trulens.core import Feedback
from trulens.providers.openai import OpenAI
openai_4o_provider = OpenAI(model_engine="gpt-4o")
# Context relevance between question and each context chunk.
f_context_relevance_gpt4o = Feedback(openai_4o_provider.context_relevance)
class FilteredRAG(RAG):
    @context_filter(
        feedback=f_context_relevance_gpt4o,
        threshold=0.4,
        keyword_for_prompt="query",
    )
    def retrieve(self, query: str) -> list:
        """
        Retrieve relevant text from vector store.
        """
        results = vector_store.query(query_texts=query, n_results=5)
        if "documents" in results and results["documents"]:
            return [doc for sublist in results["documents"] for doc in sublist]
        else:
            return []
filtered_rag = FilteredRAG()
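Conceptually, @context_filter scores each retrieved chunk with the supplied feedback function and drops chunks that score below the threshold. Here is a rough manual sketch of the same idea using the provider's context_relevance method directly; filter_chunks is a hypothetical helper for illustration, not the decorator's actual implementation:
def filter_chunks(query: str, chunks: list, threshold: float = 0.4) -> list:
    # Hypothetical helper: keep only chunks whose context relevance meets the threshold.
    kept = []
    for chunk in chunks:
        score = openai_4o_provider.context_relevance(question=query, context=chunk)
        if score >= threshold:
            kept.append(chunk)
    return kept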
Run the app with context filters¶
filtered_rag.query("How often are environmental KPIs assessed?")
We can actually get better answers by providing only the most relevant information to the LLM.
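You can confirm the effect by comparing how many chunks each retriever returns for the same question; this small check assumes the decorated retrieve can be called directly, just like the original:
question = "How often are environmental KPIs assessed?"
print("Unfiltered:", len(rag.retrieve(query=question)), "chunks")
print("Filtered:", len(filtered_rag.retrieve(query=question)), "chunks")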
Try a smaller guardrail¶
openai_4omini_provider = OpenAI(model_engine="gpt-4o-mini")
f_context_relevance_gpt4omini = Feedback(openai_4omini_provider.context_relevance)
class FilteredRAG(RAG):
    @context_filter(
        feedback=f_context_relevance_gpt4omini,
        threshold=0.4,
        keyword_for_prompt="query",
    )
    def retrieve(self, query: str) -> list:
        """
        Retrieve relevant text from vector store.
        """
        results = vector_store.query(query_texts=query, n_results=5)
        if "documents" in results and results["documents"]:
            return [doc for sublist in results["documents"] for doc in sublist]
        else:
            return []
filtered_rag = FilteredRAG()
filtered_rag.query("How often are environmental KPIs assessed?")
How about on Groq with Llama 3 8B?¶
from trulens.providers.litellm import LiteLLM
groq_llama3_8b_provider = LiteLLM("groq/llama3-8b-8192")
f_context_relevance_groqllama3_8b = Feedback(groq_llama3_8b_provider.context_relevance)
class FilteredRAG(RAG):
    @context_filter(
        feedback=f_context_relevance_groqllama3_8b,
        threshold=0.75,
        keyword_for_prompt="query",
    )
    def retrieve(self, query: str) -> list:
        """
        Retrieve relevant text from vector store.
        """
        results = vector_store.query(query_texts=query, n_results=5)
        if "documents" in results and results["documents"]:
            return [doc for sublist in results["documents"] for doc in sublist]
        else:
            return []
filtered_rag = FilteredRAG()
filtered_rag.query("How often are environmental KPIs assessed?")
Can we run the guardrails locally, say with Ollama?¶
Yes, but a bit slower than with Groq's infra 😞
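This assumes an Ollama server is already running locally with the model pulled; LiteLLM will look for it at Ollama's default local address. If you haven't set it up yet, a one-time setup (mirroring the commented install cell above) looks roughly like this:
# !ollama pull llama3.1:8b  # download the model locally (one-time)
# !ollama serve             # start the Ollama server if it is not already running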
from trulens.providers.litellm import LiteLLM
ollama_provider = LiteLLM("ollama/llama3.1:8b")
f_context_relevance_ollama = Feedback(ollama_provider.context_relevance)
class FilteredRAG(RAG):
    @context_filter(
        feedback=f_context_relevance_ollama,
        threshold=0.5,
        keyword_for_prompt="query",
    )
    def retrieve(self, query: str) -> list:
        """
        Retrieve relevant text from vector store.
        """
        results = vector_store.query(query_texts=query, n_results=5)
        if "documents" in results and results["documents"]:
            return [doc for sublist in results["documents"] for doc in sublist]
        else:
            return []
filtered_rag = FilteredRAG()
filtered_rag.query("How often are environmental KPIs assessed?")