LlamaIndex Hybrid Retriever + Reranking + Guardrails
Hybrid retrievers are a great way to combine the strengths of different retrieval methods. Combined with filtering and reranking, this can be especially powerful in retrieving only the most relevant context from multiple methods. TruLens can take us even further, highlighting the strengths of each component retriever while measuring the success of the hybrid retriever as a whole.
Finally, we'll show how guardrails offer an alternative approach to the same goal: passing only relevant context to the LLM.
This example walks through that process.
Setup
# !pip install trulens llama_index llama-index-readers-file llama-index-llms-openai llama-index-retrievers-bm25 openai pypdf torch sentence-transformers
import os
os.environ["OPENAI_API_KEY"] = "sk-..."
# Imports main tools:
from trulens.core import Feedback
from trulens.core import TruSession
from trulens.apps.llamaindex import TruLlama
session = TruSession()
session.reset_database()
Get data
!curl https://www.ipcc.ch/report/ar6/wg2/downloads/report/IPCC_AR6_WGII_Chapter03.pdf --output IPCC_AR6_WGII_Chapter03.pdf
Create index
from llama_index.core import SimpleDirectoryReader
from llama_index.core import StorageContext
from llama_index.core import VectorStoreIndex
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core.retrievers import VectorIndexRetriever
from llama_index.retrievers.bm25 import BM25Retriever
splitter = SentenceSplitter(chunk_size=1024)
# load documents
documents = SimpleDirectoryReader(
    input_files=["IPCC_AR6_WGII_Chapter03.pdf"]
).load_data()
nodes = splitter.get_nodes_from_documents(documents)
# initialize storage context (by default it's in-memory)
storage_context = StorageContext.from_defaults()
storage_context.docstore.add_documents(nodes)
index = VectorStoreIndex(
    nodes=nodes,
    storage_context=storage_context,
)
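As a quick optional check, we can see how many chunks the splitter produced before building retrievers over them:
# optional: inspect the number of chunks created by the splitter
print(f"Created {len(nodes)} nodes from the chapter")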
Set up retrievers
# retrieve the top 10 most similar nodes using embeddings
vector_retriever = VectorIndexRetriever(index=index, similarity_top_k=10)
# retrieve the top 2 most similar nodes using bm25
bm25_retriever = BM25Retriever.from_defaults(nodes=nodes, similarity_top_k=2)
Create Hybrid (Custom) Retriever
from llama_index.core.retrievers import BaseRetriever
class HybridRetriever(BaseRetriever):
    def __init__(self, vector_retriever, bm25_retriever):
        self.vector_retriever = vector_retriever
        self.bm25_retriever = bm25_retriever
        super().__init__()

    def _retrieve(self, query, **kwargs):
        bm25_nodes = self.bm25_retriever.retrieve(query, **kwargs)
        vector_nodes = self.vector_retriever.retrieve(query, **kwargs)

        # combine the two lists of nodes, keeping the first copy of any duplicate
        all_nodes = []
        node_ids = set()
        for n in bm25_nodes + vector_nodes:
            if n.node.node_id not in node_ids:
                all_nodes.append(n)
                node_ids.add(n.node.node_id)
        return all_nodes
hybrid_retriever = HybridRetriever(vector_retriever, bm25_retriever)
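Before wiring the hybrid retriever into a query engine, we can sanity-check it directly. This is optional; it just shows the de-duplicated union of BM25 and vector results:
# optional: inspect what the hybrid retriever returns for a sample query
retrieved = hybrid_retriever.retrieve(
    "What is the impact of climate change on the ocean?"
)
for node_with_score in retrieved:
    print(node_with_score.node.node_id, node_with_score.score)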
Set up reranker
from llama_index.core.postprocessor import SentenceTransformerRerank
reranker = SentenceTransformerRerank(top_n=2, model="BAAI/bge-reranker-base")
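Under the hood, SentenceTransformerRerank wraps a cross-encoder that scores each (query, passage) pair jointly, which is what lets it re-order the hybrid retriever's candidates. A minimal illustration using sentence-transformers directly (not needed for the pipeline, just to show the scoring):
# illustrative only: score (query, passage) pairs with the same cross-encoder
from sentence_transformers import CrossEncoder

cross_encoder = CrossEncoder("BAAI/bge-reranker-base")
pairs = [
    ("What is the impact of climate change on the ocean?", node.get_content())
    for node in nodes[:3]
]
print(cross_encoder.predict(pairs))  # higher score = more relevant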
from llama_index.core.query_engine import RetrieverQueryEngine
query_engine = RetrieverQueryEngine.from_args(
    retriever=hybrid_retriever, node_postprocessors=[reranker]
)
from trulens.dashboard import run_dashboard
run_dashboard(session, port=1234)
Initialize Context Relevance checks
Include relevance checks for the BM25 retriever, the vector retriever, the hybrid retriever, and the filtered hybrid retriever (after reranking and filtering).
This requires knowing the feedback selector for each. You can find this path by logging a run of your application and examining the application traces on the Evaluations page.
Read more in our docs: https://www.trulens.org/trulens/evaluation/feedback_selectors/selecting_components/
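If you only need the final context passed to the synthesizer (rather than each intermediate retriever), TruLens can also derive that selector for you with the select_context helper:
# optional shortcut: selector for the app's final retrieved context
context_selector = TruLlama.select_context(query_engine)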
import numpy as np
from trulens.core.schema import Select
from trulens.providers.openai import OpenAI
# Initialize provider class
openai = OpenAI()
bm25_context = Select.RecordCalls._retriever.bm25_retriever.retrieve.rets[
    :
].node.text
vector_context = Select.RecordCalls._retriever.vector_retriever._retrieve.rets[
    :
].node.text
hybrid_context = Select.RecordCalls._retriever.retrieve.rets[:].node.text
hybrid_context_filtered = (
    Select.RecordCalls._node_postprocessors[0]
    ._postprocess_nodes.rets[:]
    .node.text
)
# Question/statement relevance between question and each context chunk.
f_context_relevance_bm25 = (
    Feedback(openai.context_relevance, name="BM25")
    .on_input()
    .on(bm25_context)
    .aggregate(np.mean)
)

f_context_relevance_vector = (
    Feedback(openai.context_relevance, name="Vector")
    .on_input()
    .on(vector_context)
    .aggregate(np.mean)
)

f_context_relevance_hybrid = (
    Feedback(openai.context_relevance, name="Hybrid")
    .on_input()
    .on(hybrid_context)
    .aggregate(np.mean)
)

f_context_relevance_hybrid_filtered = (
    Feedback(openai.context_relevance, name="Hybrid Filtered")
    .on_input()
    .on(hybrid_context_filtered)
    .aggregate(np.mean)
)
Add feedbacks
tru_recorder = TruLlama(
    query_engine,
    app_name="Hybrid Retriever Query Engine",
    feedbacks=[
        f_context_relevance_bm25,
        f_context_relevance_vector,
        f_context_relevance_hybrid,
        f_context_relevance_hybrid_filtered,
    ],
)
with tru_recorder as recording:
    response = query_engine.query(
        "What is the impact of climate change on the ocean?"
    )
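In addition to the dashboard, we can read the feedback scores for this record programmatically once evaluation finishes:
rec = recording.get()  # the record of the query above
for feedback, feedback_result in rec.wait_for_feedback_results().items():
    print(feedback.name, feedback_result.result)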
Explore in a Dashboard
from trulens.dashboard import run_dashboard
run_dashboard(session) # open a local streamlit app to explore
# stop_dashboard(session) # stop if needed
Feedback Guardrails: an alternative to reranking/filtering
TruLens feedback functions can be used as context filters in place of reranking. This is useful when you don't want to manage another model (the reranker), or when the feedback function aligns better with human judgments than a reranker does. Notably, the feedback function can be any model of your choice; this is a great use of small, lightweight models that add little latency to your app.
To illustrate this, we'll set up a new query engine with only the hybrid retriever (no reranking).
query_engine = RetrieverQueryEngine.from_args(retriever=hybrid_retriever)
Then we'll set up a feedback function and wrap the query engine with TruLens' WithFeedbackFilterNodes. This allows us to pass in any feedback function we'd like to use for filtering, even custom ones!
In this example, we're using LLM-as-judge context relevance, but a small local model could be used here as well.
from trulens.core.guardrails.llama import WithFeedbackFilterNodes
feedback = Feedback(openai.context_relevance)
filtered_query_engine = WithFeedbackFilterNodes(
    query_engine, feedback=feedback, threshold=0.75
)
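Since WithFeedbackFilterNodes accepts any feedback function, a custom scorer works too. A toy sketch (keyword_match is a hypothetical example, just to show the required shape: a callable from question and context to a score in [0, 1]):
def keyword_match(question: str, context: str) -> float:
    """Toy relevance score: fraction of question words found in the context."""
    q_words = set(question.lower().split())
    c_words = set(context.lower().split())
    return len(q_words & c_words) / max(len(q_words), 1)

# could be passed as feedback=Feedback(keyword_match) to WithFeedbackFilterNodes
f_keyword = Feedback(keyword_match)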
Set up for recording
Here we'll introduce one last variation of the context relevance feedback function, this one pointed at the source nodes returned from the query engine's synthesize method. This accurately captures which retrieved context makes it past the filter and to the LLM.
hybrid_context_filtered = (
    Select.Record.app.query_engine.synthesize.rets.source_nodes[:].node.text
)

f_context_relevance_afterguardrails = (
    Feedback(openai.context_relevance, name="After guardrails")
    .on_input()
    .on(hybrid_context_filtered)
    .aggregate(np.mean)
)
tru_recorder = TruLlama(
    filtered_query_engine,
    app_name="Hybrid Retriever Query Engine with Guardrails",
    feedbacks=[f_context_relevance_afterguardrails],
)
with tru_recorder as recording:
    response = filtered_query_engine.query(
        "What is the impact of climate change on the ocean?"
    )
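Finally, we can compare the reranked and guardrailed app versions side by side, either in the dashboard or programmatically:
session.get_leaderboard()  # aggregate feedback scores per app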