Vectara HHEM Evaluator Quickstart
In this quickstart, you'll learn how to use the HHEM evaluator feedback function from TruLens in your application. The Vectara HHEM evaluator, or Hughes Hallucination Evaluation Model, is a tool used to determine whether a summary produced by a large language model (LLM) might contain hallucinated information.
- Purpose: The Vectara HHEM evaluator analyzes both the generated response and its source context, and assigns a score indicating the probability that the response contains hallucinations.
- Score: The returned value is a floating point number between zero and one that maps to a boolean outcome: a score below 0.5 indicates a high likelihood of hallucination, and a score above 0.5 indicates a low likelihood of hallucination.
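If you want to turn the raw score into a yes/no flag in your own code, a simple threshold is enough. The helper below is a minimal sketch for illustration only; is_likely_hallucinated is a hypothetical name, not part of TruLens or Vectara's API.
def is_likely_hallucinated(hhem_score: float, threshold: float = 0.5) -> bool:
    # Scores below the threshold indicate a likely hallucination;
    # scores above it indicate the response is likely grounded in its context.
    return hhem_score < threshold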
Install Dependencies
Run the cells below to install the utilities we'll use in this notebook to demonstrate Vectara's HHEM model.
- Uncomment the cell below if you haven't yet installed LangChain or TruLens.
# !pip install trulens trulens-providers-huggingface 'langchain==0.0.354' 'langchain-community==0.0.20' 'langchain-core==0.1.23'
Import Utilities
We're using LangChain utilities to facilitate RAG retrieval and to demonstrate Vectara's HHEM.
- Run the cells below to get started.
import getpass
from langchain.document_loaders import DirectoryLoader
from langchain.document_loaders import TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma
Preprocess Your Data
Run the cells below to split the document text into chunks to feed into ChromaDB. These chunks are our primary sources for evaluation.
loader = DirectoryLoader("./data/", glob="./*.txt", loader_cls=TextLoader)
documents = loader.load()
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000, chunk_overlap=50
)
texts = text_splitter.split_documents(documents)
E5 Embeddings
E5 embeddings set the state of the art on the BEIR and MTEB benchmarks using only synthetic data and fewer than 1k training steps; the method achieves strong performance on highly competitive text embedding benchmarks without using any labeled data. Furthermore, when fine-tuned with a mixture of synthetic and labeled data, the model sets new state-of-the-art results on BEIR and MTEB (see Improving Text Embeddings with Large Language Models). E5 also requires a unique prompting mechanism, illustrated in the sketch below.
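The sketch below illustrates the instruction-style query formatting that E5 instruct models such as intfloat/multilingual-e5-large-instruct describe on their model card: queries carry an instruction prefix, while documents are embedded as-is. The helper name and task description here are our own illustrative choices; note that the HuggingFaceInferenceAPIEmbeddings wrapper used below does not, to our knowledge, add these prefixes for you.
# Hypothetical helper showing E5's instruction-style query prompting.
def format_e5_query(task_description: str, query: str) -> str:
    return f"Instruct: {task_description}\nQuery: {query}"

formatted_query = format_e5_query(
    "Given a web search query, retrieve relevant passages that answer the query",
    "Who is Vint Cerf?",
)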
inference_api_key = getpass.getpass("Enter your HF Inference API Key:\n\n")
from langchain_community.embeddings import HuggingFaceInferenceAPIEmbeddings
embedding_function = HuggingFaceInferenceAPIEmbeddings(
    api_key=inference_api_key,
    model_name="intfloat/multilingual-e5-large-instruct",
)
Initialize a Vector Store
Here we're using Chroma as the vector store for the document chunks.
- Run the cells below to initialize the vector store.
db = Chroma.from_documents(texts, embedding_function)
Wrap a Simple RAG application with TruLens
- Retrieval: get relevant documents from the vector DB.
- Generate completions: get a response from the LLM.
Run the cells below to create a RAG class whose instrumented methods record the retrieved context and the LLM response for evaluation.
import requests
from trulens.apps.custom import instrument
class Rag:
    def __init__(self):
        pass

    @instrument
    def retrieve(self, query: str) -> str:
        # Retrieve the documents most similar to the query from the vector store.
        docs = db.similarity_search(query)
        # Concatenate the content of the documents
        content = "".join(doc.page_content for doc in docs)
        return content

    @instrument
    def generate_completion(self, content: str, query: str) -> str:
        # Call the Hugging Face Inference API to answer the query from the retrieved content.
        url = "https://api-inference.huggingface.co/models/NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO"
        headers = {
            # Reuse the Hugging Face API key collected earlier in this notebook.
            "Authorization": f"Bearer {inference_api_key}",
            "Content-Type": "application/json",
        }
        data = {
            "inputs": f"answer the following question from the information given Question:{query}\nInformation:{content}\n"
        }
        try:
            response = requests.post(url, headers=headers, json=data)
            response.raise_for_status()
            response_data = response.json()
            # Extract the generated text from the response
            generated_text = response_data[0]["generated_text"]
            # Remove the input text from the generated text
            response_text = generated_text[len(data["inputs"]) :]
            return response_text
        except requests.exceptions.RequestException as e:
            print("Error:", e)
            return None

    @instrument
    def query(self, query: str) -> str:
        context_str = self.retrieve(query)
        completion = self.generate_completion(context_str, query)
        return completion
Instantiate the application above
- Run the cell below to instantiate the application defined above.
rag1 = Rag()
from trulens.core import Feedback
from trulens.core import Select
from trulens.core import TruSession
from trulens.providers.huggingface import Huggingface
session = TruSession()
session.reset_database()
Initialize HHEM Feedback Function
HHEM takes two inputs:
- The summary/answer generated by the LLM.
- The original source text that the LLM used to generate the summary/answer (the retrieval context).
A standalone call to the evaluator is sketched after the cell below.
huggingface_provider = Huggingface()
f_hhem_score = (
    Feedback(huggingface_provider.hallucination_evaluator, name="HHEM_Score")
    .on(Select.RecordCalls.generate_completion.rets)
    .on(Select.RecordCalls.retrieve.rets)
)
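As a sanity check, you can also call the evaluator directly on a pair of strings, outside the feedback pipeline. This is a minimal sketch: the example strings are made up, and it assumes hallucination_evaluator takes the generated answer followed by the retrieved context and returns a float between 0 and 1, matching the selectors above.
# Hypothetical standalone check of the HHEM evaluator on made-up strings.
example_answer = "Vint Cerf is widely regarded as one of the fathers of the Internet."
example_context = "Vint Cerf co-designed the TCP/IP protocols and is often called a father of the Internet."
score = huggingface_provider.hallucination_evaluator(example_answer, example_context)
print(f"HHEM score: {score:.3f}")  # closer to 1.0 means less likely to be hallucinated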
Record the HHEM Score
- Run the cell below to collect Vectara's HHEM feedback function into the list of feedbacks used for evaluation.
feedbacks = [f_hhem_score]
Wrap the custom RAG with TruCustomApp and add the HHEM feedback for evaluation
- Run the cell below to wrap the application and its feedback function together.
from trulens.apps.custom import TruCustomApp
tru_rag = TruCustomApp(rag1, app_name="RAG", app_version="v1", feedbacks=feedbacks)
Run the App
with tru_rag as recording:
    rag1.query("What is Vint Cerf")
session.get_leaderboard(app_ids=[tru_rag.app_id])
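Beyond the aggregate leaderboard, you can also pull record-level results to see the HHEM score for each individual query. A minimal sketch, assuming the session exposes get_records_and_feedback (as in recent TruLens releases) and that the feedback column is named after the feedback function defined above:
# Inspect per-record results; the column names here are assumptions based on the feedback name above.
records_df, feedback_names = session.get_records_and_feedback(app_ids=[tru_rag.app_id])
print(records_df[["input", "output", "HHEM_Score"]].head())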
Explore in a Dashboard
from trulens.dashboard import run_dashboard
run_dashboard(session)