Vectara HHEM Evaluator Quickstart
In this quickstart, you'll learn how to use the HHEM evaluator feedback function from TruLens in your application. The Vectara HHEM evaluator, or Hughes Hallucination Evaluation Model, is a tool used to determine whether a summary produced by a large language model (LLM) might contain hallucinated information.
- Purpose: The Vectara HHEM evaluator analyzes both the generated response and its source context, and assigns a score indicating the probability that the response contains hallucinations.
- Score: The returned value is a floating point number between zero and one that maps to a boolean outcome: a score below 0.5 indicates a high likelihood of hallucination, and a score above 0.5 indicates a low likelihood of hallucination.
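If you want to turn the raw score into a yes/no flag in your own code, a simple threshold is enough. The helper below is a minimal sketch for illustration only; is_likely_hallucinated is a hypothetical name, not part of TruLens or Vectara's API.
def is_likely_hallucinated(hhem_score: float, threshold: float = 0.5) -> bool:
    # Scores below the threshold indicate a likely hallucination;
    # scores above it indicate the response is likely grounded in its context.
    return hhem_score < threshold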
Install Dependencies
Run the cells below to install the utilities we'll use in this notebook to demonstrate Vectara's HHEM model.
- Uncomment the cell below if you haven't yet installed LangChain or TruLens.
# !pip install trulens trulens-providers-huggingface 'langchain==0.0.354' 'langchain-community==0.0.20' 'langchain-core==0.1.23'
Import Utilities
We're using LangChain utilities to facilitate RAG retrieval and to demonstrate Vectara's HHEM.
- Run the cells below to get started.
import getpass
from langchain.document_loaders import DirectoryLoader
from langchain.document_loaders import TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma
Preprocess Your Data
Run the cells below to split the document text into chunks to feed into ChromaDB. These chunks are our primary sources for evaluation.
loader = DirectoryLoader("./data/", glob="./*.txt", loader_cls=TextLoader)
documents = loader.load()
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000, chunk_overlap=50
)
texts = text_splitter.split_documents(documents)
E5 Embeddings
E5 embeddings set the state of the art on the BEIR and MTEB benchmarks using only synthetic data and fewer than 1k training steps; the method achieves strong performance on highly competitive text embedding benchmarks without using any labeled data. Furthermore, when fine-tuned with a mixture of synthetic and labeled data, the model sets new state-of-the-art results on BEIR and MTEB (see Improving Text Embeddings with Large Language Models). E5 also requires a unique prompting mechanism, illustrated in the sketch below.
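The sketch below illustrates the instruction-style query formatting that E5 instruct models such as intfloat/multilingual-e5-large-instruct describe on their model card: queries carry an instruction prefix, while documents are embedded as-is. The helper name and task description here are our own illustrative choices; note that the HuggingFaceInferenceAPIEmbeddings wrapper used below does not, to our knowledge, add these prefixes for you.
# Hypothetical helper showing E5's instruction-style query prompting.
def format_e5_query(task_description: str, query: str) -> str:
    return f"Instruct: {task_description}\nQuery: {query}"

formatted_query = format_e5_query(
    "Given a web search query, retrieve relevant passages that answer the query",
    "Who is Vint Cerf?",
)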
inference_api_key = getpass.getpass("Enter your HF Inference API Key:\n\n")
from langchain_community.embeddings import HuggingFaceInferenceAPIEmbeddings
embedding_function = HuggingFaceInferenceAPIEmbeddings(
    api_key=inference_api_key,
    model_name="intfloat/multilingual-e5-large-instruct",
)
Initialize a Vector Store
Here we're using Chroma as the vector store for the document chunks.
- Run the cells below to initialize the vector store.
db = Chroma.from_documents(texts, embedding_function)
Wrap a Simple RAG application with TruLens
- Retrieval: get relevant documents from the vector DB.
- Generate completions: get a response from the LLM.
Run the cells below to create a RAG class whose instrumented methods record the retrieved context and the LLM response for evaluation.
import requests
from trulens.apps.custom import instrument
class Rag:
    def __init__(self):
        pass

    @instrument
    def retrieve(self, query: str) -> str:
        # Retrieve the documents most similar to the query from the vector store.
        docs = db.similarity_search(query)
        # Concatenate the content of the documents
        content = "".join(doc.page_content for doc in docs)
        return content

    @instrument
    def generate_completion(self, content: str, query: str) -> str:
        # Call the Hugging Face Inference API to answer the query from the retrieved content.
        url = "https://api-inference.huggingface.co/models/NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO"
        headers = {
            # Reuse the Hugging Face API key collected earlier in this notebook.
            "Authorization": f"Bearer {inference_api_key}",
            "Content-Type": "application/json",
        }
        data = {
            "inputs": f"answer the following question from the information given Question:{query}\nInformation:{content}\n"
        }
        try:
            response = requests.post(url, headers=headers, json=data)
            response.raise_for_status()
            response_data = response.json()
            # Extract the generated text from the response
            generated_text = response_data[0]["generated_text"]
            # Remove the input text from the generated text
            response_text = generated_text[len(data["inputs"]) :]
            return response_text
        except requests.exceptions.RequestException as e:
            print("Error:", e)
            return None

    @instrument
    def query(self, query: str) -> str:
        context_str = self.retrieve(query)
        completion = self.generate_completion(context_str, query)
        return completion
Instantiate the application above
- Run the cell below to instantiate the application defined above.
rag1 = Rag()
from trulens.core import Feedback
from trulens.core import Select
from trulens.core import TruSession
from trulens.providers.huggingface import Huggingface
session = TruSession()
session.reset_database()
Initialize HHEM Feedback Function
HHEM takes two inputs:
- The summary/answer generated by the LLM.
- The original source text that the LLM used to generate the summary/answer (the retrieval context).
A standalone call to the evaluator is sketched after the cell below.
huggingface_provider = Huggingface()
f_hhem_score = (
    Feedback(huggingface_provider.hallucination_evaluator, name="HHEM_Score")
    .on(Select.RecordCalls.generate_completion.rets)
    .on(Select.RecordCalls.retrieve.rets)
)
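As a sanity check, you can also call the evaluator directly on a pair of strings, outside the feedback pipeline. This is a minimal sketch: the example strings are made up, and it assumes hallucination_evaluator takes the generated answer followed by the retrieved context and returns a float between 0 and 1, matching the selectors above.
# Hypothetical standalone check of the HHEM evaluator on made-up strings.
example_answer = "Vint Cerf is widely regarded as one of the fathers of the Internet."
example_context = "Vint Cerf co-designed the TCP/IP protocols and is often called a father of the Internet."
score = huggingface_provider.hallucination_evaluator(example_answer, example_context)
print(f"HHEM score: {score:.3f}")  # closer to 1.0 means less likely to be hallucinated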
Record the HHEM Score
- Run the cell below to collect Vectara's HHEM feedback function into the list of feedbacks used for evaluation.
feedbacks = [f_hhem_score]
Wrap the custom RAG with TruCustomApp and add the HHEM feedback for evaluation
- Run the cell below to wrap the application and its feedback function together.
from trulens.apps.custom import TruCustomApp
tru_rag = TruCustomApp(rag1, app_name="RAG", app_version="v1", feedbacks=feedbacks)
Run the App
with tru_rag as recording:
    rag1.query("What is Vint Cerf")
session.get_leaderboard(app_ids=[tru_rag.app_id])
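Beyond the aggregate leaderboard, you can also pull record-level results to see the HHEM score for each individual query. A minimal sketch, assuming the session exposes get_records_and_feedback (as in recent TruLens releases) and that the feedback column is named after the feedback function defined above:
# Inspect per-record results; the column names here are assumptions based on the feedback name above.
records_df, feedback_names = session.get_records_and_feedback(app_ids=[tru_rag.app_id])
print(records_df[["input", "output", "HHEM_Score"]].head())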
Explore in a Dashboard
from trulens.dashboard import run_dashboard
run_dashboard(session)