LiteLLM Quickstart¶
In this quickstart you will learn how to use LiteLLM as a feedback function provider.
LiteLLM is a consistent way to access 100+ LLMs such as those from OpenAI, HuggingFace, Anthropic, and Cohere. Using LiteLLM dramatically expands the model availability for feedback functions. Please be cautious in trusting the results of evaluations from models that have not yet been tested.
Specifically, in this example we'll show how to use TogetherAI, but the LiteLLM provider can run feedback functions with any LiteLLM-supported model. We'll also use Mistral for the embedding and completion models, accessed via LiteLLM as well. TruLens will also track token usage and cost metrics for models called through LiteLLM.
Note: LiteLLM costs are tracked for models included in this litellm community-maintained list.
# !pip install trulens trulens-providers-litellm chromadb mistralai
import os
os.environ["TOGETHERAI_API_KEY"] = "..."
os.environ["MISTRAL_API_KEY"] = "..."
Get Data¶
In this case, we'll just initialize some simple text in the notebook.
university_info = """
The University of Washington, founded in 1861 in Seattle, is a public research university
with over 45,000 students across three campuses in Seattle, Tacoma, and Bothell.
As the flagship institution of the six public universities in Washington state,
UW encompasses over 500 buildings and 20 million square feet of space,
including one of the largest library systems in the world.
"""
Create Vector Store¶
Create a chromadb vector store in memory.
from litellm import embedding
embedding_response = embedding(
model="mistral/mistral-embed",
input=university_info,
)
embedding_response.data[0]["embedding"]
import chromadb
chroma_client = chromadb.Client()
vector_store = chroma_client.get_or_create_collection(name="Universities")
Add the university_info to the embedding database.
vector_store.add(
"uni_info",
documents=university_info,
embeddings=embedding_response.data[0]["embedding"],
)
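Under the hood, `vector_store.query` ranks stored documents by embedding similarity. A minimal stdlib sketch of cosine similarity with toy 3-d vectors (standing in for real `mistral-embed` outputs, which are much higher-dimensional):

```python
import math


def cosine_similarity(a, b):
    # Cosine similarity: dot(a, b) / (||a|| * ||b||)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


# Toy "embeddings" standing in for mistral-embed vectors
query_vec = [0.9, 0.1, 0.0]
doc_vecs = {
    "uni_info": [1.0, 0.0, 0.0],
    "unrelated": [0.0, 0.0, 1.0],
}

# Rank document ids by similarity to the query vector, most similar first
ranked = sorted(
    doc_vecs, key=lambda k: cosine_similarity(query_vec, doc_vecs[k]), reverse=True
)
print(ranked[0])  # the most similar document id
```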
Build RAG from scratch¶
Build a custom RAG from scratch, and add TruLens custom instrumentation.
from trulens.core import TruSession
from trulens.apps.custom import instrument
session = TruSession()
session.reset_database()
import litellm
class RAG_from_scratch:
    @instrument
    def retrieve(self, query: str) -> list:
        """
        Retrieve relevant text from vector store.
        """
        results = vector_store.query(
            query_embeddings=embedding(
                model="mistral/mistral-embed", input=query
            ).data[0]["embedding"],
            n_results=2,
        )
        return results["documents"]

    @instrument
    def generate_completion(self, query: str, context_str: list) -> str:
        """
        Generate answer from context.
        """
        completion = (
            litellm.completion(
                model="mistral/mistral-small",
                temperature=0,
                messages=[
                    {
                        "role": "user",
                        "content": f"We have provided context information below. \n"
                        f"---------------------\n"
                        f"{context_str}"
                        f"\n---------------------\n"
                        f"Given this information, please answer the question: {query}",
                    }
                ],
            )
            .choices[0]
            .message.content
        )
        return completion

    @instrument
    def query(self, query: str) -> str:
        context_str = self.retrieve(query)
        completion = self.generate_completion(query, context_str)
        return completion
rag = RAG_from_scratch()
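The completion prompt above simply sandwiches the retrieved context between delimiter lines before the question. A minimal stdlib sketch of the same template (the `build_prompt` helper is hypothetical, for illustration only):

```python
def build_prompt(query: str, context_str: list) -> str:
    # Mirrors the prompt template used in generate_completion above
    return (
        "We have provided context information below. \n"
        "---------------------\n"
        f"{context_str}"
        "\n---------------------\n"
        f"Given this information, please answer the question: {query}"
    )


prompt = build_prompt(
    "When was UW founded?",
    ["The University of Washington, founded in 1861 in Seattle ..."],
)
print(prompt)
```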
Set up feedback functions.¶
Here we'll use groundedness, answer relevance, and context relevance to detect hallucination, plus coherence as a general check on output quality.
import numpy as np
from trulens.core import Feedback
from trulens.core import Select
from trulens.providers.litellm import LiteLLM
# Initialize LiteLLM-based feedback function collection class:
provider = LiteLLM(model_engine="together_ai/togethercomputer/llama-2-70b-chat")
# Define a groundedness feedback function
f_groundedness = (
Feedback(
provider.groundedness_measure_with_cot_reasons, name="Groundedness"
)
.on(Select.RecordCalls.retrieve.rets.collect())
.on_output()
)
# Question/answer relevance between overall question and answer.
f_answer_relevance = (
Feedback(provider.relevance_with_cot_reasons, name="Answer Relevance")
.on(Select.RecordCalls.retrieve.args.query)
.on_output()
)
# Question/statement relevance between question and each context chunk.
f_context_relevance = (
Feedback(
provider.context_relevance_with_cot_reasons, name="Context Relevance"
)
.on(Select.RecordCalls.retrieve.args.query)
.on(Select.RecordCalls.retrieve.rets.collect())
.aggregate(np.mean)
)
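Because context relevance is evaluated once per retrieved chunk, `.aggregate(np.mean)` collapses the per-chunk scores into a single record-level score. An equivalent stdlib sketch with toy scores (not real model output):

```python
from statistics import mean

# Hypothetical per-chunk relevance scores in [0, 1], one per retrieved chunk
chunk_scores = [0.9, 0.5]

# .aggregate(np.mean) reduces them to one score for the record
record_score = mean(chunk_scores)
print(record_score)
```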
f_coherence = Feedback(
    provider.coherence_with_cot_reasons, name="Coherence"
).on_output()
As a sanity check, call the groundedness feedback function directly on a source/statement pair.
provider.groundedness_measure_with_cot_reasons(
    """The University of Washington, founded in 1861 in Seattle, is a public research university
with over 45,000 students across three campuses in Seattle, Tacoma, and Bothell.
As the flagship institution of the six public universities in Washington state,
UW encompasses over 500 buildings and 20 million square feet of space,
including one of the largest library systems in the world.""",
    "The University of Washington was founded in 1861. It is the flagship institution of the state of Washington.",
)
Construct the app¶
Wrap the custom RAG with TruCustomApp and add the list of feedback functions for evaluation.
from trulens.apps.custom import TruCustomApp
tru_rag = TruCustomApp(
rag,
app_name="RAG",
app_version="v1",
feedbacks=[
f_groundedness,
f_answer_relevance,
f_context_relevance,
f_coherence,
],
)
Run the app¶
Use tru_rag as a context manager for the custom RAG-from-scratch app.
with tru_rag as recording:
rag.query("Give me a long history of U Dub")
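The context manager is what ties calls made inside the with block to the recorded trace. A toy stdlib sketch of the pattern (not the TruLens implementation):

```python
class ToyRecorder:
    # Minimal stand-in for a recording context manager: collects a log of
    # calls made inside the with block
    def __init__(self):
        self.records = []

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        return False  # do not swallow exceptions

    def log(self, call: str):
        self.records.append(call)


with ToyRecorder() as recording:
    recording.log("rag.query(...)")

print(len(recording.records))
```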
session.get_leaderboard(app_ids=[tru_rag.app_id])
from trulens.dashboard import run_dashboard
run_dashboard(session)