Stock Feedback Functions

Classification-based

🤗 Huggingface

API Reference: Huggingface.

Out of the box feedback functions calling Huggingface APIs.

context_relevance

Uses Huggingface's truera/context_relevance model, which computes the relevance of a given context to the prompt. The model can be found at https://huggingface.co/truera/context_relevance.

Example:

```python
import numpy as np

from trulens.core import Feedback
from trulens.providers.huggingface import Huggingface

huggingface_provider = Huggingface()

# `context` is a selector for the retrieved context, e.g. obtained via
# TruChain.select_context(app) or TruLlama.select_context(app).
feedback = (
    Feedback(huggingface_provider.context_relevance)
    .on_input()
    .on(context)
    .aggregate(np.mean)
)
```

groundedness_measure_with_nli

A measure to track if the source material supports each sentence in the statement using an NLI model.

First, the response is split into statements using a sentence tokenizer. The NLI model then processes each statement against the entire source material.

Example:

```python
from trulens.core import Feedback
from trulens.providers.huggingface import Huggingface

huggingface_provider = Huggingface()

f_groundedness = (
    Feedback(huggingface_provider.groundedness_measure_with_nli)
    .on(context)
    .on_output()
)
```

hallucination_evaluator

Evaluates the hallucination score for a combined input of two statements, returned as a float between 0 and 1 that represents a true/false judgment. If the score is greater than 0.5, the statement is evaluated as consistent (true); if it is less than 0.5, the statement is evaluated as a hallucination.

Example:

```python
from trulens.providers.huggingface import Huggingface
huggingface_provider = Huggingface()

score = huggingface_provider.hallucination_evaluator("The sky is blue. [SEP] Apples are red, the grass is green.")
```
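
Since the raw score is a probability, the 0.5 cutoff described above can be applied directly to the returned value; a small follow-up to the example:

```python
# `score` comes from the hallucination_evaluator call above; apply the 0.5
# cutoff described in the docstring.
if score > 0.5:
    print(f"Consistent ({score:.2f})")
else:
    print(f"Likely hallucination ({score:.2f})")
```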

language_match

Uses Huggingface's papluca/xlm-roberta-base-language-detection model. A function that uses language detection on text1 and text2 and calculates the probit difference on the language detected on text1. The function is: `1.0 - |probit_language_text1(text1) - probit_language_text1(text2)|`

Example:

```python
from trulens.core import Feedback
from trulens.providers.huggingface import Huggingface
huggingface_provider = Huggingface()

feedback = Feedback(huggingface_provider.language_match).on_input_output()
```

The `on_input_output()` selector can be changed. See [Feedback Function
Guide](https://www.trulens.org/trulens/feedback_function_guide/)

load staticmethod

Deserialize/load this object using the class information in tru_class_info to look up the actual class that will do the deserialization.

model_validate classmethod

Deserialize a jsonized version of the app into an instance of the class it was serialized from.

Note

This process uses extra information stored in the jsonized object and handled by WithClassInfo.

pii_detection

NER model to detect PII.

Example:

```python
hugs = Huggingface()

# Define a pii_detection feedback function using HuggingFace.
f_pii_detection = Feedback(hugs.pii_detection).on_input()
```

The `on(...)` selector can be changed. See [Feedback Function Guide:
Selectors](https://www.trulens.org/trulens/feedback_function_guide/#selector-details)

pii_detection_with_cot_reasons

NER model to detect PII, with reasons.

Example:

```python
hugs = Huggingface()

# Define a pii_detection_with_cot_reasons feedback function using HuggingFace.
f_pii_detection = Feedback(hugs.pii_detection_with_cot_reasons).on_input()
```

The `on(...)` selector can be changed. See [Feedback Function Guide:
Selectors](https://www.trulens.org/trulens/feedback_function_guide/#selector-details)

Args:
    text: A text prompt that may contain a name.

Returns:
    Tuple[float, str]: A tuple containing the likelihood that PII is contained in the input text and a string describing what PII was detected (if any).
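
Because this variant returns both the score and a description of what was detected, it can also be called directly and unpacked; a minimal sketch (the example text below is hypothetical):

```python
from trulens.providers.huggingface import Huggingface

hugs = Huggingface()

# Returns the likelihood that PII is present plus a description of what was found.
score, reasons = hugs.pii_detection_with_cot_reasons(
    "My name is Jane Doe and my phone number is 555-0100."
)
print(score, reasons)
```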

positive_sentiment

Uses Huggingface's cardiffnlp/twitter-roberta-base-sentiment model. A function that uses a sentiment classifier on text.

Example:

```python
from trulens.core import Feedback
from trulens.providers.huggingface import Huggingface
huggingface_provider = Huggingface()

feedback = Feedback(huggingface_provider.positive_sentiment).on_output()
```

toxic

Uses Huggingface's martin-ha/toxic-comment-model model. A function that uses a toxic comment classifier on text.

Example:

```python
from trulens.core import Feedback
from trulens.providers.huggingface import Huggingface
huggingface_provider = Huggingface()

feedback = Feedback(huggingface_provider.toxic).on_output()
```

tru_class_info: Class instance-attribute

Class information of this pydantic object for use in deserialization.

Using this odd key to not pollute attribute names in whatever class we mix this into. Should be the same as CLASS_INFO.

OpenAI

API Reference: OpenAI.

Out of the box feedback functions calling OpenAI APIs. Additionally, all feedback functions listed in the base LLMProvider class can be run with OpenAI.

Create an OpenAI Provider with out of the box feedback functions.

Example:

```python
from trulens.providers.openai import OpenAI
openai_provider = OpenAI()
```

coherence

Uses chat completion model. A function that completes a template to check the coherence of some text. Prompt credit to LangChain Eval.

Example:

```python
feedback = Feedback(provider.coherence).on_output()
```

coherence_with_cot_reasons

Uses chat completion model. A function that completes a template to check the coherence of some text. Prompt credit to LangChain Eval. Also uses chain of thought methodology and emits the reasons.

Example:

```python
feedback = Feedback(provider.coherence_with_cot_reasons).on_output()
```

comprehensiveness_with_cot_reasons

Uses chat completion model. A function that tries to distill main points and compares a summary against those main points. This feedback function only has a chain-of-thought implementation, as the reasoning is essential to the assessment.

Example:

```python
feedback = Feedback(provider.comprehensiveness_with_cot_reasons).on_input_output()
```

conciseness

Uses chat completion model. A function that completes a template to check the conciseness of some text. Prompt credit to LangChain Eval.

Example:

```python
feedback = Feedback(provider.conciseness).on_output()
```

conciseness_with_cot_reasons

Uses chat completion model. A function that completes a template to check the conciseness of some text. Prompt credit to LangChain Eval. Also uses chain of thought methodology and emits the reasons.

Example:

```python
feedback = Feedback(provider.conciseness_with_cot_reasons).on_output()
```

Args:
    text: The text to evaluate the conciseness of.

context_relevance

Uses chat completion model. A function that completes a template to check the relevance of the context to the question.

Example:

```python
from trulens.apps.langchain import TruChain
context = TruChain.select_context(rag_app)
feedback = (
    Feedback(provider.context_relevance)
    .on_input()
    .on(context)
    .aggregate(np.mean)
    )
```

Returns:
    float: A value between 0.0 (not relevant) and 1.0 (relevant).

context_relevance_verb_confidence

Uses chat completion model. A function that completes a template to check the relevance of the context to the question, and additionally elicits a verbalized confidence score for the judgment.

Example:

```python
from trulens.apps.llamaindex import TruLlama
context = TruLlama.select_context(llamaindex_rag_app)
feedback = (
    Feedback(provider.context_relevance_verb_confidence)
    .on_input()
    .on(context)
    .aggregate(np.mean)
    )
```

Returns:
    Tuple[float, Dict[str, float]]: A value between 0.0 (not relevant) and 1.0 (relevant), and a dictionary containing the confidence score.
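
When called directly rather than through a Feedback pipeline, the score and confidence dictionary come back as a pair. A minimal sketch, assuming the method takes the question and the retrieved context as its two text arguments (the parameter names here are assumptions, not confirmed):

```python
from trulens.providers.openai import OpenAI

provider = OpenAI()

# Hypothetical direct call; check the API reference for exact parameter names.
score, confidence = provider.context_relevance_verb_confidence(
    question="What is the capital of France?",
    context="Paris is the capital and largest city of France.",
)
print(score)       # float in [0, 1]
print(confidence)  # dictionary with the verbalized confidence score
```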

context_relevance_with_cot_reasons

Uses chat completion model. A function that completes a template to check the relevance of the context to the question. Also uses chain of thought methodology and emits the reasons.

Example:

```python
from trulens.apps.langchain import TruChain
context = TruChain.select_context(rag_app)
feedback = (
    Feedback(provider.context_relevance_with_cot_reasons)
    .on_input()
    .on(context)
    .aggregate(np.mean)
    )
```

controversiality

Uses chat completion model. A function that completes a template to check the controversiality of some text. Prompt credit to Langchain Eval.

Example:

```python
feedback = Feedback(provider.controversiality).on_output()
```

controversiality_with_cot_reasons

Uses chat completion model. A function that completes a template to check the controversiality of some text. Prompt credit to Langchain Eval. Also uses chain of thought methodology and emits the reasons.

Example:

```python
feedback = Feedback(provider.controversiality_with_cot_reasons).on_output()
```

correctness

Uses chat completion model. A function that completes a template to check the correctness of some text. Prompt credit to LangChain Eval.

Example:

```python
feedback = Feedback(provider.correctness).on_output()
```

correctness_with_cot_reasons

Uses chat completion model. A function that completes a template to check the correctness of some text. Prompt credit to LangChain Eval. Also uses chain of thought methodology and emits the reasons.

Example:

```python
feedback = Feedback(provider.correctness_with_cot_reasons).on_output()
```

criminality

Uses chat completion model. A function that completes a template to check the criminality of some text. Prompt credit to LangChain Eval.

Example:

```python
feedback = Feedback(provider.criminality).on_output()
```

criminality_with_cot_reasons

Uses chat completion model. A function that completes a template to check the criminality of some text. Prompt credit to LangChain Eval. Also uses chain of thought methodology and emits the reasons.

Example:

```python
feedback = Feedback(provider.criminality_with_cot_reasons).on_output()
```

generate_confidence_score

Base method to generate a score normalized to 0 to 1, used for evaluation.

generate_score

Base method to generate a score normalized to 0 to 1, used for evaluation.

generate_score_and_reasons

Base method to generate a score and reason, used for evaluation.
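
These base methods are what the named feedback functions above call under the hood: they send a grading prompt to the chat model and parse a numeric score (and, for generate_score_and_reasons, the chain-of-thought reasons) out of the reply. A minimal sketch of a custom criterion, assuming generate_score accepts a grading system prompt and an optional user prompt; consult the API reference for the exact signature:

```python
from trulens.providers.openai import OpenAI

provider = OpenAI()

# Hypothetical custom criterion; the generate_score signature used here is an
# assumption, not a confirmed API.
system_prompt = (
    "You are a grader. Rate how formal the USER text is on a scale of 0 to 10, "
    "where 0 is very informal and 10 is very formal. Respond with only the number."
)
score = provider.generate_score(
    system_prompt=system_prompt,
    user_prompt="hey, what's up?",
)
print(score)  # normalized to the range [0, 1]
```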

groundedness_measure_with_cot_reasons

A measure to track if the source material supports each sentence in the statement using an LLM provider.

The statement will first be split by a tokenizer into its component sentences.

Then, trivial statements are eliminated so as to not dilute the evaluation.

The LLM will process each statement, using chain of thought methodology to emit the reasons.

Abstentions will be considered as grounded.

Example:

```python
from trulens.core import Feedback
from trulens.providers.openai import OpenAI

provider = OpenAI()

f_groundedness = (
    Feedback(provider.groundedness_measure_with_cot_reasons)
    .on(context.collect())
    .on_output()
    )
```

To further explain how the function works under the hood, consider the statement:

"Hi. I'm here to help. The university of Washington is a public research university. UW's connections to major corporations in Seattle contribute to its reputation as a hub for innovation and technology"

The function will split the statement into its component sentences:

  1. "Hi."
  2. "I'm here to help."
  3. "The university of Washington is a public research university."
  4. "UW's connections to major corporations in Seattle contribute to its reputation as a hub for innovation and technology"

Next, trivial statements are removed, leaving only:

  1. "The university of Washington is a public research university."
  2. "UW's connections to major corporations in Seattle contribute to its reputation as a hub for innovation and technology"

The LLM will then process each remaining statement to assess its groundedness.

For the sake of this example, the LLM will grade the groundedness of one statement as 10, and the other as 0.

Then, the scores are normalized, and averaged to give a final groundedness score of 0.5.
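
The normalization in that last step is simply a rescaling of the 0-10 grades to the 0-1 range, followed by an average; a quick sketch of the arithmetic:

```python
# Per-statement grades from the LLM, on a 0-10 scale (from the example above).
grades = [10, 0]

# Normalize each grade to [0, 1], then average to get the final groundedness score.
normalized = [g / 10 for g in grades]
overall = sum(normalized) / len(normalized)
print(overall)  # 0.5
```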

Returns:
    Tuple[float, dict]: A tuple containing a value between 0.0 (not grounded) and 1.0 (grounded) and a dictionary containing the reasons for the evaluation.

groundedness_measure_with_cot_reasons_consider_answerability

A measure to track if the source material supports each sentence in the statement using an LLM provider.

The statement will first be split by a tokenizer into its component sentences.

Then, trivial statements are eliminated so as to not dilute the evaluation.

The LLM will process each statement, using chain of thought methodology to emit the reasons.

In the case of abstentions, such as 'I do not know', the LLM will be asked to consider the answerability of the question given the source material.

If the question is considered answerable, abstentions will be considered as not grounded and punished with low scores. Otherwise, unanswerable abstentions will be considered grounded.

Example:

```python
from trulens.core import Feedback
from trulens.providers.openai import OpenAI

provider = OpenAI()

f_groundedness = (
    Feedback(provider.groundedness_measure_with_cot_reasons_consider_answerability)
    .on(context.collect())
    .on_output()
    .on_input()
    )
```

Returns:
    Tuple[float, dict]: A tuple containing a value between 0.0 (not grounded) and 1.0 (grounded) and a dictionary containing the reasons for the evaluation.

harmfulness

Uses chat completion model. A function that completes a template to check the harmfulness of some text. Prompt credit to LangChain Eval.

Example:

```python
feedback = Feedback(provider.harmfulness).on_output()
```

harmfulness_with_cot_reasons

Uses chat completion model. A function that completes a template to check the harmfulness of some text. Prompt credit to LangChain Eval. Also uses chain of thought methodology and emits the reasons.

Example:

```python
feedback = Feedback(provider.harmfulness_with_cot_reasons).on_output()
```

helpfulness

Uses chat completion model. A function that completes a template to check the helpfulness of some text. Prompt credit to LangChain Eval.

Example:

```python
feedback = Feedback(provider.helpfulness).on_output()
```

helpfulness_with_cot_reasons

Uses chat completion model. A function that completes a template to check the helpfulness of some text. Prompt credit to LangChain Eval. Also uses chain of thought methodology and emits the reasons.

Example:

```python
feedback = Feedback(provider.helpfulness_with_cot_reasons).on_output()
```

insensitivity

Uses chat completion model. A function that completes a template to check the insensitivity of some text. Prompt credit to LangChain Eval.

Example:

```python
feedback = Feedback(provider.insensitivity).on_output()
```

insensitivity_with_cot_reasons

Uses chat completion model. A function that completes a template to check the insensitivity of some text. Prompt credit to LangChain Eval. Also uses chain of thought methodology and emits the reasons.

Example:

```python
feedback = Feedback(provider.insensitivity_with_cot_reasons).on_output()
```

load staticmethod

Deserialize/load this object using the class information in tru_class_info to look up the actual class that will do the deserialization.

maliciousness

Uses chat completion model. A function that completes a template to check the maliciousness of some text. Prompt credit to LangChain Eval.

Example:

```python
feedback = Feedback(provider.maliciousness).on_output()
```

maliciousness_with_cot_reasons

Uses chat completion model. A function that completes a template to check the maliciousness of some text. Prompt credit to LangChain Eval. Also uses chain of thought methodology and emits the reasons.

Example:

```python
feedback = Feedback(provider.maliciousness_with_cot_reasons).on_output()
```

misogyny

Uses chat completion model. A function that completes a template to check the misogyny of some text. Prompt credit to LangChain Eval.

Example:

```python
feedback = Feedback(provider.misogyny).on_output()
```

misogyny_with_cot_reasons

Uses chat completion model. A function that completes a template to check the misogyny of some text. Prompt credit to LangChain Eval. Also uses chain of thought methodology and emits the reasons.

Example:

```python
feedback = Feedback(provider.misogyny_with_cot_reasons).on_output()
```

model_agreement

Uses chat completion model. A function that gives a chat completion model the same prompt and gets a response, encouraging truthfulness. A second template is given to the model with a prompt stating that the original response is correct, and it measures whether the previous chat completion response is similar.

Example:

```python
feedback = Feedback(provider.model_agreement).on_input_output()
```

model_validate classmethod

Deserialize a jsonized version of the app into an instance of the class it was serialized from.

Note

This process uses extra information stored in the jsonized object and handled by WithClassInfo.

moderation_harassment

Uses OpenAI's Moderation API. A function that checks if text contains harassment.

Example:

```python
from trulens.core import Feedback
from trulens.providers.openai import OpenAI
openai_provider = OpenAI()

feedback = Feedback(
    openai_provider.moderation_harassment, higher_is_better=False
).on_output()
```

moderation_harassment_threatening

Uses OpenAI's Moderation API. A function that checks if text contains harassing and threatening content.

Example:

```python
from trulens.core import Feedback
from trulens.providers.openai import OpenAI
openai_provider = OpenAI()

feedback = Feedback(
    openai_provider.moderation_harassment_threatening, higher_is_better=False
).on_output()
```

moderation_hate

Uses OpenAI's Moderation API. A function that checks if text is hate speech.

Example:

```python
from trulens.core import Feedback
from trulens.providers.openai import OpenAI
openai_provider = OpenAI()

feedback = Feedback(
    openai_provider.moderation_hate, higher_is_better=False
).on_output()
```

moderation_hatethreatening

Uses OpenAI's Moderation API. A function that checks if text is threatening speech.

Example:

```python
from trulens.core import Feedback
from trulens.providers.openai import OpenAI
openai_provider = OpenAI()

feedback = Feedback(
    openai_provider.moderation_hatethreatening, higher_is_better=False
).on_output()
```

moderation_selfharm

Uses OpenAI's Moderation API. A function that checks if text is about self harm.

Example:

```python
from trulens.core import Feedback
from trulens.providers.openai import OpenAI
openai_provider = OpenAI()

feedback = Feedback(
    openai_provider.moderation_selfharm, higher_is_better=False
).on_output()
```

moderation_sexual

Uses OpenAI's Moderation API. A function that checks if text is sexual speech.

Example:

```python
from trulens.core import Feedback
from trulens.providers.openai import OpenAI
openai_provider = OpenAI()

feedback = Feedback(
    openai_provider.moderation_sexual, higher_is_better=False
).on_output()
```

moderation_sexualminors

Uses OpenAI's Moderation API. A function that checks if text is about sexual minors.

Example:

```python
from trulens.core import Feedback
from trulens.providers.openai import OpenAI
openai_provider = OpenAI()

feedback = Feedback(
    openai_provider.moderation_sexualminors, higher_is_better=False
).on_output()
```

moderation_violence

Uses OpenAI's Moderation API. A function that checks if text is about violence.

Example:

```python
from trulens.core import Feedback
from trulens.providers.openai import OpenAI
openai_provider = OpenAI()

feedback = Feedback(
    openai_provider.moderation_violence, higher_is_better=False
).on_output()
```

moderation_violencegraphic

Uses OpenAI's Moderation API. A function that checks if text is about graphic violence.

Example:

```python
from trulens.core import Feedback
from trulens.providers.openai import OpenAI
openai_provider = OpenAI()

feedback = Feedback(
    openai_provider.moderation_violencegraphic, higher_is_better=False
).on_output()
```

qs_relevance

Deprecated. Use relevance instead.

qs_relevance_with_cot_reasons

Deprecated. Use relevance_with_cot_reasons instead.

relevance

Uses chat completion model. A function that completes a template to check the relevance of the response to a prompt.

Example:

```python
feedback = Feedback(provider.relevance).on_input_output()
```
Usage on RAG Contexts:

```python
feedback = Feedback(provider.relevance).on_input().on(
    TruLlama.select_source_nodes().node.text  # See note below
).aggregate(np.mean)
```

relevance_with_cot_reasons

Uses chat completion Model. A function that completes a template to check the relevance of the response to a prompt. Also uses chain of thought methodology and emits the reasons.

Example:

```python
feedback = (
    Feedback(provider.relevance_with_cot_reasons)
    .on_input()
    .on_output()
)
```

sentiment

Uses chat completion model. A function that completes a template to check the sentiment of some text.

Example:

```python
feedback = Feedback(provider.sentiment).on_output()
```

sentiment_with_cot_reasons

Uses chat completion model. A function that completes a template to check the sentiment of some text. Also uses chain of thought methodology and emits the reasons.

Example:

```python
feedback = Feedback(provider.sentiment_with_cot_reasons).on_output()
```

stereotypes

Uses chat completion model. A function that completes a template to check whether the response adds assumed stereotypes that are not present in the prompt.

Example:

```python
feedback = Feedback(provider.stereotypes).on_input_output()
```

stereotypes_with_cot_reasons

Uses chat completion model. A function that completes a template to check whether the response adds assumed stereotypes that are not present in the prompt. Also uses chain of thought methodology and emits the reasons.

Example:

```python
feedback = Feedback(provider.stereotypes_with_cot_reasons).on_input_output()
```

summarization_with_cot_reasons

Summarization is deprecated in favor of comprehensiveness. This function is no longer implemented.

tru_class_info: Class instance-attribute

Class information of this pydantic object for use in deserialization.

Using this odd key to not pollute attribute names in whatever class we mix this into. Should be the same as CLASS_INFO.

Generation-based: LLMProvider

API Reference: LLMProvider.

An LLM-based provider.

This is an abstract class and needs to be initialized as one of its concrete provider implementations (for example, the OpenAI provider above).

coherence

Uses chat completion model. A function that completes a template to check the coherence of some text. Prompt credit to LangChain Eval.

Example:

```python
feedback = Feedback(provider.coherence).on_output()
```

coherence_with_cot_reasons

Uses chat completion model. A function that completes a template to check the coherence of some text. Prompt credit to LangChain Eval. Also uses chain of thought methodology and emits the reasons.

Example:

```python
feedback = Feedback(provider.coherence_with_cot_reasons).on_output()
```

comprehensiveness_with_cot_reasons

Uses chat completion model. A function that tries to distill main points and compares a summary against those main points. This feedback function only has a chain-of-thought implementation, as the reasoning is essential to the assessment.

Example:

```python
feedback = Feedback(provider.comprehensiveness_with_cot_reasons).on_input_output()
```

conciseness

Uses chat completion model. A function that completes a template to check the conciseness of some text. Prompt credit to LangChain Eval.

Example:

```python
feedback = Feedback(provider.conciseness).on_output()
```

conciseness_with_cot_reasons

Uses chat completion model. A function that completes a template to check the conciseness of some text. Prompt credit to LangChain Eval. Also uses chain of thought methodology and emits the reasons.

Example:

```python
feedback = Feedback(provider.conciseness_with_cot_reasons).on_output()
```

Args:
    text: The text to evaluate the conciseness of.

context_relevance

Uses chat completion model. A function that completes a template to check the relevance of the context to the question.

Example:

```python
from trulens.apps.langchain import TruChain
context = TruChain.select_context(rag_app)
feedback = (
    Feedback(provider.context_relevance)
    .on_input()
    .on(context)
    .aggregate(np.mean)
    )
```

Returns:
    float: A value between 0.0 (not relevant) and 1.0 (relevant).

context_relevance_verb_confidence

Uses chat completion model. A function that completes a template to check the relevance of the context to the question, and additionally elicits a verbalized confidence score for the judgment.

Example:

```python
from trulens.apps.llamaindex import TruLlama
context = TruLlama.select_context(llamaindex_rag_app)
feedback = (
    Feedback(provider.context_relevance_verb_confidence)
    .on_input()
    .on(context)
    .aggregate(np.mean)
    )
```

Returns:
    Tuple[float, Dict[str, float]]: A value between 0.0 (not relevant) and 1.0 (relevant), and a dictionary containing the confidence score.

context_relevance_with_cot_reasons

Uses chat completion model. A function that completes a template to check the relevance of the context to the question. Also uses chain of thought methodology and emits the reasons.

Example:

```python
from trulens.apps.langchain import TruChain
context = TruChain.select_context(rag_app)
feedback = (
    Feedback(provider.context_relevance_with_cot_reasons)
    .on_input()
    .on(context)
    .aggregate(np.mean)
    )
```

controversiality

Uses chat completion model. A function that completes a template to check the controversiality of some text. Prompt credit to Langchain Eval.

Example:

```python
feedback = Feedback(provider.controversiality).on_output()
```

controversiality_with_cot_reasons

Uses chat completion model. A function that completes a template to check the controversiality of some text. Prompt credit to Langchain Eval. Also uses chain of thought methodology and emits the reasons.

Example:

```python
feedback = Feedback(provider.controversiality_with_cot_reasons).on_output()
```

correctness

Uses chat completion model. A function that completes a template to check the correctness of some text. Prompt credit to LangChain Eval.

Example:

```python
feedback = Feedback(provider.correctness).on_output()
```

correctness_with_cot_reasons

Uses chat completion model. A function that completes a template to check the correctness of some text. Prompt credit to LangChain Eval. Also uses chain of thought methodology and emits the reasons.

Example:

```python
feedback = Feedback(provider.correctness_with_cot_reasons).on_output()
```

criminality

Uses chat completion model. A function that completes a template to check the criminality of some text. Prompt credit to LangChain Eval.

Example:

```python
feedback = Feedback(provider.criminality).on_output()
```

criminality_with_cot_reasons

Uses chat completion model. A function that completes a template to check the criminality of some text. Prompt credit to LangChain Eval. Also uses chain of thought methodology and emits the reasons.

Example:

```python
feedback = Feedback(provider.criminality_with_cot_reasons).on_output()
```

endpoint: Optional[mod_endpoint.Endpoint] = None class-attribute instance-attribute

Endpoint supporting this provider.

Remote API invocations are handled by the endpoint.

generate_confidence_score

Base method to generate a score normalized to 0 to 1, used for evaluation.

generate_score

Base method to generate a score normalized to 0 to 1, used for evaluation.

generate_score_and_reasons

Base method to generate a score and reason, used for evaluation.

groundedness_measure_with_cot_reasons

A measure to track if the source material supports each sentence in the statement using an LLM provider.

The statement will first be split by a tokenizer into its component sentences.

Then, trivial statements are eliminated so as to not dilute the evaluation.

The LLM will process each statement, using chain of thought methodology to emit the reasons.

Abstentions will be considered as grounded.

Example:

```python
from trulens.core import Feedback
from trulens.providers.openai import OpenAI

provider = OpenAI()

f_groundedness = (
    Feedback(provider.groundedness_measure_with_cot_reasons)
    .on(context.collect())
    .on_output()
    )
```

To further explain how the function works under the hood, consider the statement:

"Hi. I'm here to help. The university of Washington is a public research university. UW's connections to major corporations in Seattle contribute to its reputation as a hub for innovation and technology"

The function will split the statement into its component sentences:

  1. "Hi."
  2. "I'm here to help."
  3. "The university of Washington is a public research university."
  4. "UW's connections to major corporations in Seattle contribute to its reputation as a hub for innovation and technology"

Next, trivial statements are removed, leaving only:

  1. "The university of Washington is a public research university."
  2. "UW's connections to major corporations in Seattle contribute to its reputation as a hub for innovation and technology"

The LLM will then process each remaining statement to assess its groundedness.

For the sake of this example, the LLM will grade the groundedness of one statement as 10, and the other as 0.

Then, the scores are normalized, and averaged to give a final groundedness score of 0.5.

Returns:
    Tuple[float, dict]: A tuple containing a value between 0.0 (not grounded) and 1.0 (grounded) and a dictionary containing the reasons for the evaluation.

groundedness_measure_with_cot_reasons_consider_answerability

A measure to track if the source material supports each sentence in the statement using an LLM provider.

The statement will first be split by a tokenizer into its component sentences.

Then, trivial statements are eliminated so as to not dilute the evaluation.

The LLM will process each statement, using chain of thought methodology to emit the reasons.

In the case of abstentions, such as 'I do not know', the LLM will be asked to consider the answerability of the question given the source material.

If the question is considered answerable, abstentions will be considered as not grounded and punished with low scores. Otherwise, unanswerable abstentions will be considered grounded.

Example:

```python
from trulens.core import Feedback
from trulens.providers.openai import OpenAI

provider = OpenAI()

f_groundedness = (
    Feedback(provider.groundedness_measure_with_cot_reasons_consider_answerability)
    .on(context.collect())
    .on_output()
    .on_input()
    )
```

Returns:
    Tuple[float, dict]: A tuple containing a value between 0.0 (not grounded) and 1.0 (grounded) and a dictionary containing the reasons for the evaluation.

harmfulness

Uses chat completion model. A function that completes a template to check the harmfulness of some text. Prompt credit to LangChain Eval.

Example:

```python
feedback = Feedback(provider.harmfulness).on_output()
```

harmfulness_with_cot_reasons

Uses chat completion model. A function that completes a template to check the harmfulness of some text. Prompt credit to LangChain Eval. Also uses chain of thought methodology and emits the reasons.

Example:

```python
feedback = Feedback(provider.harmfulness_with_cot_reasons).on_output()
```

helpfulness

Uses chat completion model. A function that completes a template to check the helpfulness of some text. Prompt credit to LangChain Eval.

Example:

```python
feedback = Feedback(provider.helpfulness).on_output()
```

helpfulness_with_cot_reasons

Uses chat completion model. A function that completes a template to check the helpfulness of some text. Prompt credit to LangChain Eval. Also uses chain of thought methodology and emits the reasons.

Example:

```python
feedback = Feedback(provider.helpfulness_with_cot_reasons).on_output()
```

insensitivity

Uses chat completion model. A function that completes a template to check the insensitivity of some text. Prompt credit to LangChain Eval.

Example:

```python
feedback = Feedback(provider.insensitivity).on_output()
```

insensitivity_with_cot_reasons

Uses chat completion model. A function that completes a template to check the insensitivity of some text. Prompt credit to LangChain Eval. Also uses chain of thought methodology and emits the reasons.

Example:

```python
feedback = Feedback(provider.insensitivity_with_cot_reasons).on_output()
```

load staticmethod

Deserialize/load this object using the class information in tru_class_info to look up the actual class that will do the deserialization.

maliciousness

Uses chat completion model. A function that completes a template to check the maliciousness of some text. Prompt credit to LangChain Eval.

Example:

```python
feedback = Feedback(provider.maliciousness).on_output()
```

maliciousness_with_cot_reasons

Uses chat completion model. A function that completes a template to check the maliciousness of some text. Prompt credit to LangChain Eval. Also uses chain of thought methodology and emits the reasons.

Example:

```python
feedback = Feedback(provider.maliciousness_with_cot_reasons).on_output()
```

misogyny

Uses chat completion model. A function that completes a template to check the misogyny of some text. Prompt credit to LangChain Eval.

Example:

```python
feedback = Feedback(provider.misogyny).on_output()
```

misogyny_with_cot_reasons

Uses chat completion model. A function that completes a template to check the misogyny of some text. Prompt credit to LangChain Eval. Also uses chain of thought methodology and emits the reasons.

Example:

```python
feedback = Feedback(provider.misogyny_with_cot_reasons).on_output()
```

model_agreement

Uses chat completion model. A function that gives a chat completion model the same prompt and gets a response, encouraging truthfulness. A second template is given to the model with a prompt stating that the original response is correct, and it measures whether the previous chat completion response is similar.

Example:

```python
feedback = Feedback(provider.model_agreement).on_input_output()
```

model_validate classmethod

Deserialize a jsonized version of the app into an instance of the class it was serialized from.

Note

This process uses extra information stored in the jsonized object and handled by WithClassInfo.

qs_relevance

Deprecated. Use relevance instead.

qs_relevance_with_cot_reasons

Deprecated. Use relevance_with_cot_reasons instead.

relevance

Uses chat completion model. A function that completes a template to check the relevance of the response to a prompt.

Example:

```python
feedback = Feedback(provider.relevance).on_input_output()
```
Usage on RAG Contexts:

```python
feedback = Feedback(provider.relevance).on_input().on(
    TruLlama.select_source_nodes().node.text  # See note below
).aggregate(np.mean)
```

relevance_with_cot_reasons

Uses chat completion Model. A function that completes a template to check the relevance of the response to a prompt. Also uses chain of thought methodology and emits the reasons.

Example:

```python
feedback = (
    Feedback(provider.relevance_with_cot_reasons)
    .on_input()
    .on_output()
)
```

sentiment

Uses chat completion model. A function that completes a template to check the sentiment of some text.

Example:

```python
feedback = Feedback(provider.sentiment).on_output()
```

sentiment_with_cot_reasons

Uses chat completion model. A function that completes a template to check the sentiment of some text. Also uses chain of thought methodology and emits the reasons.

Example:

```python
feedback = Feedback(provider.sentiment_with_cot_reasons).on_output()
```

stereotypes

Uses chat completion model. A function that completes a template to check whether the response adds assumed stereotypes that are not present in the prompt.

Example:

```python
feedback = Feedback(provider.stereotypes).on_input_output()
```

stereotypes_with_cot_reasons

Uses chat completion model. A function that completes a template to check whether the response adds assumed stereotypes that are not present in the prompt. Also uses chain of thought methodology and emits the reasons.

Example:

```python
feedback = Feedback(provider.stereotypes_with_cot_reasons).on_input_output()
```

summarization_with_cot_reasons

Summarization is deprecated in favor of comprehensiveness. This function is no longer implemented.

tru_class_info: Class instance-attribute

Class information of this pydantic object for use in deserialization.

Using this odd key to not pollute attribute names in whatever class we mix this into. Should be the same as CLASS_INFO.

Embedding-based

API Reference: Embeddings.

Embeddings

Embedding-related feedback function implementations.

cosine_distance

Runs cosine distance on the query and document embeddings

Example:

Below is just one example. Embedders from llama-index are supported:
https://docs.llamaindex.ai/en/latest/module_guides/models/embeddings/


```python
from llama_index.embeddings.openai import OpenAIEmbedding
from trulens.core import Feedback
from trulens.feedback.embeddings import Embeddings

embed_model = OpenAIEmbedding()

# Create the feedback function
f_embed = Embeddings(embed_model=embed_model)
f_embed_dist = Feedback(f_embed.cosine_distance).on_input_output()
```
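
These are distances, so smaller values mean the query and document embeddings are closer. If you want TruLens to interpret lower values as better, you can set the `higher_is_better` flag when constructing the Feedback object (the same flag used in the moderation examples above); a small sketch reusing `f_embed` from the example:

```python
from trulens.core import Feedback

# Smaller cosine distance = more similar, so mark the direction explicitly.
f_embed_dist = Feedback(
    f_embed.cosine_distance, higher_is_better=False
).on_input_output()
```
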
euclidean_distance

Runs L2 distance on the query and document embeddings

Example:

Below is just one example. Embedders from llama-index are supported:
https://docs.llamaindex.ai/en/latest/module_guides/models/embeddings/

```python
from llama_index.embeddings.openai import OpenAIEmbedding
from trulens.core import Feedback
from trulens.feedback.embeddings import Embeddings

embed_model = OpenAIEmbedding()

# Create the feedback function
f_embed = Embeddings(embed_model=embed_model)
f_embed_dist = Feedback(f_embed.euclidean_distance).on_input_output()
```

load staticmethod

Deserialize/load this object using the class information in tru_class_info to look up the actual class that will do the deserialization.

manhattan_distance

Runs L1 distance on the query and document embeddings

Example:

Below is just one example. Embedders from llama-index are supported:
https://docs.llamaindex.ai/en/latest/module_guides/models/embeddings/

```python
from llama_index.embeddings.openai import OpenAIEmbedding
from trulens.core import Feedback
from trulens.feedback.embeddings import Embeddings

embed_model = OpenAIEmbedding()

# Create the feedback function
f_embed = Embeddings(embed_model=embed_model)
f_embed_dist = Feedback(f_embed.manhattan_distance).on_input_output()
```

model_validate classmethod

Deserialize a jsonized version of the app into an instance of the class it was serialized from.

Note

This process uses extra information stored in the jsonized object and handled by WithClassInfo.

tru_class_info: Class instance-attribute

Class information of this pydantic object for use in deserialization.

Using this odd key to not pollute attribute names in whatever class we mix this into. Should be the same as CLASS_INFO.

Combinations

Ground Truth Agreement

API Reference: GroundTruthAgreement

GroundTruthAggregator

auc

Calculate the area under the ROC curve. Can be used for meta-evaluation.

brier_score

Assess both calibration and sharpness of the probability estimates.

Args:
    scores (List[float]): Relevance scores returned by feedback function.

Returns:
    float: Brier score.
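
For intuition, the Brier score is the mean squared difference between the predicted probabilities and the binary ground-truth labels; a standalone sketch of the formula (illustrating the metric itself, not this aggregator's API, and using made-up numbers):

```python
# Brier score: mean squared error between predicted probabilities and 0/1 labels.
scores = [0.9, 0.2, 0.7]   # relevance scores from a feedback function
labels = [1, 0, 1]         # ground-truth relevance

brier = sum((s - y) ** 2 for s, y in zip(scores, labels)) / len(scores)
print(brier)  # lower is better; 0.0 means perfectly calibrated and sharp
```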

ece

Calculate the expected calibration error. Can be used for meta-evaluation.

ir_hit_rate

Calculate the IR hit rate at top k: the proportion of queries for which at least one relevant document is retrieved in the top k results. This metric evaluates whether a relevant document is present among the top k retrieved.

Args:
    scores (list or array): The list of scores generated by the model.

Returns:
    float: The hit rate at top k. Binary 0 or 1.

kendall_tau

Calculate Kendall's tau. Can be used for meta-evaluation. Kendall's tau is a measure of the correspondence between two rankings. Values close to 1 indicate strong agreement, values close to -1 indicate strong disagreement. This is the tau-b version of Kendall's tau which accounts for ties.
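
As an illustration of the statistic itself (independent of this class), SciPy's kendalltau computes the same tau-b measure between two rankings; the scores below are made up:

```python
from scipy.stats import kendalltau

# Feedback-function scores vs. human-annotated relevance for the same items.
feedback_scores = [0.9, 0.1, 0.7, 0.3]
human_scores = [1.0, 0.0, 0.8, 0.2]

tau, p_value = kendalltau(feedback_scores, human_scores)
print(tau)  # close to 1 = strong agreement, close to -1 = strong disagreement
```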

load staticmethod

Deserialize/load this object using the class information in tru_class_info to look up the actual class that will do the deserialization.

mae

Calculate the mean absolute error. Can be used for meta-evaluation.

model_config: dict = dict(arbitrary_types_allowed=True, extra='allow') class-attribute

Aggregate benchmarking metrics for ground-truth-based evaluation on feedback functions.

model_validate classmethod

Deserialize a jsonized version of the app into an instance of the class it was serialized from.

Note

This process uses extra information stored in the jsonized object and handled by WithClassInfo.

mrr

Calculate the mean reciprocal rank. Can be used for meta-evaluation.

ndcg_at_k

NDCG can be used for meta-evaluation of other feedback results, returned as relevance scores.

precision_at_k

Calculate the precision at K. Can be used for meta-evaluation.

recall_at_k

Calculate the recall at K. Can be used for meta-evaluation.

register_custom_agg_func

Register a custom aggregation function.

spearman_correlation

Calculate the Spearman correlation. Can be used for meta-evaluation. The Spearman correlation coefficient is a nonparametric measure of rank correlation (statistical dependence between the rankings of two variables).
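
Likewise, the underlying statistic matches SciPy's spearmanr; a small illustration of rank correlation on the same kind of made-up data:

```python
from scipy.stats import spearmanr

# Rank correlation between feedback scores and ground-truth relevance scores.
feedback_scores = [0.9, 0.1, 0.7, 0.3]
human_scores = [1.0, 0.0, 0.8, 0.2]

rho, p_value = spearmanr(feedback_scores, human_scores)
print(rho)  # 1.0 here, since the two rankings agree exactly
```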

tru_class_info: Class instance-attribute

Class information of this pydantic object for use in deserialization.

Using this odd key to not pollute attribute names in whatever class we mix this into. Should be the same as CLASS_INFO.

GroundTruthAgreement

Measures Agreement against a Ground Truth.

absolute_error

Method to look up the numeric expected score from a golden set and take the difference.

Primarily used for evaluation of model-generated feedback against human feedback.

Example:

```python
from trulens.core import Feedback, Select
from trulens.feedback import GroundTruthAgreement
from trulens.providers.bedrock import Bedrock

golden_set = [
    {"query": "How many stomachs does a cow have?", "expected_response": "Cows' diet relies primarily on grazing.", "expected_score": 0.4},
    {"query": "Name some top dental floss brands", "expected_response": "I don't know", "expected_score": 0.8}
]

bedrock = Bedrock(
    model_id="amazon.titan-text-express-v1", region_name="us-east-1"
)
ground_truth_collection = GroundTruthAgreement(golden_set, provider=bedrock)

f_groundtruth = Feedback(ground_truth_collection.absolute_error).on(
    Select.Record.calls[0].args.args[0]
).on(
    Select.Record.calls[0].args.args[1]
).on_output()
```
agreement_measure

Uses OpenAI's chat completion model. A function that measures similarity to ground truth. A second template is given to the model with a prompt stating that the original response is correct, and it measures whether the previous chat completion response is similar.

Example:

```python
from trulens.core import Feedback
from trulens.feedback import GroundTruthAgreement
from trulens.providers.openai import OpenAI

golden_set = [
    {"query": "who invented the lightbulb?", "expected_response": "Thomas Edison"},
    {"query": "ยฟquien invento la bombilla?", "expected_response": "Thomas Edison"}
]
ground_truth_collection = GroundTruthAgreement(golden_set, provider=OpenAI())

feedback = Feedback(ground_truth_collection.agreement_measure).on_input_output()
```
The `on_input_output()` selector can be changed. See [Feedback Function Guide](https://www.trulens.org/trulens/feedback_function_guide/)
bert_score

Uses BERT Score. A function that measures similarity to ground truth using BERT embeddings.

Example:

```python
from trulens.core import Feedback
from trulens.feedback import GroundTruthAgreement
from trulens.providers.openai import OpenAI
golden_set = [
    {"query": "who invented the lightbulb?", "expected_response": "Thomas Edison"},
    {"query": "ยฟquien invento la bombilla?", "expected_response": "Thomas Edison"}
]
ground_truth_collection = GroundTruthAgreement(golden_set, provider=OpenAI())

feedback = Feedback(ground_truth_collection.bert_score).on_input_output()
```
The `on_input_output()` selector can be changed. See [Feedback Function Guide](https://www.trulens.org/trulens/feedback_function_guide/)
bleu

Uses BLEU Score. A function that measures similarity to ground truth using token overlap.

Example:

```python
from trulens.core import Feedback
from trulens.feedback import GroundTruthAgreement
from trulens.providers.openai import OpenAI
golden_set = [
    {"query": "who invented the lightbulb?", "expected_response": "Thomas Edison"},
    {"query": "ยฟquien invento la bombilla?", "expected_response": "Thomas Edison"}
]
ground_truth_collection = GroundTruthAgreement(golden_set, provider=OpenAI())

feedback = Feedback(ground_truth_collection.bleu).on_input_output()
```
The `on_input_output()` selector can be changed. See [Feedback Function Guide](https://www.trulens.org/trulens/feedback_function_guide/)
load staticmethod

Deserialize/load this object using the class information in tru_class_info to look up the actual class that will do the deserialization.

model_validate classmethod

Deserialize a jsonized version of the app into an instance of the class it was serialized from.

Note

This process uses extra information stored in the jsonized object and handled by WithClassInfo.

rouge

Uses ROUGE score. A function that measures similarity to ground truth using token overlap.

tru_class_info: Class instance-attribute

Class information of this pydantic object for use in deserialization.

Using this odd key to not pollute attribute names in whatever class we mix this into. Should be the same as CLASS_INFO.