Stock Feedback Functions

Classification-based

🤗 Huggingface

API Reference: Huggingface.

Out of the box feedback functions calling Huggingface APIs.

context_relevance

Uses Huggingface's truera/context_relevance model, which computes the relevance of a given context to the prompt. The model can be found at https://huggingface.co/truera/context_relevance.

Example:

```python
import numpy as np

from trulens.core import Feedback
from trulens.providers.huggingface import Huggingface

huggingface_provider = Huggingface()

# `context` is a selector for the retrieved context, e.g. obtained via
# TruChain.select_context(app) or TruLlama.select_context(app).
feedback = (
    Feedback(huggingface_provider.context_relevance)
    .on_input()
    .on(context)
    .aggregate(np.mean)
)
```

groundedness_measure_with_nli

A measure to track if the source material supports each sentence in the statement using an NLI model.

First, the response is split into statements using a sentence tokenizer. The NLI model then processes each statement against the entire source material.

Example:

```python
from trulens.core import Feedback
from trulens.providers.huggingface import Huggingface

huggingface_provider = Huggingface()

f_groundedness = (
    Feedback(huggingface_provider.groundedness_measure_with_nli)
    .on(context)
    .on_output()
)
```

hallucination_evaluator

Evaluates the hallucination score for a combined input of two statements, returned as a float between 0 and 1 that represents a true/false judgment. If the score is greater than 0.5, the statement is evaluated as consistent (true); if it is less than 0.5, the statement is evaluated as a hallucination.

Example:

```python
from trulens.providers.huggingface import Huggingface
huggingface_provider = Huggingface()

score = huggingface_provider.hallucination_evaluator("The sky is blue. [SEP] Apples are red, the grass is green.")
```
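
Since the raw score is a probability, the 0.5 cutoff described above can be applied directly to the returned value; a small follow-up to the example:

```python
# `score` comes from the hallucination_evaluator call above; apply the 0.5
# cutoff described in the docstring.
if score > 0.5:
    print(f"Consistent ({score:.2f})")
else:
    print(f"Likely hallucination ({score:.2f})")
```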

language_match

Uses Huggingface's papluca/xlm-roberta-base-language-detection model. A function that uses language detection on text1 and text2 and calculates the probit difference on the language detected on text1. The function is: `1.0 - |probit_language_text1(text1) - probit_language_text1(text2)|`

Example:

```python
from trulens.core import Feedback
from trulens.providers.huggingface import Huggingface
huggingface_provider = Huggingface()

feedback = Feedback(huggingface_provider.language_match).on_input_output()
```

The `on_input_output()` selector can be changed. See [Feedback Function
Guide](https://www.trulens.org/trulens/feedback_function_guide/)

load staticmethod

Deserialize/load this object using the class information in tru_class_info to look up the actual class that will do the deserialization.

model_validate classmethod

Deserialize a jsonized version of the app into an instance of the class it was serialized from.

Note

This process uses extra information stored in the jsonized object and handled by WithClassInfo.

pii_detection

NER model to detect PII.

Example:

```python
hugs = Huggingface()

# Define a pii_detection feedback function using HuggingFace.
f_pii_detection = Feedback(hugs.pii_detection).on_input()
```

The `on(...)` selector can be changed. See [Feedback Function Guide:
Selectors](https://www.trulens.org/trulens/feedback_function_guide/#selector-details)

pii_detection_with_cot_reasons

NER model to detect PII, with reasons.

Example:

```python
hugs = Huggingface()

# Define a pii_detection_with_cot_reasons feedback function using HuggingFace.
f_pii_detection = Feedback(hugs.pii_detection_with_cot_reasons).on_input()
```

The `on(...)` selector can be changed. See [Feedback Function Guide:
Selectors](https://www.trulens.org/trulens/feedback_function_guide/#selector-details)

Args:
    text: A text prompt that may contain a name.

Returns:
    Tuple[float, str]: A tuple containing the likelihood that PII is contained in the input text and a string describing what PII was detected (if any).
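
Because this variant returns both the score and a description of what was detected, it can also be called directly and unpacked; a minimal sketch (the example text below is hypothetical):

```python
from trulens.providers.huggingface import Huggingface

hugs = Huggingface()

# Returns the likelihood that PII is present plus a description of what was found.
score, reasons = hugs.pii_detection_with_cot_reasons(
    "My name is Jane Doe and my phone number is 555-0100."
)
print(score, reasons)
```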

positive_sentiment

Uses Huggingface's cardiffnlp/twitter-roberta-base-sentiment model. A function that uses a sentiment classifier on text.

Example:

```python
from trulens.core import Feedback
from trulens.providers.huggingface import Huggingface
huggingface_provider = Huggingface()

feedback = Feedback(huggingface_provider.positive_sentiment).on_output()
```

toxic

Uses Huggingface's martin-ha/toxic-comment-model model. A function that uses a toxic comment classifier on text.

Example:

```python
from trulens.core import Feedback
from trulens.providers.huggingface import Huggingface
huggingface_provider = Huggingface()

feedback = Feedback(huggingface_provider.toxic).on_output()
```

tru_class_info: Class instance-attribute

Class information of this pydantic object for use in deserialization.

Using this odd key to not pollute attribute names in whatever class we mix this into. Should be the same as CLASS_INFO.

OpenAI

API Reference: OpenAI.

Out of the box feedback functions calling OpenAI APIs. Additionally, all feedback functions listed in the base LLMProvider class can be run with OpenAI.

Create an OpenAI Provider with out of the box feedback functions.

Example:

```python
from trulens.providers.openai import OpenAI
openai_provider = OpenAI()
```

coherence

Uses chat completion model. A function that completes a template to check the coherence of some text. Prompt credit to LangChain Eval.

Example:

```python
feedback = Feedback(provider.coherence).on_output()
```

coherence_with_cot_reasons

Uses chat completion model. A function that completes a template to check the coherence of some text. Prompt credit to LangChain Eval. Also uses chain of thought methodology and emits the reasons.

Example:

```python
feedback = Feedback(provider.coherence_with_cot_reasons).on_output()
```

comprehensiveness_with_cot_reasons

Uses chat completion model. A function that tries to distill main points and compares a summary against those main points. This feedback function only has a chain-of-thought implementation, as the reasoning is essential to the assessment.

Example:

```python
feedback = Feedback(provider.comprehensiveness_with_cot_reasons).on_input_output()
```

conciseness

Uses chat completion model. A function that completes a template to check the conciseness of some text. Prompt credit to LangChain Eval.

Example:

```python
feedback = Feedback(provider.conciseness).on_output()
```

conciseness_with_cot_reasons

Uses chat completion model. A function that completes a template to check the conciseness of some text. Prompt credit to LangChain Eval. Also uses chain of thought methodology and emits the reasons.

Example:

```python
feedback = Feedback(provider.conciseness_with_cot_reasons).on_output()
```

Args:
    text: The text to evaluate the conciseness of.

context_relevance

Uses chat completion model. A function that completes a template to check the relevance of the context to the question.

Example:

```python
from trulens.apps.langchain import TruChain
context = TruChain.select_context(rag_app)
feedback = (
    Feedback(provider.context_relevance)
    .on_input()
    .on(context)
    .aggregate(np.mean)
    )
```

Returns:
    float: A value between 0.0 (not relevant) and 1.0 (relevant).

context_relevance_verb_confidence

Uses chat completion model. A function that completes a template to check the relevance of the context to the question, and additionally elicits a verbalized confidence score for the judgment.

Example:

```python
from trulens.apps.llamaindex import TruLlama
context = TruLlama.select_context(llamaindex_rag_app)
feedback = (
    Feedback(provider.context_relevance_verb_confidence)
    .on_input()
    .on(context)
    .aggregate(np.mean)
    )
```

Returns:
    Tuple[float, Dict[str, float]]: A value between 0.0 (not relevant) and 1.0 (relevant), and a dictionary containing the confidence score.
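
When called directly rather than through a Feedback pipeline, the score and confidence dictionary come back as a pair. A minimal sketch, assuming the method takes the question and the retrieved context as its two text arguments (the parameter names here are assumptions, not confirmed):

```python
from trulens.providers.openai import OpenAI

provider = OpenAI()

# Hypothetical direct call; check the API reference for exact parameter names.
score, confidence = provider.context_relevance_verb_confidence(
    question="What is the capital of France?",
    context="Paris is the capital and largest city of France.",
)
print(score)       # float in [0, 1]
print(confidence)  # dictionary with the verbalized confidence score
```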

context_relevance_with_cot_reasons

Uses chat completion model. A function that completes a template to check the relevance of the context to the question. Also uses chain of thought methodology and emits the reasons.

Example:

```python
from trulens.apps.langchain import TruChain
context = TruChain.select_context(rag_app)
feedback = (
    Feedback(provider.context_relevance_with_cot_reasons)
    .on_input()
    .on(context)
    .aggregate(np.mean)
    )
```

controversiality

Uses chat completion model. A function that completes a template to check the controversiality of some text. Prompt credit to Langchain Eval.

Example:

```python
feedback = Feedback(provider.controversiality).on_output()
```

controversiality_with_cot_reasons

Uses chat completion model. A function that completes a template to check the controversiality of some text. Prompt credit to Langchain Eval. Also uses chain of thought methodology and emits the reasons.

Example:

```python
feedback = Feedback(provider.controversiality_with_cot_reasons).on_output()
```

correctness

Uses chat completion model. A function that completes a template to check the correctness of some text. Prompt credit to LangChain Eval.

Example:

```python
feedback = Feedback(provider.correctness).on_output()
```

correctness_with_cot_reasons

Uses chat completion model. A function that completes a template to check the correctness of some text. Prompt credit to LangChain Eval. Also uses chain of thought methodology and emits the reasons.

Example:

```python
feedback = Feedback(provider.correctness_with_cot_reasons).on_output()
```

criminality

Uses chat completion model. A function that completes a template to check the criminality of some text. Prompt credit to LangChain Eval.

Example:

```python
feedback = Feedback(provider.criminality).on_output()
```

criminality_with_cot_reasons

Uses chat completion model. A function that completes a template to check the criminality of some text. Prompt credit to LangChain Eval. Also uses chain of thought methodology and emits the reasons.

Example:

```python
feedback = Feedback(provider.criminality_with_cot_reasons).on_output()
```

generate_confidence_score

Base method to generate a score normalized to 0 to 1, used for evaluation.

generate_score

Base method to generate a score normalized to 0 to 1, used for evaluation.

generate_score_and_reasons

Base method to generate a score and reason, used for evaluation.
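
These base methods are what the named feedback functions above call under the hood: they send a grading prompt to the chat model and parse a numeric score (and, for generate_score_and_reasons, the chain-of-thought reasons) out of the reply. A minimal sketch of a custom criterion, assuming generate_score accepts a grading system prompt and an optional user prompt; consult the API reference for the exact signature:

```python
from trulens.providers.openai import OpenAI

provider = OpenAI()

# Hypothetical custom criterion; the generate_score signature used here is an
# assumption, not a confirmed API.
system_prompt = (
    "You are a grader. Rate how formal the USER text is on a scale of 0 to 10, "
    "where 0 is very informal and 10 is very formal. Respond with only the number."
)
score = provider.generate_score(
    system_prompt=system_prompt,
    user_prompt="hey, what's up?",
)
print(score)  # normalized to the range [0, 1]
```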

groundedness_measure_with_cot_reasons

A measure to track if the source material supports each sentence in the statement using an LLM provider.

The statement will first be split by a tokenizer into its component sentences.

Then, trivial statements are eliminated so as to not dilute the evaluation.

The LLM will process each statement, using chain of thought methodology to emit the reasons.

Abstentions will be considered as grounded.

Example:

```python
from trulens.core import Feedback
from trulens.providers.openai import OpenAI

provider = OpenAI()

f_groundedness = (
    Feedback(provider.groundedness_measure_with_cot_reasons)
    .on(context.collect())
    .on_output()
    )
```

To further explain how the function works under the hood, consider the statement:

"Hi. I'm here to help. The university of Washington is a public research university. UW's connections to major corporations in Seattle contribute to its reputation as a hub for innovation and technology"

The function will split the statement into its component sentences:

  1. "Hi."
  2. "I'm here to help."
  3. "The university of Washington is a public research university."
  4. "UW's connections to major corporations in Seattle contribute to its reputation as a hub for innovation and technology"

Next, trivial statements are removed, leaving only:

  1. "The university of Washington is a public research university."
  2. "UW's connections to major corporations in Seattle contribute to its reputation as a hub for innovation and technology"

The LLM will then process each remaining statement to assess its groundedness.

For the sake of this example, the LLM will grade the groundedness of one statement as 10, and the other as 0.

Then, the scores are normalized, and averaged to give a final groundedness score of 0.5.
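
The normalization in that last step is simply a rescaling of the 0-10 grades to the 0-1 range, followed by an average; a quick sketch of the arithmetic:

```python
# Per-statement grades from the LLM, on a 0-10 scale (from the example above).
grades = [10, 0]

# Normalize each grade to [0, 1], then average to get the final groundedness score.
normalized = [g / 10 for g in grades]
overall = sum(normalized) / len(normalized)
print(overall)  # 0.5
```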

Returns:
    Tuple[float, dict]: A tuple containing a value between 0.0 (not grounded) and 1.0 (grounded) and a dictionary containing the reasons for the evaluation.

groundedness_measure_with_cot_reasons_consider_answerability

A measure to track if the source material supports each sentence in the statement using an LLM provider.

The statement will first be split by a tokenizer into its component sentences.

Then, trivial statements are eliminated so as to not dilute the evaluation.

The LLM will process each statement, using chain of thought methodology to emit the reasons.

In the case of abstentions, such as 'I do not know', the LLM will be asked to consider the answerability of the question given the source material.

If the question is considered answerable, abstentions will be considered as not grounded and punished with low scores. Otherwise, unanswerable abstentions will be considered grounded.

Example:

```python
from trulens.core import Feedback
from trulens.providers.openai import OpenAI

provider = OpenAI()

f_groundedness = (
    Feedback(provider.groundedness_measure_with_cot_reasons_consider_answerability)
    .on(context.collect())
    .on_output()
    .on_input()
    )
```

Returns:
    Tuple[float, dict]: A tuple containing a value between 0.0 (not grounded) and 1.0 (grounded) and a dictionary containing the reasons for the evaluation.

harmfulness

Uses chat completion model. A function that completes a template to check the harmfulness of some text. Prompt credit to LangChain Eval.

Example:

```python
feedback = Feedback(provider.harmfulness).on_output()
```

harmfulness_with_cot_reasons

Uses chat completion model. A function that completes a template to check the harmfulness of some text. Prompt credit to LangChain Eval. Also uses chain of thought methodology and emits the reasons.

Example:

```python
feedback = Feedback(provider.harmfulness_with_cot_reasons).on_output()
```

helpfulness

Uses chat completion model. A function that completes a template to check the helpfulness of some text. Prompt credit to LangChain Eval.

Example:

```python
feedback = Feedback(provider.helpfulness).on_output()
```

helpfulness_with_cot_reasons

Uses chat completion model. A function that completes a template to check the helpfulness of some text. Prompt credit to LangChain Eval. Also uses chain of thought methodology and emits the reasons.

Example:

```python
feedback = Feedback(provider.helpfulness_with_cot_reasons).on_output()
```

insensitivity

Uses chat completion model. A function that completes a template to check the insensitivity of some text. Prompt credit to LangChain Eval.

Example:

```python
feedback = Feedback(provider.insensitivity).on_output()
```

insensitivity_with_cot_reasons

Uses chat completion model. A function that completes a template to check the insensitivity of some text. Prompt credit to LangChain Eval. Also uses chain of thought methodology and emits the reasons.

Example:

```python
feedback = Feedback(provider.insensitivity_with_cot_reasons).on_output()
```

load staticmethod

Deserialize/load this object using the class information in tru_class_info to look up the actual class that will do the deserialization.

maliciousness

Uses chat completion model. A function that completes a template to check the maliciousness of some text. Prompt credit to LangChain Eval.

Example:

```python
feedback = Feedback(provider.maliciousness).on_output()
```

maliciousness_with_cot_reasons

Uses chat completion model. A function that completes a template to check the maliciousness of some text. Prompt credit to LangChain Eval. Also uses chain of thought methodology and emits the reasons.

Example:

```python
feedback = Feedback(provider.maliciousness_with_cot_reasons).on_output()
```

misogyny

Uses chat completion model. A function that completes a template to check the misogyny of some text. Prompt credit to LangChain Eval.

Example:

```python
feedback = Feedback(provider.misogyny).on_output()
```

misogyny_with_cot_reasons

Uses chat completion model. A function that completes a template to check the misogyny of some text. Prompt credit to LangChain Eval. Also uses chain of thought methodology and emits the reasons.

Example:

```python
feedback = Feedback(provider.misogyny_with_cot_reasons).on_output()
```

model_agreement

Uses chat completion model. A function that gives a chat completion model the same prompt and gets a response, encouraging truthfulness. A second template is given to the model with a prompt stating that the original response is correct, and it measures whether the previous chat completion response is similar.

Example:

```python
feedback = Feedback(provider.model_agreement).on_input_output()
```

model_validate classmethod

Deserialize a jsonized version of the app into an instance of the class it was serialized from.

Note

This process uses extra information stored in the jsonized object and handled by WithClassInfo.

moderation_harassment

Uses OpenAI's Moderation API. A function that checks if text contains harassment.

Example:

```python
from trulens.core import Feedback
from trulens.providers.openai import OpenAI
openai_provider = OpenAI()

feedback = Feedback(
    openai_provider.moderation_harassment, higher_is_better=False
).on_output()
```

moderation_harassment_threatening

Uses OpenAI's Moderation API. A function that checks if text contains harassing and threatening content.

Example:

```python
from trulens.core import Feedback
from trulens.providers.openai import OpenAI
openai_provider = OpenAI()

feedback = Feedback(
    openai_provider.moderation_harassment_threatening, higher_is_better=False
).on_output()
```

moderation_hate

Uses OpenAI's Moderation API. A function that checks if text is hate speech.

Example:

```python
from trulens.core import Feedback
from trulens.providers.openai import OpenAI
openai_provider = OpenAI()

feedback = Feedback(
    openai_provider.moderation_hate, higher_is_better=False
).on_output()
```

moderation_hatethreatening

Uses OpenAI's Moderation API. A function that checks if text is threatening speech.

Example:

```python
from trulens.core import Feedback
from trulens.providers.openai import OpenAI
openai_provider = OpenAI()

feedback = Feedback(
    openai_provider.moderation_hatethreatening, higher_is_better=False
).on_output()
```

moderation_selfharm

Uses OpenAI's Moderation API. A function that checks if text is about self harm.

Example:

```python
from trulens.core import Feedback
from trulens.providers.openai import OpenAI
openai_provider = OpenAI()

feedback = Feedback(
    openai_provider.moderation_selfharm, higher_is_better=False
).on_output()
```

moderation_sexual

Uses OpenAI's Moderation API. A function that checks if text is sexual speech.

Example:

```python
from trulens.core import Feedback
from trulens.providers.openai import OpenAI
openai_provider = OpenAI()

feedback = Feedback(
    openai_provider.moderation_sexual, higher_is_better=False
).on_output()
```

moderation_sexualminors

Uses OpenAI's Moderation API. A function that checks if text is about sexual minors.

Example:

```python
from trulens.core import Feedback
from trulens.providers.openai import OpenAI
openai_provider = OpenAI()

feedback = Feedback(
    openai_provider.moderation_sexualminors, higher_is_better=False
).on_output()
```

moderation_violence

Uses OpenAI's Moderation API. A function that checks if text is about violence.

Example:

```python
from trulens.core import Feedback
from trulens.providers.openai import OpenAI
openai_provider = OpenAI()

feedback = Feedback(
    openai_provider.moderation_violence, higher_is_better=False
).on_output()
```

moderation_violencegraphic

Uses OpenAI's Moderation API. A function that checks if text is about graphic violence.

Example:

```python
from trulens.core import Feedback
from trulens.providers.openai import OpenAI
openai_provider = OpenAI()

feedback = Feedback(
    openai_provider.moderation_violencegraphic, higher_is_better=False
).on_output()
```

qs_relevance

Deprecated. Use relevance instead.

qs_relevance_with_cot_reasons

Deprecated. Use relevance_with_cot_reasons instead.

relevance

Uses chat completion model. A function that completes a template to check the relevance of the response to a prompt.

Example:

```python
feedback = Feedback(provider.relevance).on_input_output()
```
Usage on RAG Contexts:

```python
feedback = Feedback(provider.relevance).on_input().on(
    TruLlama.select_source_nodes().node.text  # See note below
).aggregate(np.mean)
```

relevance_with_cot_reasons

Uses chat completion Model. A function that completes a template to check the relevance of the response to a prompt. Also uses chain of thought methodology and emits the reasons.

Example:

```python
feedback = (
    Feedback(provider.relevance_with_cot_reasons)
    .on_input()
    .on_output()
)
```

sentiment

Uses chat completion model. A function that completes a template to check the sentiment of some text.

Example:

```python
feedback = Feedback(provider.sentiment).on_output()
```

sentiment_with_cot_reasons

Uses chat completion model. A function that completes a template to check the sentiment of some text. Also uses chain of thought methodology and emits the reasons.

Example:

```python
feedback = Feedback(provider.sentiment_with_cot_reasons).on_output()
```

stereotypes

Uses chat completion model. A function that completes a template to check whether the response adds assumed stereotypes that are not present in the prompt.

Example:

```python
feedback = Feedback(provider.stereotypes).on_input_output()
```

stereotypes_with_cot_reasons

Uses chat completion model. A function that completes a template to check whether the response adds assumed stereotypes that are not present in the prompt. Also uses chain of thought methodology and emits the reasons.

Example:

```python
feedback = Feedback(provider.stereotypes_with_cot_reasons).on_input_output()
```

summarization_with_cot_reasons

Summarization is deprecated in favor of comprehensiveness. This function is no longer implemented.

tru_class_info: Class instance-attribute

Class information of this pydantic object for use in deserialization.

Using this odd key to not pollute attribute names in whatever class we mix this into. Should be the same as CLASS_INFO.

Generation-based: LLMProvider

API Reference: LLMProvider.

An LLM-based provider.

This is an abstract class and needs to be initialized as one of its concrete provider implementations (for example, the OpenAI provider above).

coherence

Uses chat completion model. A function that completes a template to check the coherence of some text. Prompt credit to LangChain Eval.

Example:

```python
feedback = Feedback(provider.coherence).on_output()
```

coherence_with_cot_reasons

Uses chat completion model. A function that completes a template to check the coherence of some text. Prompt credit to LangChain Eval. Also uses chain of thought methodology and emits the reasons.

Example:

```python
feedback = Feedback(provider.coherence_with_cot_reasons).on_output()
```

comprehensiveness_with_cot_reasons

Uses chat completion model. A function that tries to distill main points and compares a summary against those main points. This feedback function only has a chain-of-thought implementation, as the reasoning is essential to the assessment.

Example:

```python
feedback = Feedback(provider.comprehensiveness_with_cot_reasons).on_input_output()
```

conciseness

Uses chat completion model. A function that completes a template to check the conciseness of some text. Prompt credit to LangChain Eval.

Example:

```python
feedback = Feedback(provider.conciseness).on_output()
```

conciseness_with_cot_reasons

Uses chat completion model. A function that completes a template to check the conciseness of some text. Prompt credit to LangChain Eval. Also uses chain of thought methodology and emits the reasons.

Example:

```python
feedback = Feedback(provider.conciseness_with_cot_reasons).on_output()
```

Args:
    text: The text to evaluate the conciseness of.

context_relevance

Uses chat completion model. A function that completes a template to check the relevance of the context to the question.

Example:

```python
from trulens.apps.langchain import TruChain
context = TruChain.select_context(rag_app)
feedback = (
    Feedback(provider.context_relevance)
    .on_input()
    .on(context)
    .aggregate(np.mean)
    )
```

Returns:
    float: A value between 0.0 (not relevant) and 1.0 (relevant).

context_relevance_verb_confidence

Uses chat completion model. A function that completes a template to check the relevance of the context to the question, and additionally elicits a verbalized confidence score for the judgment.

Example:

```python
from trulens.apps.llamaindex import TruLlama
context = TruLlama.select_context(llamaindex_rag_app)
feedback = (
    Feedback(provider.context_relevance_verb_confidence)
    .on_input()
    .on(context)
    .aggregate(np.mean)
    )
```

Returns:
    Tuple[float, Dict[str, float]]: A value between 0.0 (not relevant) and 1.0 (relevant), and a dictionary containing the confidence score.

context_relevance_with_cot_reasons

Uses chat completion model. A function that completes a template to check the relevance of the context to the question. Also uses chain of thought methodology and emits the reasons.

Example:

```python
from trulens.apps.langchain import TruChain
context = TruChain.select_context(rag_app)
feedback = (
    Feedback(provider.context_relevance_with_cot_reasons)
    .on_input()
    .on(context)
    .aggregate(np.mean)
    )
```

controversiality

Uses chat completion model. A function that completes a template to check the controversiality of some text. Prompt credit to Langchain Eval.

Example:

```python
feedback = Feedback(provider.controversiality).on_output()
```

controversiality_with_cot_reasons

Uses chat completion model. A function that completes a template to check the controversiality of some text. Prompt credit to Langchain Eval. Also uses chain of thought methodology and emits the reasons.

Example:

```python
feedback = Feedback(provider.controversiality_with_cot_reasons).on_output()
```

correctness

Uses chat completion model. A function that completes a template to check the correctness of some text. Prompt credit to LangChain Eval.

Example:

```python
feedback = Feedback(provider.correctness).on_output()
```

correctness_with_cot_reasons

Uses chat completion model. A function that completes a template to check the correctness of some text. Prompt credit to LangChain Eval. Also uses chain of thought methodology and emits the reasons.

Example:

```python
feedback = Feedback(provider.correctness_with_cot_reasons).on_output()
```

criminality

Uses chat completion model. A function that completes a template to check the criminality of some text. Prompt credit to LangChain Eval.

Example:

```python
feedback = Feedback(provider.criminality).on_output()
```

criminality_with_cot_reasons

Uses chat completion model. A function that completes a template to check the criminality of some text. Prompt credit to LangChain Eval. Also uses chain of thought methodology and emits the reasons.

Example:

```python
feedback = Feedback(provider.criminality_with_cot_reasons).on_output()
```

endpoint: Optional[mod_endpoint.Endpoint] = None class-attribute instance-attribute

Endpoint supporting this provider.

Remote API invocations are handled by the endpoint.

generate_confidence_score

Base method to generate a score normalized to 0 to 1, used for evaluation.

generate_score

Base method to generate a score normalized to 0 to 1, used for evaluation.

generate_score_and_reasons

Base method to generate a score and reason, used for evaluation.

groundedness_measure_with_cot_reasons

A measure to track if the source material supports each sentence in the statement using an LLM provider.

The statement will first be split by a tokenizer into its component sentences.

Then, trivial statements are eliminated so as to not dilute the evaluation.

The LLM will process each statement, using chain of thought methodology to emit the reasons.

Abstentions will be considered as grounded.

Example:

```python
from trulens.core import Feedback
from trulens.providers.openai import OpenAI

provider = OpenAI()

f_groundedness = (
    Feedback(provider.groundedness_measure_with_cot_reasons)
    .on(context.collect())
    .on_output()
    )
```

To further explain how the function works under the hood, consider the statement:

"Hi. I'm here to help. The university of Washington is a public research university. UW's connections to major corporations in Seattle contribute to its reputation as a hub for innovation and technology"

The function will split the statement into its component sentences:

  1. "Hi."
  2. "I'm here to help."
  3. "The university of Washington is a public research university."
  4. "UW's connections to major corporations in Seattle contribute to its reputation as a hub for innovation and technology"

Next, trivial statements are removed, leaving only:

  1. "The university of Washington is a public research university."
  2. "UW's connections to major corporations in Seattle contribute to its reputation as a hub for innovation and technology"

The LLM will then process each remaining statement to assess its groundedness.

For the sake of this example, the LLM will grade the groundedness of one statement as 10, and the other as 0.

Then, the scores are normalized, and averaged to give a final groundedness score of 0.5.

Returns:
    Tuple[float, dict]: A tuple containing a value between 0.0 (not grounded) and 1.0 (grounded) and a dictionary containing the reasons for the evaluation.

groundedness_measure_with_cot_reasons_consider_answerability

A measure to track if the source material supports each sentence in the statement using an LLM provider.

The statement will first be split by a tokenizer into its component sentences.

Then, trivial statements are eliminated so as to not dilute the evaluation.

The LLM will process each statement, using chain of thought methodology to emit the reasons.

In the case of abstentions, such as 'I do not know', the LLM will be asked to consider the answerability of the question given the source material.

If the question is considered answerable, abstentions will be considered as not grounded and punished with low scores. Otherwise, unanswerable abstentions will be considered grounded.

Example:

```python
from trulens.core import Feedback
from trulens.providers.openai import OpenAI

provider = OpenAI()

f_groundedness = (
    Feedback(provider.groundedness_measure_with_cot_reasons_consider_answerability)
    .on(context.collect())
    .on_output()
    .on_input()
    )
```

Returns:
    Tuple[float, dict]: A tuple containing a value between 0.0 (not grounded) and 1.0 (grounded) and a dictionary containing the reasons for the evaluation.

harmfulness

Uses chat completion model. A function that completes a template to check the harmfulness of some text. Prompt credit to LangChain Eval.

Example:

```python
feedback = Feedback(provider.harmfulness).on_output()
```

harmfulness_with_cot_reasons

Uses chat completion model. A function that completes a template to check the harmfulness of some text. Prompt credit to LangChain Eval. Also uses chain of thought methodology and emits the reasons.

Example:

```python
feedback = Feedback(provider.harmfulness_with_cot_reasons).on_output()
```

helpfulness

Uses chat completion model. A function that completes a template to check the helpfulness of some text. Prompt credit to LangChain Eval.

Example:

```python
feedback = Feedback(provider.helpfulness).on_output()
```

helpfulness_with_cot_reasons

Uses chat completion model. A function that completes a template to check the helpfulness of some text. Prompt credit to LangChain Eval. Also uses chain of thought methodology and emits the reasons.

Example:

```python
feedback = Feedback(provider.helpfulness_with_cot_reasons).on_output()
```

insensitivity

Uses chat completion model. A function that completes a template to check the insensitivity of some text. Prompt credit to LangChain Eval.

Example:

```python
feedback = Feedback(provider.insensitivity).on_output()
```

insensitivity_with_cot_reasons

Uses chat completion model. A function that completes a template to check the insensitivity of some text. Prompt credit to LangChain Eval. Also uses chain of thought methodology and emits the reasons.

Example:

```python
feedback = Feedback(provider.insensitivity_with_cot_reasons).on_output()
```

load staticmethod

Deserialize/load this object using the class information in tru_class_info to look up the actual class that will do the deserialization.

maliciousness

Uses chat completion model. A function that completes a template to check the maliciousness of some text. Prompt credit to LangChain Eval.

Example:

```python
feedback = Feedback(provider.maliciousness).on_output()
```

maliciousness_with_cot_reasons

Uses chat completion model. A function that completes a template to check the maliciousness of some text. Prompt credit to LangChain Eval. Also uses chain of thought methodology and emits the reasons.

Example:

```python
feedback = Feedback(provider.maliciousness_with_cot_reasons).on_output()
```

misogyny

Uses chat completion model. A function that completes a template to check the misogyny of some text. Prompt credit to LangChain Eval.

Example:

```python
feedback = Feedback(provider.misogyny).on_output()
```

misogyny_with_cot_reasons

Uses chat completion model. A function that completes a template to check the misogyny of some text. Prompt credit to LangChain Eval. Also uses chain of thought methodology and emits the reasons.

Example:

```python
feedback = Feedback(provider.misogyny_with_cot_reasons).on_output()
```

model_agreement

Uses chat completion model. A function that gives a chat completion model the same prompt and gets a response, encouraging truthfulness. A second template is given to the model with a prompt stating that the original response is correct, and it measures whether the previous chat completion response is similar.

Example:

```python
feedback = Feedback(provider.model_agreement).on_input_output()
```

model_validate classmethod

Deserialize a jsonized version of the app into an instance of the class it was serialized from.

Note

This process uses extra information stored in the jsonized object and handled by WithClassInfo.

qs_relevance

Deprecated. Use relevance instead.

qs_relevance_with_cot_reasons

Deprecated. Use relevance_with_cot_reasons instead.

relevance

Uses chat completion model. A function that completes a template to check the relevance of the response to a prompt.

Example:

```python
feedback = Feedback(provider.relevance).on_input_output()
```
Usage on RAG Contexts:

```python
feedback = Feedback(provider.relevance).on_input().on(
    TruLlama.select_source_nodes().node.text  # See note below
).aggregate(np.mean)
```

relevance_with_cot_reasons

Uses chat completion Model. A function that completes a template to check the relevance of the response to a prompt. Also uses chain of thought methodology and emits the reasons.

Example:

```python
feedback = (
    Feedback(provider.relevance_with_cot_reasons)
    .on_input()
    .on_output()
)
```

sentiment

Uses chat completion model. A function that completes a template to check the sentiment of some text.

Example:

```python
feedback = Feedback(provider.sentiment).on_output()
```

sentiment_with_cot_reasons

Uses chat completion model. A function that completes a template to check the sentiment of some text. Also uses chain of thought methodology and emits the reasons.

Example:

```python
feedback = Feedback(provider.sentiment_with_cot_reasons).on_output()
```

stereotypes

Uses chat completion model. A function that completes a template to check whether the response adds assumed stereotypes that are not present in the prompt.

Example:

```python
feedback = Feedback(provider.stereotypes).on_input_output()
```

stereotypes_with_cot_reasons

Uses chat completion model. A function that completes a template to check whether the response adds assumed stereotypes that are not present in the prompt. Also uses chain of thought methodology and emits the reasons.

Example:

```python
feedback = Feedback(provider.stereotypes_with_cot_reasons).on_input_output()
```

summarization_with_cot_reasons

Summarization is deprecated in favor of comprehensiveness. This function is no longer implemented.

tru_class_info: Class instance-attribute

Class information of this pydantic object for use in deserialization.

Using this odd key to not pollute attribute names in whatever class we mix this into. Should be the same as CLASS_INFO.

Embedding-based

API Reference: Embeddings.

Embeddings

Embedding-related feedback function implementations.

cosine_distance

Runs cosine distance on the query and document embeddings

Example:

Below is just one example. Embedders from llama-index are supported:
https://docs.llamaindex.ai/en/latest/module_guides/models/embeddings/


```python
from llama_index.embeddings.openai import OpenAIEmbedding
from trulens.core import Feedback
from trulens.feedback.embeddings import Embeddings

embed_model = OpenAIEmbedding()

# Create the feedback function
f_embed = Embeddings(embed_model=embed_model)
f_embed_dist = Feedback(f_embed.cosine_distance).on_input_output()
```
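
These are distances, so smaller values mean the query and document embeddings are closer. If you want TruLens to interpret lower values as better, you can set the `higher_is_better` flag when constructing the Feedback object (the same flag used in the moderation examples above); a small sketch reusing `f_embed` from the example:

```python
from trulens.core import Feedback

# Smaller cosine distance = more similar, so mark the direction explicitly.
f_embed_dist = Feedback(
    f_embed.cosine_distance, higher_is_better=False
).on_input_output()
```
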
euclidean_distance

Runs L2 distance on the query and document embeddings

Example:

Below is just one example. Embedders from llama-index are supported:
https://docs.llamaindex.ai/en/latest/module_guides/models/embeddings/

```python
from llama_index.embeddings.openai import OpenAIEmbedding
from trulens.core import Feedback
from trulens.feedback.embeddings import Embeddings

embed_model = OpenAIEmbedding()

# Create the feedback function
f_embed = Embeddings(embed_model=embed_model)
f_embed_dist = Feedback(f_embed.euclidean_distance).on_input_output()
```

load staticmethod

Deserialize/load this object using the class information in tru_class_info to look up the actual class that will do the deserialization.

manhattan_distance

Runs L1 distance on the query and document embeddings

Example:

Below is just one example. Embedders from llama-index are supported:
https://docs.llamaindex.ai/en/latest/module_guides/models/embeddings/

```python
from llama_index.embeddings.openai import OpenAIEmbedding
from trulens.core import Feedback
from trulens.feedback.embeddings import Embeddings

embed_model = OpenAIEmbedding()

# Create the feedback function
f_embed = Embeddings(embed_model=embed_model)
f_embed_dist = Feedback(f_embed.manhattan_distance).on_input_output()
```

model_validate classmethod

Deserialize a jsonized version of the app into an instance of the class it was serialized from.

Note

This process uses extra information stored in the jsonized object and handled by WithClassInfo.

tru_class_info: Class instance-attribute

Class information of this pydantic object for use in deserialization.

Using this odd key to not pollute attribute names in whatever class we mix this into. Should be the same as CLASS_INFO.

Combinations

Ground Truth Agreement

API Reference: GroundTruthAgreement

GroundTruthAggregator

auc

Calculate the area under the ROC curve. Can be used for meta-evaluation.

brier_score

Assess both calibration and sharpness of the probability estimates.

Args:
    scores (List[float]): Relevance scores returned by feedback function.

Returns:
    float: Brier score.
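
For intuition, the Brier score is the mean squared difference between the predicted probabilities and the binary ground-truth labels; a standalone sketch of the formula (illustrating the metric itself, not this aggregator's API, and using made-up numbers):

```python
# Brier score: mean squared error between predicted probabilities and 0/1 labels.
scores = [0.9, 0.2, 0.7]   # relevance scores from a feedback function
labels = [1, 0, 1]         # ground-truth relevance

brier = sum((s - y) ** 2 for s, y in zip(scores, labels)) / len(scores)
print(brier)  # lower is better; 0.0 means perfectly calibrated and sharp
```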

ece

Calculate the expected calibration error. Can be used for meta-evaluation.

ir_hit_rate

Calculate the IR hit rate at top k: the proportion of queries for which at least one relevant document is retrieved in the top k results. This metric evaluates whether a relevant document is present among the top k retrieved.

Args:
    scores (list or array): The list of scores generated by the model.

Returns:
    float: The hit rate at top k. Binary 0 or 1.

kendall_tau

Calculate Kendall's tau. Can be used for meta-evaluation. Kendall's tau is a measure of the correspondence between two rankings. Values close to 1 indicate strong agreement, values close to -1 indicate strong disagreement. This is the tau-b version of Kendall's tau which accounts for ties.
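
As an illustration of the statistic itself (independent of this class), SciPy's kendalltau computes the same tau-b measure between two rankings; the scores below are made up:

```python
from scipy.stats import kendalltau

# Feedback-function scores vs. human-annotated relevance for the same items.
feedback_scores = [0.9, 0.1, 0.7, 0.3]
human_scores = [1.0, 0.0, 0.8, 0.2]

tau, p_value = kendalltau(feedback_scores, human_scores)
print(tau)  # close to 1 = strong agreement, close to -1 = strong disagreement
```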

load staticmethod

Deserialize/load this object using the class information in tru_class_info to look up the actual class that will do the deserialization.

mae

Calculate the mean absolute error. Can be used for meta-evaluation.

model_config: dict = dict(arbitrary_types_allowed=True, extra='allow') class-attribute

Aggregate benchmarking metrics for ground-truth-based evaluation on feedback functions.

model_validate classmethod

Deserialize a jsonized version of the app into an instance of the class it was serialized from.

Note

This process uses extra information stored in the jsonized object and handled by WithClassInfo.

mrr

Calculate the mean reciprocal rank. Can be used for meta-evaluation.

ndcg_at_k

NDCG can be used for meta-evaluation of other feedback results, returned as relevance scores.

precision_at_k

Calculate the precision at K. Can be used for meta-evaluation.

recall_at_k

Calculate the recall at K. Can be used for meta-evaluation.

register_custom_agg_func

Register a custom aggregation function.

spearman_correlation

Calculate the Spearman correlation. Can be used for meta-evaluation. The Spearman correlation coefficient is a nonparametric measure of rank correlation (statistical dependence between the rankings of two variables).
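
Likewise, the underlying statistic matches SciPy's spearmanr; a small illustration of rank correlation on the same kind of made-up data:

```python
from scipy.stats import spearmanr

# Rank correlation between feedback scores and ground-truth relevance scores.
feedback_scores = [0.9, 0.1, 0.7, 0.3]
human_scores = [1.0, 0.0, 0.8, 0.2]

rho, p_value = spearmanr(feedback_scores, human_scores)
print(rho)  # 1.0 here, since the two rankings agree exactly
```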

tru_class_info: Class instance-attribute

Class information of this pydantic object for use in deserialization.

Using this odd key to not pollute attribute names in whatever class we mix this into. Should be the same as CLASS_INFO.

GroundTruthAgreement

Measures Agreement against a Ground Truth.

absolute_error

Method to look up the numeric expected score from a golden set and take the difference.

Primarily used for evaluation of model-generated feedback against human feedback.

Example:

```python
from trulens.core import Feedback, Select
from trulens.feedback import GroundTruthAgreement
from trulens.providers.bedrock import Bedrock

golden_set = [
    {"query": "How many stomachs does a cow have?", "expected_response": "Cows' diet relies primarily on grazing.", "expected_score": 0.4},
    {"query": "Name some top dental floss brands", "expected_response": "I don't know", "expected_score": 0.8}
]

bedrock = Bedrock(
    model_id="amazon.titan-text-express-v1", region_name="us-east-1"
)
ground_truth_collection = GroundTruthAgreement(golden_set, provider=bedrock)

f_groundtruth = Feedback(ground_truth_collection.absolute_error).on(
    Select.Record.calls[0].args.args[0]
).on(
    Select.Record.calls[0].args.args[1]
).on_output()
```
agreement_measure

Uses OpenAI's chat completion model. A function that measures similarity to ground truth. A second template is given to the model with a prompt stating that the original response is correct, and it measures whether the previous chat completion response is similar.

Example:

```python
from trulens.core import Feedback
from trulens.feedback import GroundTruthAgreement
from trulens.providers.openai import OpenAI

golden_set = [
    {"query": "who invented the lightbulb?", "expected_response": "Thomas Edison"},
    {"query": "ยฟquien invento la bombilla?", "expected_response": "Thomas Edison"}
]
ground_truth_collection = GroundTruthAgreement(golden_set, provider=OpenAI())

feedback = Feedback(ground_truth_collection.agreement_measure).on_input_output()
```
The `on_input_output()` selector can be changed. See [Feedback Function Guide](https://www.trulens.org/trulens/feedback_function_guide/)
bert_score

Uses BERT Score. A function that measures similarity to ground truth using BERT embeddings.

Example:

```python
from trulens.core import Feedback
from trulens.feedback import GroundTruthAgreement
from trulens.providers.openai import OpenAI
golden_set = [
    {"query": "who invented the lightbulb?", "expected_response": "Thomas Edison"},
    {"query": "ยฟquien invento la bombilla?", "expected_response": "Thomas Edison"}
]
ground_truth_collection = GroundTruthAgreement(golden_set, provider=OpenAI())

feedback = Feedback(ground_truth_collection.bert_score).on_input_output()
```
The `on_input_output()` selector can be changed. See [Feedback Function Guide](https://www.trulens.org/trulens/feedback_function_guide/)
bleu

Uses BLEU Score. A function that measures similarity to ground truth using token overlap.

Example:

```python
from trulens.core import Feedback
from trulens.feedback import GroundTruthAgreement
from trulens.providers.openai import OpenAI
golden_set = [
    {"query": "who invented the lightbulb?", "expected_response": "Thomas Edison"},
    {"query": "ยฟquien invento la bombilla?", "expected_response": "Thomas Edison"}
]
ground_truth_collection = GroundTruthAgreement(golden_set, provider=OpenAI())

feedback = Feedback(ground_truth_collection.bleu).on_input_output()
```
The `on_input_output()` selector can be changed. See [Feedback Function Guide](https://www.trulens.org/trulens/feedback_function_guide/)
load staticmethod

Deserialize/load this object using the class information in tru_class_info to look up the actual class that will do the deserialization.

model_validate classmethod

Deserialize a jsonized version of the app into an instance of the class it was serialized from.

Note

This process uses extra information stored in the jsonized object and handled by WithClassInfo.

rouge

Uses ROUGE score. A function that measures similarity to ground truth using token overlap.

tru_class_info: Class instance-attribute

Class information of this pydantic object for use in deserialization.

Using this odd key to not pollute attribute names in whatever class we mix this into. Should be the same as CLASS_INFO.