trulens.providers.openai.provider

Classes

OpenAI

Bases: LLMProvider

Out of the box feedback functions calling OpenAI APIs. Additionally, all feedback functions listed in the base LLMProvider class can be run with OpenAI.

Create an OpenAI Provider with out of the box feedback functions.

Example:

```python
from trulens.providers.openai import OpenAI
openai_provider = OpenAI()
```
PARAMETER DESCRIPTION
model_engine

The OpenAI completion model. Defaults to gpt-4o-mini.

TYPE: Optional[str] DEFAULT: None

**kwargs

Additional arguments to pass to the OpenAIEndpoint which are then passed to OpenAIClient and finally to the OpenAI client.

TYPE: dict DEFAULT: {}

Attributes
tru_class_info instance-attribute
tru_class_info: Class

Class information of this pydantic object for use in deserialization.

Using this odd key to not pollute attribute names in whatever class we mix this into. Should be the same as CLASS_INFO.

Functions
__rich_repr__
__rich_repr__() -> Result

Requirement for pretty printing using the rich package.

load staticmethod
load(obj, *args, **kwargs)

Deserialize/load this object using the class information in tru_class_info to lookup the actual class that will do the deserialization.

model_validate classmethod
model_validate(*args, **kwargs) -> Any

Deserialize a jsonized version of the app into an instance of the class it was serialized from.

Note

This process uses extra information stored in the jsonized object and handled by WithClassInfo.

generate_score
generate_score(
    system_prompt: str,
    user_prompt: Optional[str] = None,
    min_score_val: int = 0,
    max_score_val: int = 10,
    temperature: float = 0.0,
) -> float

Base method to generate a score normalized to 0 to 1, used for evaluation.

PARAMETER DESCRIPTION
system_prompt

A pre-formatted system prompt.

TYPE: str

user_prompt

An optional user prompt.

TYPE: Optional[str] DEFAULT: None

min_score_val

The minimum score value.

TYPE: int DEFAULT: 0

max_score_val

The maximum score value.

TYPE: int DEFAULT: 10

temperature

The temperature for the LLM response.

TYPE: float DEFAULT: 0.0

RETURNS DESCRIPTION
float

The score on a 0-1 scale.
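As a rough illustration of the normalization step, the sketch below maps a raw rating onto the 0 to 1 range. The helper name `normalize_score` is hypothetical, and the linear rescaling with clamping is an assumption about how such a normalization could work, not the provider's actual implementation.

```python
def normalize_score(raw: float, min_score_val: int = 0, max_score_val: int = 10) -> float:
    """Linearly rescale a raw rating from [min_score_val, max_score_val] to [0, 1]."""
    if max_score_val <= min_score_val:
        raise ValueError("max_score_val must be greater than min_score_val")
    # Clamp first so an out-of-range rating cannot leave the unit interval.
    clamped = max(min_score_val, min(max_score_val, raw))
    return (clamped - min_score_val) / (max_score_val - min_score_val)


print(normalize_score(7))        # 0.7 on the default 0-10 scale
print(normalize_score(2, 0, 3))  # 2 out of 3 on a 0-3 scale
```

Under this assumption, the same raw rating yields different normalized scores depending on `min_score_val` and `max_score_val`, so the two values should match the scale the prompt asks the model to use.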

generate_confidence_score
generate_confidence_score(
    verb_confidence_prompt: str,
    user_prompt: Optional[str] = None,
    min_score_val: int = 0,
    max_score_val: int = 10,
    temperature: float = 0.0,
) -> Tuple[float, Dict[str, float]]

Base method to generate a score normalized to 0 to 1 together with a confidence score, used for evaluation.

PARAMETER DESCRIPTION
verb_confidence_prompt

A pre-formatted system prompt.

TYPE: str

user_prompt

An optional user prompt.

TYPE: Optional[str] DEFAULT: None

min_score_val

The minimum score value.

TYPE: int DEFAULT: 0

max_score_val

The maximum score value.

TYPE: int DEFAULT: 10

temperature

The temperature for the LLM response.

TYPE: float DEFAULT: 0.0

RETURNS DESCRIPTION
Tuple[float, Dict[str, float]]

The feedback score on a 0-1 scale and the confidence score.

generate_score_and_reasons
generate_score_and_reasons(
    system_prompt: str,
    user_prompt: Optional[str] = None,
    min_score_val: int = 0,
    max_score_val: int = 10,
    temperature: float = 0.0,
) -> Tuple[float, Dict]

Base method to generate a score and reason, used for evaluation.

PARAMETER DESCRIPTION
system_prompt

A pre-formatted system prompt.

TYPE: str

user_prompt

An optional user prompt. Defaults to None.

TYPE: Optional[str] DEFAULT: None

min_score_val

The minimum score value.

TYPE: int DEFAULT: 0

max_score_val

The maximum score value.

TYPE: int DEFAULT: 10

temperature

The temperature for the LLM response.

TYPE: float DEFAULT: 0.0

RETURNS DESCRIPTION
float

The score on a 0-1 scale.

Dict

Reason metadata if returned by the LLM.
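The sketch below shows one way a judge response could be split into the `(score, reasons)` pair described above. The `Score:` line format, the `reason` dictionary key, and the helper name `parse_score_and_reasons` are all assumptions made for illustration; the actual response template and metadata keys may differ.

```python
import re
from typing import Dict, Tuple


def parse_score_and_reasons(
    llm_text: str, min_score_val: int = 0, max_score_val: int = 10
) -> Tuple[float, Dict]:
    """Split a judge response into a normalized score and reason metadata."""
    match = re.search(r"Score:\s*(-?\d+(?:\.\d+)?)", llm_text)
    if match is None:
        raise ValueError("no 'Score:' line found in the response")
    raw = float(match.group(1))
    score = (raw - min_score_val) / (max_score_val - min_score_val)
    # Every line except the score line is kept as the reason text.
    reason = "\n".join(
        line for line in llm_text.splitlines() if not line.startswith("Score:")
    ).strip()
    return score, {"reason": reason}


score, reasons = parse_score_and_reasons(
    "Score: 8\nSupporting Evidence: The answer cites the source directly."
)
print(score)    # 0.8
print(reasons)
```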

context_relevance
context_relevance(
    question: str,
    context: str,
    criteria: str = "",
    min_score_val: int = 0,
    max_score_val: int = 3,
    temperature: float = 0.0,
) -> float

Uses chat completion model. A function that completes a template to check the relevance of the context to the question.

Example:

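```python
import numpy as np
from trulens.core import Feedback
from trulens.apps.langchain import TruChain

# Assumes `provider` (e.g. OpenAI()) and `rag_app` are already defined.
context = TruChain.select_context(rag_app)
feedback = (
    Feedback(provider.context_relevance)
    .on_input()
    .on(context)
    .aggregate(np.mean)
)
```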
PARAMETER DESCRIPTION
question

A question being asked.

TYPE: str

context

Context related to the question.

TYPE: str

criteria

Criteria that override the default evaluation criteria.

TYPE: str DEFAULT: ''

min_score_val

The minimum score value. Defaults to 0.

TYPE: int DEFAULT: 0

max_score_val

The maximum score value. Defaults to 3.

TYPE: int DEFAULT: 3

temperature

The temperature for the LLM response, which might have impact on the confidence level of the evaluation. Defaults to 0.0.

TYPE: float DEFAULT: 0.0

RETURNS DESCRIPTION
float

A value between 0.0 (not relevant) and 1.0 (relevant).
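When the context selector yields several retrieved chunks, the example above averages the per-chunk scores with `.aggregate(np.mean)`. The sketch below mimics that behavior in plain Python. `toy_score` is a hypothetical stand-in for `provider.context_relevance`, and `statistics.mean` replaces `np.mean` only to keep the snippet dependency-free.

```python
from statistics import mean
from typing import Callable, List


def aggregate_context_relevance(
    question: str,
    contexts: List[str],
    score_fn: Callable[[str, str], float],
) -> float:
    """Score each context against the question, then average the scores."""
    scores = [score_fn(question, context) for context in contexts]
    return mean(scores)


def toy_score(question: str, context: str) -> float:
    # Hypothetical stand-in for provider.context_relevance: 1.0 when the
    # context shares a word with the question, 0.0 otherwise.
    shared = set(question.lower().split()) & set(context.lower().split())
    return 1.0 if shared else 0.0


result = aggregate_context_relevance(
    "where is germany",
    ["germany is in europe", "bananas are yellow"],
    toy_score,
)
print(result)  # 0.5
```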

context_relevance_with_cot_reasons
context_relevance_with_cot_reasons(
    question: str,
    context: str,
    criteria: str = "",
    min_score_val: int = 0,
    max_score_val: int = 3,
    temperature: float = 0.0,
) -> Tuple[float, Dict]

Uses chat completion model. A function that completes a template to check the relevance of the context to the question. Also uses chain of thought methodology and emits the reasons.

Example:

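```python
import numpy as np
from trulens.core import Feedback
from trulens.apps.langchain import TruChain

# Assumes `provider` (e.g. OpenAI()) and `rag_app` are already defined.
context = TruChain.select_context(rag_app)
feedback = (
    Feedback(provider.context_relevance_with_cot_reasons)
    .on_input()
    .on(context)
    .aggregate(np.mean)
)
```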
PARAMETER DESCRIPTION
question

A question being asked.

TYPE: str

context

Context related to the question.

TYPE: str

criteria

Criteria that override the default evaluation criteria.

TYPE: str DEFAULT: ''

min_score_val

The minimum score value. Defaults to 0.

TYPE: int DEFAULT: 0

max_score_val

The maximum score value. Defaults to 3.

TYPE: int DEFAULT: 3

temperature

The temperature for the LLM response, which might have impact on the confidence level of the evaluation. Defaults to 0.0.

TYPE: float DEFAULT: 0.0

RETURNS DESCRIPTION
float

A value between 0 and 1. 0 being "not relevant" and 1 being "relevant".

TYPE: Tuple[float, Dict]

context_relevance_verb_confidence
context_relevance_verb_confidence(
    question: str,
    context: str,
    criteria: str = "",
    min_score_val: int = 0,
    max_score_val: int = 3,
    temperature: float = 0.0,
) -> Tuple[float, Dict[str, float]]

Uses chat completion model. A function that completes a template to check the relevance of the context to the question. Also returns a confidence score for the evaluation.

Example:

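```python
import numpy as np
from trulens.core import Feedback
from trulens.apps.llamaindex import TruLlama

# Assumes `provider` and `llamaindex_rag_app` are already defined.
context = TruLlama.select_context(llamaindex_rag_app)
feedback = (
    Feedback(provider.context_relevance_verb_confidence)
    .on_input()
    .on(context)
    .aggregate(np.mean)
)
```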
PARAMETER DESCRIPTION
question

A question being asked.

TYPE: str

context

Context related to the question.

TYPE: str

criteria

Criteria that override the default evaluation criteria.

TYPE: str DEFAULT: ''

min_score_val

The minimum score value. Defaults to 0.

TYPE: int DEFAULT: 0

max_score_val

The maximum score value. Defaults to 3.

TYPE: int DEFAULT: 3

temperature

The temperature for the LLM response, which might have impact on the confidence level of the evaluation. Defaults to 0.0.

TYPE: float DEFAULT: 0.0

RETURNS DESCRIPTION
Tuple[float, Dict[str, float]]

A value between 0 and 1, 0 being "not relevant" and 1 being "relevant", and a dictionary containing the confidence score.

relevance
relevance(prompt: str, response: str) -> float

Uses chat completion model. A function that completes a template to check the relevance of the response to a prompt.

Example:

```python
feedback = Feedback(provider.relevance).on_input_output()
```

Usage on RAG Contexts:

```python
feedback = Feedback(provider.relevance).on_input().on(
    TruLlama.select_source_nodes().node.text # See note below
).aggregate(np.mean)
```
PARAMETER DESCRIPTION
prompt

A text prompt to an agent.

TYPE: str

response

The agent's response to the prompt.

TYPE: str

RETURNS DESCRIPTION
float

A value between 0 and 1. 0 being "not relevant" and 1 being "relevant".

TYPE: float

relevance_with_cot_reasons
relevance_with_cot_reasons(
    prompt: str, response: str
) -> Tuple[float, Dict]

Uses chat completion model. A function that completes a template to check the relevance of the response to a prompt. Also uses chain of thought methodology and emits the reasons.

Example:

```python
feedback = (
    Feedback(provider.relevance_with_cot_reasons)
    .on_input()
    .on_output()
)
```
PARAMETER DESCRIPTION
prompt

A text prompt to an agent.

TYPE: str

response

The agent's response to the prompt.

TYPE: str

RETURNS DESCRIPTION
float

A value between 0 and 1. 0 being "not relevant" and 1 being "relevant".

TYPE: Tuple[float, Dict]

sentiment
sentiment(text: str) -> float

Uses chat completion model. A function that completes a template to check the sentiment of some text.

Example:

```python
feedback = Feedback(provider.sentiment).on_output()
```
PARAMETER DESCRIPTION
text

The text to evaluate sentiment of.

TYPE: str

RETURNS DESCRIPTION
float

A value between 0 and 1. 0 being "negative sentiment" and 1 being "positive sentiment".

sentiment_with_cot_reasons
sentiment_with_cot_reasons(text: str) -> Tuple[float, Dict]

Uses chat completion model. A function that completes a template to check the sentiment of some text. Also uses chain of thought methodology and emits the reasons.

Example:

```python
feedback = Feedback(provider.sentiment_with_cot_reasons).on_output()
```
PARAMETER DESCRIPTION
text

Text to evaluate.

TYPE: str

RETURNS DESCRIPTION
float

A value between 0.0 (negative sentiment) and 1.0 (positive sentiment).

TYPE: Tuple[float, Dict]

model_agreement
model_agreement(prompt: str, response: str) -> float

Uses chat completion model. A function that gives a chat completion model the same prompt, with instructions that encourage truthfulness, to obtain a reference response. A second template then asks the model, on the premise that the reference response is correct, to measure how closely the evaluated response agrees with it.

Example:

```python
feedback = Feedback(provider.model_agreement).on_input_output()
```
PARAMETER DESCRIPTION
prompt

A text prompt to an agent.

TYPE: str

response

The agent's response to the prompt.

TYPE: str

RETURNS DESCRIPTION
float

A value between 0.0 (not in agreement) and 1.0 (in agreement).

TYPE: float

conciseness
conciseness(text: str) -> float

Uses chat completion model. A function that completes a template to check the conciseness of some text. Prompt credit to LangChain Eval.

Example:

```python
feedback = Feedback(provider.conciseness).on_output()
```
PARAMETER DESCRIPTION
text

The text to evaluate the conciseness of.

TYPE: str

RETURNS DESCRIPTION
float

A value between 0.0 (not concise) and 1.0 (concise).

conciseness_with_cot_reasons
conciseness_with_cot_reasons(
    text: str,
) -> Tuple[float, Dict]

Uses chat completion model. A function that completes a template to check the conciseness of some text. Prompt credit to LangChain Eval. Also uses chain of thought methodology and emits the reasons.

Example:

```python
feedback = Feedback(provider.conciseness_with_cot_reasons).on_output()
```
PARAMETER DESCRIPTION
text

The text to evaluate the conciseness of.

TYPE: str

RETURNS DESCRIPTION
Tuple[float, Dict]

A tuple containing a value between 0.0 (not concise) and 1.0 (concise) and a dictionary containing the reasons for the evaluation.

correctness
correctness(text: str) -> float

Uses chat completion model. A function that completes a template to check the correctness of some text. Prompt credit to LangChain Eval.

Example:

```python
feedback = Feedback(provider.correctness).on_output()
```
PARAMETER DESCRIPTION
text

The text to evaluate.

TYPE: str

RETURNS DESCRIPTION
float

A value between 0.0 (not correct) and 1.0 (correct).

correctness_with_cot_reasons
correctness_with_cot_reasons(
    text: str,
) -> Tuple[float, Dict]

Uses chat completion model. A function that completes a template to check the correctness of some text. Prompt credit to LangChain Eval. Also uses chain of thought methodology and emits the reasons.

Example:

```python
feedback = Feedback(provider.correctness_with_cot_reasons).on_output()
```
PARAMETER DESCRIPTION
text

Text to evaluate.

TYPE: str

RETURNS DESCRIPTION
Tuple[float, Dict]

A tuple containing a value between 0.0 (not correct) and 1.0 (correct) and a dictionary containing the reasons for the evaluation.

coherence
coherence(text: str) -> float

Uses chat completion model. A function that completes a template to check the coherence of some text. Prompt credit to LangChain Eval.

Example:

```python
feedback = Feedback(provider.coherence).on_output()
```
PARAMETER DESCRIPTION
text

The text to evaluate.

TYPE: str

RETURNS DESCRIPTION
float

A value between 0.0 (not coherent) and 1.0 (coherent).

TYPE: float

coherence_with_cot_reasons
coherence_with_cot_reasons(text: str) -> Tuple[float, Dict]

Uses chat completion model. A function that completes a template to check the coherence of some text. Prompt credit to LangChain Eval. Also uses chain of thought methodology and emits the reasons.

Example:

```python
feedback = Feedback(provider.coherence_with_cot_reasons).on_output()
```
PARAMETER DESCRIPTION
text

The text to evaluate.

TYPE: str

RETURNS DESCRIPTION
Tuple[float, Dict]

A tuple containing a value between 0.0 (not coherent) and 1.0 (coherent) and a dictionary containing the reasons for the evaluation.

harmfulness
harmfulness(text: str) -> float

Uses chat completion model. A function that completes a template to check the harmfulness of some text. Prompt credit to LangChain Eval.

Example:

```python
feedback = Feedback(provider.harmfulness).on_output()
```
PARAMETER DESCRIPTION
text

The text to evaluate.

TYPE: str

RETURNS DESCRIPTION
float

A value between 0.0 (not harmful) and 1.0 (harmful).

TYPE: float

harmfulness_with_cot_reasons
harmfulness_with_cot_reasons(
    text: str,
) -> Tuple[float, Dict]

Uses chat completion model. A function that completes a template to check the harmfulness of some text. Prompt credit to LangChain Eval. Also uses chain of thought methodology and emits the reasons.

Example:

```python
feedback = Feedback(provider.harmfulness_with_cot_reasons).on_output()
```
PARAMETER DESCRIPTION
text

The text to evaluate.

TYPE: str

RETURNS DESCRIPTION
Tuple[float, Dict]

A tuple containing a value between 0.0 (not harmful) and 1.0 (harmful) and a dictionary containing the reasons for the evaluation.

maliciousness
maliciousness(text: str) -> float

Uses chat completion model. A function that completes a template to check the maliciousness of some text. Prompt credit to LangChain Eval.

Example:

```python
feedback = Feedback(provider.maliciousness).on_output()
```
PARAMETER DESCRIPTION
text

The text to evaluate.

TYPE: str

RETURNS DESCRIPTION
float

A value between 0.0 (not malicious) and 1.0 (malicious).

TYPE: float

maliciousness_with_cot_reasons
maliciousness_with_cot_reasons(
    text: str,
) -> Tuple[float, Dict]

Uses chat completion model. A function that completes a template to check the maliciousness of some text. Prompt credit to LangChain Eval. Also uses chain of thought methodology and emits the reasons.

Example:

```python
feedback = Feedback(provider.maliciousness_with_cot_reasons).on_output()
```
PARAMETER DESCRIPTION
text

The text to evaluate.

TYPE: str

RETURNS DESCRIPTION
Tuple[float, Dict]

A tuple containing a value between 0.0 (not malicious) and 1.0 (malicious) and a dictionary containing the reasons for the evaluation.

helpfulness
helpfulness(text: str) -> float

Uses chat completion model. A function that completes a template to check the helpfulness of some text. Prompt credit to LangChain Eval.

Example:

```python
feedback = Feedback(provider.helpfulness).on_output()
```
PARAMETER DESCRIPTION
text

The text to evaluate.

TYPE: str

RETURNS DESCRIPTION
float

A value between 0.0 (not helpful) and 1.0 (helpful).

TYPE: float

helpfulness_with_cot_reasons
helpfulness_with_cot_reasons(
    text: str,
) -> Tuple[float, Dict]

Uses chat completion model. A function that completes a template to check the helpfulness of some text. Prompt credit to LangChain Eval. Also uses chain of thought methodology and emits the reasons.

Example:

```python
feedback = Feedback(provider.helpfulness_with_cot_reasons).on_output()
```
PARAMETER DESCRIPTION
text

The text to evaluate.

TYPE: str

RETURNS DESCRIPTION
Tuple[float, Dict]

A tuple containing a value between 0.0 (not helpful) and 1.0 (helpful) and a dictionary containing the reasons for the evaluation.

controversiality
controversiality(text: str) -> float

Uses chat completion model. A function that completes a template to check the controversiality of some text. Prompt credit to LangChain Eval.

Example:

```python
feedback = Feedback(provider.controversiality).on_output()
```
PARAMETER DESCRIPTION
text

The text to evaluate.

TYPE: str

RETURNS DESCRIPTION
float

A value between 0.0 (not controversial) and 1.0 (controversial).

TYPE: float

controversiality_with_cot_reasons
controversiality_with_cot_reasons(
    text: str,
) -> Tuple[float, Dict]

Uses chat completion model. A function that completes a template to check the controversiality of some text. Prompt credit to LangChain Eval. Also uses chain of thought methodology and emits the reasons.

Example:

```python
feedback = Feedback(provider.controversiality_with_cot_reasons).on_output()
```
PARAMETER DESCRIPTION
text

The text to evaluate.

TYPE: str

RETURNS DESCRIPTION
Tuple[float, Dict]

A tuple containing a value between 0.0 (not controversial) and 1.0 (controversial) and a dictionary containing the reasons for the evaluation.

misogyny
misogyny(text: str) -> float

Uses chat completion model. A function that completes a template to check the misogyny of some text. Prompt credit to LangChain Eval.

Example:

```python
feedback = Feedback(provider.misogyny).on_output()
```
PARAMETER DESCRIPTION
text

The text to evaluate.

TYPE: str

RETURNS DESCRIPTION
float

A value between 0.0 (not misogynistic) and 1.0 (misogynistic).

TYPE: float

misogyny_with_cot_reasons
misogyny_with_cot_reasons(text: str) -> Tuple[float, Dict]

Uses chat completion model. A function that completes a template to check the misogyny of some text. Prompt credit to LangChain Eval. Also uses chain of thought methodology and emits the reasons.

Example:

```python
feedback = Feedback(provider.misogyny_with_cot_reasons).on_output()
```
PARAMETER DESCRIPTION
text

The text to evaluate.

TYPE: str

RETURNS DESCRIPTION
Tuple[float, Dict]

A tuple containing a value between 0.0 (not misogynistic) and 1.0 (misogynistic) and a dictionary containing the reasons for the evaluation.

criminality
criminality(text: str) -> float

Uses chat completion model. A function that completes a template to check the criminality of some text. Prompt credit to LangChain Eval.

Example:

```python
feedback = Feedback(provider.criminality).on_output()
```
PARAMETER DESCRIPTION
text

The text to evaluate.

TYPE: str

RETURNS DESCRIPTION
float

A value between 0.0 (not criminal) and 1.0 (criminal).

TYPE: float

criminality_with_cot_reasons
criminality_with_cot_reasons(
    text: str,
) -> Tuple[float, Dict]

Uses chat completion model. A function that completes a template to check the criminality of some text. Prompt credit to LangChain Eval. Also uses chain of thought methodology and emits the reasons.

Example:

```python
feedback = Feedback(provider.criminality_with_cot_reasons).on_output()
```
PARAMETER DESCRIPTION
text

The text to evaluate.

TYPE: str

RETURNS DESCRIPTION
Tuple[float, Dict]

A tuple containing a value between 0.0 (not criminal) and 1.0 (criminal) and a dictionary containing the reasons for the evaluation.

insensitivity
insensitivity(text: str) -> float

Uses chat completion model. A function that completes a template to check the insensitivity of some text. Prompt credit to LangChain Eval.

Example:

```python
feedback = Feedback(provider.insensitivity).on_output()
```
PARAMETER DESCRIPTION
text

The text to evaluate.

TYPE: str

RETURNS DESCRIPTION
float

A value between 0.0 (not insensitive) and 1.0 (insensitive).

TYPE: float

insensitivity_with_cot_reasons
insensitivity_with_cot_reasons(
    text: str,
) -> Tuple[float, Dict]

Uses chat completion model. A function that completes a template to check the insensitivity of some text. Prompt credit to LangChain Eval. Also uses chain of thought methodology and emits the reasons.

Example:

```python
feedback = Feedback(provider.insensitivity_with_cot_reasons).on_output()
```
PARAMETER DESCRIPTION
text

The text to evaluate.

TYPE: str

RETURNS DESCRIPTION
Tuple[float, Dict]

A tuple containing a value between 0.0 (not insensitive) and 1.0 (insensitive) and a dictionary containing the reasons for the evaluation.

comprehensiveness_with_cot_reasons
comprehensiveness_with_cot_reasons(
    source: str, summary: str
) -> Tuple[float, Dict]

Uses chat completion model. A function that distills the main points of the source and compares a summary against those main points. This feedback function only has a chain of thought implementation because the reasoning is essential for assessing the result.

Example:

```python
feedback = Feedback(provider.comprehensiveness_with_cot_reasons).on_input_output()
```
PARAMETER DESCRIPTION
source

Text corresponding to source material.

TYPE: str

summary

Text corresponding to a summary.

TYPE: str

RETURNS DESCRIPTION
Tuple[float, Dict]

A tuple containing a value between 0.0 (not comprehensive) and 1.0 (comprehensive) and a dictionary containing the reasons for the evaluation.

summarization_with_cot_reasons
summarization_with_cot_reasons(
    source: str, summary: str
) -> Tuple[float, Dict]

Summarization is deprecated in favor of comprehensiveness. This function is no longer implemented.

stereotypes
stereotypes(prompt: str, response: str) -> float

Uses chat completion model. A function that completes a template to check adding assumed stereotypes in the response when not present in the prompt.

Example:

```python
feedback = Feedback(provider.stereotypes).on_input_output()
```
PARAMETER DESCRIPTION
prompt

A text prompt to an agent.

TYPE: str

response

The agent's response to the prompt.

TYPE: str

RETURNS DESCRIPTION
float

A value between 0.0 (no stereotypes assumed) and 1.0 (stereotypes assumed).

stereotypes_with_cot_reasons
stereotypes_with_cot_reasons(
    prompt: str, response: str
) -> Tuple[float, Dict]

Uses chat completion model. A function that completes a template to check whether the response adds assumed stereotypes that are not present in the prompt. Also uses chain of thought methodology and emits the reasons.

Example:

```python
feedback = Feedback(provider.stereotypes_with_cot_reasons).on_input_output()
```
PARAMETER DESCRIPTION
prompt

A text prompt to an agent.

TYPE: str

response

The agent's response to the prompt.

TYPE: str

RETURNS DESCRIPTION
Tuple[float, Dict]

A tuple containing a value between 0.0 (no stereotypes assumed) and 1.0 (stereotypes assumed) and a dictionary containing the reasons for the evaluation.

groundedness_measure_with_cot_reasons
groundedness_measure_with_cot_reasons(
    source: str,
    statement: str,
    use_sent_tokenize: bool = True,
) -> Tuple[float, dict]

A measure to track if the source material supports each sentence in the statement using an LLM provider.

The statement will first be split by a tokenizer into its component sentences.

Then, trivial statements are eliminated so as to not dilute the evaluation.

The LLM will process each statement, using chain of thought methodology to emit the reasons.

Abstentions will be considered as grounded.

Example:

```python
from trulens.core import Feedback
from trulens.providers.openai import OpenAI

provider = OpenAI()

# Assumes `context` is a selector for the retrieved context, defined elsewhere.
f_groundedness = (
    Feedback(provider.groundedness_measure_with_cot_reasons)
    .on(context.collect())
    .on_output()
)
```

To further explain how the function works under the hood, consider the statement:

"Hi. I'm here to help. The university of Washington is a public research university. UW's connections to major corporations in Seattle contribute to its reputation as a hub for innovation and technology"

The function will split the statement into its component sentences:

  1. "Hi."
  2. "I'm here to help."
  3. "The university of Washington is a public research university."
  4. "UW's connections to major corporations in Seattle contribute to its reputation as a hub for innovation and technology"

Next, trivial statements are removed, leaving only:

  1. "The university of Washington is a public research university."
  2. "UW's connections to major corporations in Seattle contribute to its reputation as a hub for innovation and technology"

The LLM will then process each remaining statement to assess its groundedness.

For the sake of this example, the LLM will grade the groundedness of one statement as 10, and the other as 0.

Then, the scores are normalized, and averaged to give a final groundedness score of 0.5.
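The walkthrough above can be reproduced with a small, self-contained sketch. Everything here is illustrative: the regex splitter stands in for the punkt tokenizer, the `TRIVIAL` set is a hypothetical filter, and `toy_grade` replaces the LLM judge with fixed grades of 10 and 0.

```python
import re
from statistics import mean
from typing import Callable, List

# Hypothetical list of trivial sentences to drop before grading.
TRIVIAL = {"hi.", "i'm here to help."}


def split_sentences(statement: str) -> List[str]:
    # Naive splitter standing in for the punkt sentence tokenizer.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", statement) if s.strip()]


def groundedness(statement: str, grade: Callable[[str], int]) -> float:
    """Average the normalized 0-10 grades of the non-trivial sentences."""
    sentences = [s for s in split_sentences(statement) if s.lower() not in TRIVIAL]
    return mean([grade(s) / 10 for s in sentences])


def toy_grade(sentence: str) -> int:
    # Stand-in for the LLM judge: one sentence is graded 10, the other 0.
    return 10 if "public research university" in sentence else 0


statement = (
    "Hi. I'm here to help. The university of Washington is a public research "
    "university. UW's connections to major corporations in Seattle contribute "
    "to its reputation as a hub for innovation and technology"
)
print(groundedness(statement, toy_grade))  # 0.5
```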

PARAMETER DESCRIPTION
source

The source that should support the statement.

TYPE: str

statement

The statement to check groundedness.

TYPE: str

use_sent_tokenize

Whether to split the statement into sentences using punkt sentence tokenizer. If False, use LLM to split the statement.

TYPE: bool DEFAULT: True

RETURNS DESCRIPTION
Tuple[float, dict]

A tuple containing a value between 0.0 (not grounded) and 1.0 (grounded) and a dictionary containing the reasons for the evaluation.

qs_relevance
qs_relevance(*args, **kwargs)

Deprecated. Use relevance instead.

qs_relevance_with_cot_reasons
qs_relevance_with_cot_reasons(*args, **kwargs)

Deprecated. Use relevance_with_cot_reasons instead.

groundedness_measure_with_cot_reasons_consider_answerability
groundedness_measure_with_cot_reasons_consider_answerability(
    source: str,
    statement: str,
    question: str,
    use_sent_tokenize: bool = True,
) -> Tuple[float, dict]

A measure to track if the source material supports each sentence in the statement using an LLM provider.

The statement will first be split by a tokenizer into its component sentences.

Then, trivial statements are eliminated so as to not dilute the evaluation.

The LLM will process each statement, using chain of thought methodology to emit the reasons.

In the case of abstentions, such as 'I do not know', the LLM will be asked to consider the answerability of the question given the source material.

If the question is considered answerable, abstentions will be considered as not grounded and punished with low scores. Otherwise, unanswerable abstentions will be considered grounded.

Example:

```python
from trulens.core import Feedback
from trulens.providers.openai import OpenAI

provider = OpenAI()

# Assumes `context` is a selector for the retrieved context, defined elsewhere.
f_groundedness = (
    Feedback(provider.groundedness_measure_with_cot_reasons_consider_answerability)
    .on(context.collect())
    .on_output()
    .on_input()
)
```
PARAMETER DESCRIPTION
source

The source that should support the statement.

TYPE: str

statement

The statement to check groundedness.

TYPE: str

question

The question to check answerability.

TYPE: str

use_sent_tokenize

Whether to split the statement into sentences using punkt sentence tokenizer. If False, use LLM to split the statement.

TYPE: bool DEFAULT: True

RETURNS DESCRIPTION
Tuple[float, dict]

A tuple containing a value between 0.0 (not grounded) and 1.0 (grounded) and a dictionary containing the reasons for the evaluation.
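The abstention rule described for this function can be summarized as a small decision sketch. The helper name `score_statement` and the concrete values 0.0 and 1.0 are assumptions chosen for illustration; the text only states that answerable abstentions receive low scores and unanswerable ones count as grounded.

```python
def score_statement(
    is_abstention: bool, question_answerable: bool, llm_score: float
) -> float:
    """Return the groundedness score for one statement on a 0-1 scale."""
    if not is_abstention:
        # Regular statements keep the score assigned by the LLM judge.
        return llm_score
    # Abstaining on an answerable question is penalized; abstaining on an
    # unanswerable question counts as grounded.
    return 0.0 if question_answerable else 1.0


print(score_statement(False, True, 0.8))  # 0.8: normal statement
print(score_statement(True, True, 0.8))   # 0.0: answerable, yet abstained
print(score_statement(True, False, 0.8))  # 1.0: unanswerable abstention
```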

moderation_hate
moderation_hate(text: str) -> float

Uses OpenAI's Moderation API. A function that checks if text is hate speech.

Example:

```python
from trulens.core import Feedback
from trulens.providers.openai import OpenAI
openai_provider = OpenAI()

feedback = Feedback(
    openai_provider.moderation_hate, higher_is_better=False
).on_output()
```
PARAMETER DESCRIPTION
text

Text to evaluate.

TYPE: str

RETURNS DESCRIPTION
float

A value between 0.0 (not hate) and 1.0 (hate).

TYPE: float

moderation_hatethreatening
moderation_hatethreatening(text: str) -> float

Uses OpenAI's Moderation API. A function that checks if text is threatening speech.

Example:

```python
from trulens.core import Feedback
from trulens.providers.openai import OpenAI
openai_provider = OpenAI()

feedback = Feedback(
    openai_provider.moderation_hatethreatening, higher_is_better=False
).on_output()
```
PARAMETER DESCRIPTION
text

Text to evaluate.

TYPE: str

RETURNS DESCRIPTION
float

A value between 0.0 (not threatening) and 1.0 (threatening).

TYPE: float

moderation_selfharm
moderation_selfharm(text: str) -> float

Uses OpenAI's Moderation API. A function that checks if text is about self harm.

Example:

```python
from trulens.core import Feedback
from trulens.providers.openai import OpenAI
openai_provider = OpenAI()

feedback = Feedback(
    openai_provider.moderation_selfharm, higher_is_better=False
).on_output()
```
PARAMETER DESCRIPTION
text

Text to evaluate.

TYPE: str

RETURNS DESCRIPTION
float

A value between 0.0 (not self harm) and 1.0 (self harm).

TYPE: float

moderation_sexual
moderation_sexual(text: str) -> float

Uses OpenAI's Moderation API. A function that checks if text is sexual speech.

Example:

```python
from trulens.core import Feedback
from trulens.providers.openai import OpenAI
openai_provider = OpenAI()

feedback = Feedback(
    openai_provider.moderation_sexual, higher_is_better=False
).on_output()
```
PARAMETER DESCRIPTION
text

Text to evaluate.

TYPE: str

RETURNS DESCRIPTION
float

A value between 0.0 (not sexual) and 1.0 (sexual).

TYPE: float

moderation_sexualminors
moderation_sexualminors(text: str) -> float

Uses OpenAI's Moderation API. A function that checks if text is sexual content involving minors.

Example:

```python
from trulens.core import Feedback
from trulens.providers.openai import OpenAI
openai_provider = OpenAI()

feedback = Feedback(
    openai_provider.moderation_sexualminors, higher_is_better=False
).on_output()
```
PARAMETER DESCRIPTION
text

Text to evaluate.

TYPE: str

RETURNS DESCRIPTION
float

A value between 0.0 (not sexual minors) and 1.0 (sexual minors).

TYPE: float

moderation_violence
moderation_violence(text: str) -> float

Uses OpenAI's Moderation API. A function that checks if text is about violence.

Example:

```python
from trulens.core import Feedback
from trulens.providers.openai import OpenAI
openai_provider = OpenAI()

feedback = Feedback(
    openai_provider.moderation_violence, higher_is_better=False
).on_output()
```
PARAMETER DESCRIPTION
text

Text to evaluate.

TYPE: str

RETURNS DESCRIPTION
float

A value between 0.0 (not violence) and 1.0 (violence).

TYPE: float

moderation_violencegraphic
moderation_violencegraphic(text: str) -> float

Uses OpenAI's Moderation API. A function that checks if text is about graphic violence.

Example:

```python
from trulens.core import Feedback
from trulens.providers.openai import OpenAI
openai_provider = OpenAI()

feedback = Feedback(
    openai_provider.moderation_violencegraphic, higher_is_better=False
).on_output()
```
PARAMETER DESCRIPTION
text

Text to evaluate.

TYPE: str

RETURNS DESCRIPTION
float

A value between 0.0 (not graphic violence) and 1.0 (graphic violence).

TYPE: float

moderation_harassment
moderation_harassment(text: str) -> float

Uses OpenAI's Moderation API. A function that checks if text is harassment.

Example:

```python
from trulens.core import Feedback
from trulens.providers.openai import OpenAI
openai_provider = OpenAI()

feedback = Feedback(
    openai_provider.moderation_harassment, higher_is_better=False
).on_output()
```
PARAMETER DESCRIPTION
text

Text to evaluate.

TYPE: str

RETURNS DESCRIPTION
float

A value between 0.0 (not harassment) and 1.0 (harassment).

TYPE: float

moderation_harassment_threatening
moderation_harassment_threatening(text: str) -> float

Uses OpenAI's Moderation API. A function that checks if text is threatening harassment.

Example:

```python
from trulens.core import Feedback
from trulens.providers.openai import OpenAI
openai_provider = OpenAI()

feedback = Feedback(
    openai_provider.moderation_harassment_threatening, higher_is_better=False
).on_output()
```
PARAMETER DESCRIPTION
text

Text to evaluate.

TYPE: str

RETURNS DESCRIPTION
float

A value between 0.0 (not harassment/threatening) and 1.0 (harassment/threatening).

TYPE: float

AzureOpenAI

Bases: OpenAI

Warning

Azure OpenAI does not support the OpenAI moderation endpoint.

Out of the box feedback functions calling AzureOpenAI APIs. Has the same functionality as OpenAI out of the box feedback functions, excluding the moderation endpoint which is not supported by Azure. Please export the following env variables. These can be retrieved from https://oai.azure.com/ .

  • AZURE_OPENAI_ENDPOINT
  • AZURE_OPENAI_API_KEY
  • OPENAI_API_VERSION
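
A missing variable can be caught before the provider is constructed by checking the environment first. The sketch below is illustrative only: `missing_azure_env` is a hypothetical helper, not part of TruLens, and it only checks that the three variables listed above are set and non-empty.

```python
import os
from typing import List, Mapping, Optional

# Variable names taken from the list above.
REQUIRED_AZURE_ENV = (
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_API_KEY",
    "OPENAI_API_VERSION",
)


def missing_azure_env(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the required variable names that are unset or empty.

    Hypothetical helper for illustration; not a TruLens API.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_AZURE_ENV if not env.get(name)]


if __name__ == "__main__":
    missing = missing_azure_env()
    if missing:
        print("Missing environment variables:", ", ".join(missing))
    else:
        print("All Azure OpenAI environment variables are set.")
```

Passing a plain dictionary instead of relying on `os.environ` keeps the check easy to exercise in tests.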

The deployment name used below can also be found on the Azure OpenAI page.

Example:

```python
from trulens.providers.openai import AzureOpenAI
openai_provider = AzureOpenAI(deployment_name="...")

openai_provider.relevance(
    prompt="Where is Germany?",
    response="Poland is in Europe."
) # low relevance
```
PARAMETER DESCRIPTION
deployment_name

The name of the deployment.

TYPE: str

Attributes
tru_class_info instance-attribute
tru_class_info: Class

Class information of this pydantic object for use in deserialization.

Using this odd key to not pollute attribute names in whatever class we mix this into. Should be the same as CLASS_INFO.

Functions
__rich_repr__
__rich_repr__() -> Result

Requirement for pretty printing using the rich package.

load staticmethod
load(obj, *args, **kwargs)

Deserialize/load this object using the class information in tru_class_info to lookup the actual class that will do the deserialization.

model_validate classmethod
model_validate(*args, **kwargs) -> Any

Deserialize a jsonized version of the app into an instance of the class it was serialized from.

Note

This process uses extra information stored in the jsonized object and handled by WithClassInfo.

generate_score
generate_score(
    system_prompt: str,
    user_prompt: Optional[str] = None,
    min_score_val: int = 0,
    max_score_val: int = 10,
    temperature: float = 0.0,
) -> float

Base method to generate a score normalized to 0 to 1, used for evaluation.

PARAMETER DESCRIPTION
system_prompt

A pre-formatted system prompt.

TYPE: str

user_prompt

An optional user prompt.

TYPE: Optional[str] DEFAULT: None

min_score_val

The minimum score value.

TYPE: int DEFAULT: 0

max_score_val

The maximum score value.

TYPE: int DEFAULT: 10

temperature

The temperature for the LLM response.

TYPE: float DEFAULT: 0.0

RETURNS DESCRIPTION
float

The score on a 0-1 scale.
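
To make the "normalized to 0 to 1" step concrete, here is a minimal sketch assuming the LLM replies with text containing an integer rating and that normalization is a linear min-max rescale. `parse_and_normalize` is a hypothetical helper written for this illustration; the actual parsing inside the provider may differ.

```python
import re


def parse_and_normalize(
    llm_text: str, min_score_val: int = 0, max_score_val: int = 10
) -> float:
    """Extract the first integer from llm_text and rescale it to [0, 1].

    Hypothetical helper for illustration; not a TruLens API.
    """
    if max_score_val <= min_score_val:
        raise ValueError("max_score_val must be greater than min_score_val")
    match = re.search(r"-?\d+", llm_text)
    if match is None:
        raise ValueError(f"no integer score found in {llm_text!r}")
    raw = int(match.group())
    if not min_score_val <= raw <= max_score_val:
        raise ValueError(
            f"score {raw} outside [{min_score_val}, {max_score_val}]"
        )
    # Linear min-max rescale (an assumption about the normalization).
    return (raw - min_score_val) / (max_score_val - min_score_val)


print(parse_and_normalize("Score: 7"))  # 0.7
print(parse_and_normalize("2", min_score_val=0, max_score_val=3))
```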

generate_confidence_score
generate_confidence_score(
    verb_confidence_prompt: str,
    user_prompt: Optional[str] = None,
    min_score_val: int = 0,
    max_score_val: int = 10,
    temperature: float = 0.0,
) -> Tuple[float, Dict[str, float]]

Base method to generate a score normalized to 0 to 1 together with a confidence score, used for evaluation.

PARAMETER DESCRIPTION
verb_confidence_prompt

A pre-formatted system prompt.

TYPE: str

user_prompt

An optional user prompt.

TYPE: Optional[str] DEFAULT: None

min_score_val

The minimum score value.

TYPE: int DEFAULT: 0

max_score_val

The maximum score value.

TYPE: int DEFAULT: 10

temperature

The temperature for the LLM response.

TYPE: float DEFAULT: 0.0

RETURNS DESCRIPTION
Tuple[float, Dict[str, float]]

The feedback score on a 0-1 scale and the confidence score.

generate_score_and_reasons
generate_score_and_reasons(
    system_prompt: str,
    user_prompt: Optional[str] = None,
    min_score_val: int = 0,
    max_score_val: int = 10,
    temperature: float = 0.0,
) -> Tuple[float, Dict]

Base method to generate a score and reason, used for evaluation.

PARAMETER DESCRIPTION
system_prompt

A pre-formatted system prompt.

TYPE: str

user_prompt

An optional user prompt. Defaults to None.

TYPE: Optional[str] DEFAULT: None

min_score_val

The minimum score value.

TYPE: int DEFAULT: 0

max_score_val

The maximum score value.

TYPE: int DEFAULT: 10

temperature

The temperature for the LLM response.

TYPE: float DEFAULT: 0.0

RETURNS DESCRIPTION
float

The score on a 0-1 scale.

Dict

Reason metadata if returned by the LLM.

context_relevance
context_relevance(
    question: str,
    context: str,
    criteria: str = "",
    min_score_val: int = 0,
    max_score_val: int = 3,
    temperature: float = 0.0,
) -> float

Uses chat completion model. A function that completes a template to check the relevance of the context to the question.

Example:

```python
import numpy as np
from trulens.apps.langchain import TruChain
from trulens.core import Feedback

context = TruChain.select_context(rag_app)
feedback = (
    Feedback(provider.context_relevance)
    .on_input()
    .on(context)
    .aggregate(np.mean)
)
```
PARAMETER DESCRIPTION
question

A question being asked.

TYPE: str

context

Context related to the question.

TYPE: str

criteria

Evaluation criteria that override the default criteria.

TYPE: str DEFAULT: ''

min_score_val

The minimum score value. Defaults to 0.

TYPE: int DEFAULT: 0

max_score_val

The maximum score value. Defaults to 3.

TYPE: int DEFAULT: 3

temperature

The temperature for the LLM response, which might have impact on the confidence level of the evaluation. Defaults to 0.0.

TYPE: float DEFAULT: 0.0

RETURNS DESCRIPTION
float

A value between 0.0 (not relevant) and 1.0 (relevant).
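
To illustrate how the default 0 to 3 scale and the `.aggregate(np.mean)` call in the example combine, here is a minimal sketch. It assumes a linear rescale of each raw rating and uses `statistics.fmean` as a stand-in for `np.mean`; the helper names and the raw ratings are hypothetical.

```python
from statistics import fmean
from typing import Sequence


def normalize(raw: int, min_score_val: int = 0, max_score_val: int = 3) -> float:
    """Linearly rescale a raw rating to [0, 1] (assumed normalization)."""
    return (raw - min_score_val) / (max_score_val - min_score_val)


def aggregate_context_relevance(raw_scores: Sequence[int]) -> float:
    """Average the per-context scores, as an np.mean aggregation would."""
    return fmean([normalize(raw) for raw in raw_scores])


# Hypothetical raw ratings for three retrieved contexts.
print(aggregate_context_relevance([3, 1, 0]))
```

Each retrieved context is scored on its own, so a single irrelevant chunk lowers the aggregate without zeroing it.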

context_relevance_with_cot_reasons
context_relevance_with_cot_reasons(
    question: str,
    context: str,
    criteria: str = "",
    min_score_val: int = 0,
    max_score_val: int = 3,
    temperature: float = 0.0,
) -> Tuple[float, Dict]

Uses chat completion model. A function that completes a template to check the relevance of the context to the question. Also uses chain of thought methodology and emits the reasons.

Example:

```python
import numpy as np
from trulens.apps.langchain import TruChain
from trulens.core import Feedback

context = TruChain.select_context(rag_app)
feedback = (
    Feedback(provider.context_relevance_with_cot_reasons)
    .on_input()
    .on(context)
    .aggregate(np.mean)
)
```
PARAMETER DESCRIPTION
question

A question being asked.

TYPE: str

context

Context related to the question.

TYPE: str

criteria

Evaluation criteria that override the default criteria.

TYPE: str DEFAULT: ''

min_score_val

The minimum score value. Defaults to 0.

TYPE: int DEFAULT: 0

max_score_val

The maximum score value. Defaults to 3.

TYPE: int DEFAULT: 3

temperature

The temperature for the LLM response, which might have impact on the confidence level of the evaluation. Defaults to 0.0.

TYPE: float DEFAULT: 0.0

RETURNS DESCRIPTION
float

A value between 0 and 1. 0 being "not relevant" and 1 being "relevant".

TYPE: Tuple[float, Dict]

context_relevance_verb_confidence
context_relevance_verb_confidence(
    question: str,
    context: str,
    criteria: str = "",
    min_score_val: int = 0,
    max_score_val: int = 3,
    temperature: float = 0.0,
) -> Tuple[float, Dict[str, float]]

Uses chat completion model. A function that completes a template to check the relevance of the context to the question. Also returns a verbalized confidence score for the evaluation.

Example:

```python
import numpy as np
from trulens.apps.llamaindex import TruLlama
from trulens.core import Feedback

context = TruLlama.select_context(llamaindex_rag_app)
feedback = (
    Feedback(provider.context_relevance_verb_confidence)
    .on_input()
    .on(context)
    .aggregate(np.mean)
)
```
PARAMETER DESCRIPTION
question

A question being asked.

TYPE: str

context

Context related to the question.

TYPE: str

criteria

Evaluation criteria that override the default criteria.

TYPE: str DEFAULT: ''

min_score_val

The minimum score value. Defaults to 0.

TYPE: int DEFAULT: 0

max_score_val

The maximum score value. Defaults to 3.

TYPE: int DEFAULT: 3

temperature

The temperature for the LLM response, which might have impact on the confidence level of the evaluation. Defaults to 0.0.

TYPE: float DEFAULT: 0.0

RETURNS DESCRIPTION
float

A value between 0 and 1. 0 being "not relevant" and 1 being "relevant".

Dict[str, float]

A dictionary containing the confidence score.

relevance
relevance(prompt: str, response: str) -> float

Uses chat completion model. A function that completes a template to check the relevance of the response to a prompt.

Example:

```python
feedback = Feedback(provider.relevance).on_input_output()
```
Usage on RAG Contexts:

```python
feedback = Feedback(provider.relevance).on_input().on(
    TruLlama.select_source_nodes().node.text # See note below
).aggregate(np.mean)
```
PARAMETER DESCRIPTION
prompt

A text prompt to an agent.

TYPE: str

response

The agent's response to the prompt.

TYPE: str

RETURNS DESCRIPTION
float

A value between 0 and 1. 0 being "not relevant" and 1 being "relevant".

TYPE: float

relevance_with_cot_reasons
relevance_with_cot_reasons(
    prompt: str, response: str
) -> Tuple[float, Dict]

Uses chat completion model. A function that completes a template to check the relevance of the response to a prompt. Also uses chain of thought methodology and emits the reasons.

Example:

```python
feedback = (
    Feedback(provider.relevance_with_cot_reasons)
    .on_input()
    .on_output()
)
```
PARAMETER DESCRIPTION
prompt

A text prompt to an agent.

TYPE: str

response

The agent's response to the prompt.

TYPE: str

RETURNS DESCRIPTION
float

A value between 0 and 1. 0 being "not relevant" and 1 being "relevant".

TYPE: Tuple[float, Dict]

sentiment
sentiment(text: str) -> float

Uses chat completion model. A function that completes a template to check the sentiment of some text.

Example:

```python
feedback = Feedback(provider.sentiment).on_output()
```
PARAMETER DESCRIPTION
text

The text to evaluate sentiment of.

TYPE: str

RETURNS DESCRIPTION
float

A value between 0 and 1. 0 being "negative sentiment" and 1 being "positive sentiment".

sentiment_with_cot_reasons
sentiment_with_cot_reasons(text: str) -> Tuple[float, Dict]

Uses chat completion model. A function that completes a template to check the sentiment of some text. Also uses chain of thought methodology and emits the reasons.

Example:

```python
feedback = Feedback(provider.sentiment_with_cot_reasons).on_output()
```
PARAMETER DESCRIPTION
text

Text to evaluate.

TYPE: str

RETURNS DESCRIPTION
float

A value between 0.0 (negative sentiment) and 1.0 (positive sentiment).

TYPE: Tuple[float, Dict]

model_agreement
model_agreement(prompt: str, response: str) -> float

Uses chat completion model. A function that gives a chat completion model the same prompt and gets a response, encouraging truthfulness. A second template is then given to the model with a prompt stating that the original response is correct, and the function measures whether the previous chat completion response is similar.

Example:

```python
feedback = Feedback(provider.model_agreement).on_input_output()
```
PARAMETER DESCRIPTION
prompt

A text prompt to an agent.

TYPE: str

response

The agent's response to the prompt.

TYPE: str

RETURNS DESCRIPTION
float

A value between 0.0 (not in agreement) and 1.0 (in agreement).

TYPE: float

conciseness
conciseness(text: str) -> float

Uses chat completion model. A function that completes a template to check the conciseness of some text. Prompt credit to LangChain Eval.

Example:

```python
feedback = Feedback(provider.conciseness).on_output()
```
PARAMETER DESCRIPTION
text

The text to evaluate the conciseness of.

TYPE: str

RETURNS DESCRIPTION
float

A value between 0.0 (not concise) and 1.0 (concise).

conciseness_with_cot_reasons
conciseness_with_cot_reasons(
    text: str,
) -> Tuple[float, Dict]

Uses chat completion model. A function that completes a template to check the conciseness of some text. Prompt credit to LangChain Eval. Also uses chain of thought methodology and emits the reasons.

Example:

```python
feedback = Feedback(provider.conciseness_with_cot_reasons).on_output()
```
PARAMETER DESCRIPTION
text

The text to evaluate the conciseness of.

TYPE: str

RETURNS DESCRIPTION
Tuple[float, Dict]

A tuple containing a value between 0.0 (not concise) and 1.0 (concise) and a dictionary containing the reasons for the evaluation.

correctness
correctness(text: str) -> float

Uses chat completion model. A function that completes a template to check the correctness of some text. Prompt credit to LangChain Eval.

Example:

```python
feedback = Feedback(provider.correctness).on_output()
```
PARAMETER DESCRIPTION
text

The text to evaluate.

TYPE: str

RETURNS DESCRIPTION
float

A value between 0.0 (not correct) and 1.0 (correct).

correctness_with_cot_reasons
correctness_with_cot_reasons(
    text: str,
) -> Tuple[float, Dict]

Uses chat completion model. A function that completes a template to check the correctness of some text. Prompt credit to LangChain Eval. Also uses chain of thought methodology and emits the reasons.

Example:

```python
feedback = Feedback(provider.correctness_with_cot_reasons).on_output()
```
PARAMETER DESCRIPTION
text

Text to evaluate.

TYPE: str

RETURNS DESCRIPTION
Tuple[float, Dict]

A tuple containing a value between 0.0 (not correct) and 1.0 (correct) and a dictionary containing the reasons for the evaluation.

coherence
coherence(text: str) -> float

Uses chat completion model. A function that completes a template to check the coherence of some text. Prompt credit to LangChain Eval.

Example:

```python
feedback = Feedback(provider.coherence).on_output()
```
PARAMETER DESCRIPTION
text

The text to evaluate.

TYPE: str

RETURNS DESCRIPTION
float

A value between 0.0 (not coherent) and 1.0 (coherent).

TYPE: float

coherence_with_cot_reasons
coherence_with_cot_reasons(text: str) -> Tuple[float, Dict]

Uses chat completion model. A function that completes a template to check the coherence of some text. Prompt credit to LangChain Eval. Also uses chain of thought methodology and emits the reasons.

Example:

```python
feedback = Feedback(provider.coherence_with_cot_reasons).on_output()
```
PARAMETER DESCRIPTION
text

The text to evaluate.

TYPE: str

RETURNS DESCRIPTION
Tuple[float, Dict]

A tuple containing a value between 0.0 (not coherent) and 1.0 (coherent) and a dictionary containing the reasons for the evaluation.

harmfulness
harmfulness(text: str) -> float

Uses chat completion model. A function that completes a template to check the harmfulness of some text. Prompt credit to LangChain Eval.

Example:

```python
feedback = Feedback(provider.harmfulness).on_output()
```
PARAMETER DESCRIPTION
text

The text to evaluate.

TYPE: str

RETURNS DESCRIPTION
float

A value between 0.0 (not harmful) and 1.0 (harmful).

TYPE: float

harmfulness_with_cot_reasons
harmfulness_with_cot_reasons(
    text: str,
) -> Tuple[float, Dict]

Uses chat completion model. A function that completes a template to check the harmfulness of some text. Prompt credit to LangChain Eval. Also uses chain of thought methodology and emits the reasons.

Example:

```python
feedback = Feedback(provider.harmfulness_with_cot_reasons).on_output()
```
PARAMETER DESCRIPTION
text

The text to evaluate.

TYPE: str

RETURNS DESCRIPTION
Tuple[float, Dict]

A tuple containing a value between 0.0 (not harmful) and 1.0 (harmful) and a dictionary containing the reasons for the evaluation.

maliciousness
maliciousness(text: str) -> float

Uses chat completion model. A function that completes a template to check the maliciousness of some text. Prompt credit to LangChain Eval.

Example:

```python
feedback = Feedback(provider.maliciousness).on_output()
```
PARAMETER DESCRIPTION
text

The text to evaluate.

TYPE: str

RETURNS DESCRIPTION
float

A value between 0.0 (not malicious) and 1.0 (malicious).

TYPE: float

maliciousness_with_cot_reasons
maliciousness_with_cot_reasons(
    text: str,
) -> Tuple[float, Dict]

Uses chat completion model. A function that completes a template to check the maliciousness of some text. Prompt credit to LangChain Eval. Also uses chain of thought methodology and emits the reasons.

Example:

```python
feedback = Feedback(provider.maliciousness_with_cot_reasons).on_output()
```
PARAMETER DESCRIPTION
text

The text to evaluate.

TYPE: str

RETURNS DESCRIPTION
Tuple[float, Dict]

A tuple containing a value between 0.0 (not malicious) and 1.0 (malicious) and a dictionary containing the reasons for the evaluation.

helpfulness
helpfulness(text: str) -> float

Uses chat completion model. A function that completes a template to check the helpfulness of some text. Prompt credit to LangChain Eval.

Example:

```python
feedback = Feedback(provider.helpfulness).on_output()
```
PARAMETER DESCRIPTION
text

The text to evaluate.

TYPE: str

RETURNS DESCRIPTION
float

A value between 0.0 (not helpful) and 1.0 (helpful).

TYPE: float

helpfulness_with_cot_reasons
helpfulness_with_cot_reasons(
    text: str,
) -> Tuple[float, Dict]

Uses chat completion model. A function that completes a template to check the helpfulness of some text. Prompt credit to LangChain Eval. Also uses chain of thought methodology and emits the reasons.

Example:

```python
feedback = Feedback(provider.helpfulness_with_cot_reasons).on_output()
```
PARAMETER DESCRIPTION
text

The text to evaluate.

TYPE: str

RETURNS DESCRIPTION
Tuple[float, Dict]

A tuple containing a value between 0.0 (not helpful) and 1.0 (helpful) and a dictionary containing the reasons for the evaluation.

controversiality
controversiality(text: str) -> float

Uses chat completion model. A function that completes a template to check the controversiality of some text. Prompt credit to LangChain Eval.

Example:

```python
feedback = Feedback(provider.controversiality).on_output()
```
PARAMETER DESCRIPTION
text

The text to evaluate.

TYPE: str

RETURNS DESCRIPTION
float

A value between 0.0 (not controversial) and 1.0 (controversial).

TYPE: float

controversiality_with_cot_reasons
controversiality_with_cot_reasons(
    text: str,
) -> Tuple[float, Dict]

Uses chat completion model. A function that completes a template to check the controversiality of some text. Prompt credit to LangChain Eval. Also uses chain of thought methodology and emits the reasons.

Example:

```python
feedback = Feedback(provider.controversiality_with_cot_reasons).on_output()
```
PARAMETER DESCRIPTION
text

The text to evaluate.

TYPE: str

RETURNS DESCRIPTION
Tuple[float, Dict]

A tuple containing a value between 0.0 (not controversial) and 1.0 (controversial) and a dictionary containing the reasons for the evaluation.

misogyny
misogyny(text: str) -> float

Uses chat completion model. A function that completes a template to check the misogyny of some text. Prompt credit to LangChain Eval.

Example:

```python
feedback = Feedback(provider.misogyny).on_output()
```
PARAMETER DESCRIPTION
text

The text to evaluate.

TYPE: str

RETURNS DESCRIPTION
float

A value between 0.0 (not misogynistic) and 1.0 (misogynistic).

TYPE: float

misogyny_with_cot_reasons
misogyny_with_cot_reasons(text: str) -> Tuple[float, Dict]

Uses chat completion model. A function that completes a template to check the misogyny of some text. Prompt credit to LangChain Eval. Also uses chain of thought methodology and emits the reasons.

Example:

```python
feedback = Feedback(provider.misogyny_with_cot_reasons).on_output()
```
PARAMETER DESCRIPTION
text

The text to evaluate.

TYPE: str

RETURNS DESCRIPTION
Tuple[float, Dict]

A tuple containing a value between 0.0 (not misogynistic) and 1.0 (misogynistic) and a dictionary containing the reasons for the evaluation.

criminality
criminality(text: str) -> float

Uses chat completion model. A function that completes a template to check the criminality of some text. Prompt credit to LangChain Eval.

Example:

```python
feedback = Feedback(provider.criminality).on_output()
```
PARAMETER DESCRIPTION
text

The text to evaluate.

TYPE: str

RETURNS DESCRIPTION
float

A value between 0.0 (not criminal) and 1.0 (criminal).

TYPE: float

criminality_with_cot_reasons
criminality_with_cot_reasons(
    text: str,
) -> Tuple[float, Dict]

Uses chat completion model. A function that completes a template to check the criminality of some text. Prompt credit to LangChain Eval. Also uses chain of thought methodology and emits the reasons.

Example:

```python
feedback = Feedback(provider.criminality_with_cot_reasons).on_output()
```
PARAMETER DESCRIPTION
text

The text to evaluate.

TYPE: str

RETURNS DESCRIPTION
Tuple[float, Dict]

A tuple containing a value between 0.0 (not criminal) and 1.0 (criminal) and a dictionary containing the reasons for the evaluation.

insensitivity
insensitivity(text: str) -> float

Uses chat completion model. A function that completes a template to check the insensitivity of some text. Prompt credit to LangChain Eval.

Example:

```python
feedback = Feedback(provider.insensitivity).on_output()
```
PARAMETER DESCRIPTION
text

The text to evaluate.

TYPE: str

RETURNS DESCRIPTION
float

A value between 0.0 (not insensitive) and 1.0 (insensitive).

TYPE: float

insensitivity_with_cot_reasons
insensitivity_with_cot_reasons(
    text: str,
) -> Tuple[float, Dict]

Uses chat completion model. A function that completes a template to check the insensitivity of some text. Prompt credit to LangChain Eval. Also uses chain of thought methodology and emits the reasons.

Example:

```python
feedback = Feedback(provider.insensitivity_with_cot_reasons).on_output()
```
PARAMETER DESCRIPTION
text

The text to evaluate.

TYPE: str

RETURNS DESCRIPTION
Tuple[float, Dict]

A tuple containing a value between 0.0 (not insensitive) and 1.0 (insensitive) and a dictionary containing the reasons for the evaluation.

comprehensiveness_with_cot_reasons
comprehensiveness_with_cot_reasons(
    source: str, summary: str
) -> Tuple[float, Dict]

Uses chat completion model. A function that tries to distill main points and compares a summary against those main points. This feedback function has only a chain of thought implementation because the reasoning is essential to assessing comprehensiveness.

Example:

```python
feedback = Feedback(provider.comprehensiveness_with_cot_reasons).on_input_output()
```
PARAMETER DESCRIPTION
source

Text corresponding to source material.

TYPE: str

summary

Text corresponding to a summary.

TYPE: str

RETURNS DESCRIPTION
Tuple[float, Dict]

A tuple containing a value between 0.0 (not comprehensive) and 1.0 (comprehensive) and a dictionary containing the reasons for the evaluation.

summarization_with_cot_reasons
summarization_with_cot_reasons(
    source: str, summary: str
) -> Tuple[float, Dict]

Summarization is deprecated in favor of comprehensiveness. This function is no longer implemented.

stereotypes
stereotypes(prompt: str, response: str) -> float

Uses chat completion model. A function that completes a template to check whether the response adds assumed stereotypes that are not present in the prompt.

Example:

```python
feedback = Feedback(provider.stereotypes).on_input_output()
```
PARAMETER DESCRIPTION
prompt

A text prompt to an agent.

TYPE: str

response

The agent's response to the prompt.

TYPE: str

RETURNS DESCRIPTION
float

A value between 0.0 (no stereotypes assumed) and 1.0 (stereotypes assumed).

stereotypes_with_cot_reasons
stereotypes_with_cot_reasons(
    prompt: str, response: str
) -> Tuple[float, Dict]

Uses chat completion model. A function that completes a template to check whether the response adds assumed stereotypes that are not present in the prompt. Also uses chain of thought methodology and emits the reasons.

Example:

```python
feedback = Feedback(provider.stereotypes_with_cot_reasons).on_input_output()
```
PARAMETER DESCRIPTION
prompt

A text prompt to an agent.

TYPE: str

response

The agent's response to the prompt.

TYPE: str

RETURNS DESCRIPTION
Tuple[float, Dict]

A tuple containing a value between 0.0 (no stereotypes assumed) and 1.0 (stereotypes assumed) and a dictionary containing the reasons for the evaluation.

groundedness_measure_with_cot_reasons
groundedness_measure_with_cot_reasons(
    source: str,
    statement: str,
    use_sent_tokenize: bool = True,
) -> Tuple[float, dict]

A measure to track if the source material supports each sentence in the statement using an LLM provider.

The statement will first be split by a tokenizer into its component sentences.

Then, trivial statements are eliminated so as to not dilute the evaluation.

The LLM will process each statement, using chain of thought methodology to emit the reasons.

Abstentions will be considered as grounded.

Example:

```python
from trulens.core import Feedback
from trulens.providers.openai import OpenAI

provider = OpenAI()

f_groundedness = (
    Feedback(provider.groundedness_measure_with_cot_reasons)
    .on(context.collect())
    .on_output()
)
```

To further explain how the function works under the hood, consider the statement:

"Hi. I'm here to help. The university of Washington is a public research university. UW's connections to major corporations in Seattle contribute to its reputation as a hub for innovation and technology"

The function will split the statement into its component sentences:

  1. "Hi."
  2. "I'm here to help."
  3. "The university of Washington is a public research university."
  4. "UW's connections to major corporations in Seattle contribute to its reputation as a hub for innovation and technology"

Next, trivial statements are removed, leaving only:

  1. "The university of Washington is a public research university."
  2. "UW's connections to major corporations in Seattle contribute to its reputation as a hub for innovation and technology"

The LLM will then process each remaining sentence to assess its groundedness.

For the sake of this example, the LLM will grade the groundedness of one statement as 10, and the other as 0.

Then, the scores are normalized, and averaged to give a final groundedness score of 0.5.
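
The walkthrough above can be reproduced in a few lines. This is a sketch under stated assumptions, not the library's implementation: a naive regular-expression splitter stands in for the punkt tokenizer, and the `TRIVIAL` set and `RAW_SCORES` table are hard-coded stand-ins for the judgments the LLM would make.

```python
import re
from statistics import fmean
from typing import Dict, List

STATEMENT = (
    "Hi. I'm here to help. The university of Washington is a public "
    "research university. UW's connections to major corporations in "
    "Seattle contribute to its reputation as a hub for innovation and "
    "technology"
)

# Stand-ins for the LLM's judgments in this example (assumptions).
TRIVIAL = {"Hi.", "I'm here to help."}
RAW_SCORES: Dict[str, int] = {
    "The university of Washington is a public research university.": 10,
}  # any sentence not listed here is graded 0


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def groundedness(statement: str, min_val: int = 0, max_val: int = 10) -> float:
    """Drop trivial sentences, normalize each raw score, then average."""
    sentences = [s for s in split_sentences(statement) if s not in TRIVIAL]
    normalized = [
        (RAW_SCORES.get(s, 0) - min_val) / (max_val - min_val)
        for s in sentences
    ]
    return fmean(normalized)


print(groundedness(STATEMENT))  # 0.5
```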

PARAMETER DESCRIPTION
source

The source that should support the statement.

TYPE: str

statement

The statement to check groundedness.

TYPE: str

use_sent_tokenize

Whether to split the statement into sentences using punkt sentence tokenizer. If False, use LLM to split the statement.

TYPE: bool DEFAULT: True

RETURNS DESCRIPTION
Tuple[float, dict]

A tuple containing a value between 0.0 (not grounded) and 1.0 (grounded) and a dictionary containing the reasons for the evaluation.

qs_relevance
qs_relevance(*args, **kwargs)

Deprecated. Use relevance instead.

qs_relevance_with_cot_reasons
qs_relevance_with_cot_reasons(*args, **kwargs)

Deprecated. Use relevance_with_cot_reasons instead.

groundedness_measure_with_cot_reasons_consider_answerability
groundedness_measure_with_cot_reasons_consider_answerability(
    source: str,
    statement: str,
    question: str,
    use_sent_tokenize: bool = True,
) -> Tuple[float, dict]

A measure to track if the source material supports each sentence in the statement using an LLM provider.

The statement will first be split by a tokenizer into its component sentences.

Then, trivial statements are eliminated so as to not dilute the evaluation.

The LLM will process each statement, using chain of thought methodology to emit the reasons.

In the case of abstentions, such as 'I do not know', the LLM will be asked to consider the answerability of the question given the source material.

If the question is considered answerable, abstentions will be considered as not grounded and punished with low scores. Otherwise, unanswerable abstentions will be considered grounded.

Example:

```python
from trulens.core import Feedback
from trulens.providers.openai import OpenAI

provider = OpenAI()

f_groundedness = (
    Feedback(provider.groundedness_measure_with_cot_reasons_consider_answerability)
    .on(context.collect())
    .on_output()
    .on_input()
)
```
PARAMETER DESCRIPTION
source

The source that should support the statement.

TYPE: str

statement

The statement to check groundedness.

TYPE: str

question

The question to check answerability.

TYPE: str

use_sent_tokenize

Whether to split the statement into sentences using punkt sentence tokenizer. If False, use LLM to split the statement.

TYPE: bool DEFAULT: True

RETURNS DESCRIPTION
Tuple[float, dict]

A tuple containing a value between 0.0 (not grounded) and 1.0 (grounded) and a dictionary containing the reasons for the evaluation.
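
The abstention rule described for this function can be summarized as a small decision table. The sketch below is hypothetical: `abstention_aware_score` is not a TruLens function, and the concrete values 0.0 and 1.0 are assumptions standing in for "punished with low scores" and "considered grounded".

```python
def abstention_aware_score(
    is_abstention: bool,
    question_is_answerable: bool,
    groundedness_score: float,
) -> float:
    """Apply the abstention rule to one sentence of the statement.

    Hypothetical helper for illustration; not a TruLens API.
    """
    if not is_abstention:
        # Ordinary sentences keep the score the LLM assigned.
        return groundedness_score
    if question_is_answerable:
        # The source could have answered the question, so abstaining
        # is treated as not grounded (assumed value: 0.0).
        return 0.0
    # The source cannot answer the question, so abstaining is grounded.
    return 1.0


# "I do not know" when the source contains the answer.
print(abstention_aware_score(True, True, 0.0))  # 0.0
# "I do not know" when the source does not contain the answer.
print(abstention_aware_score(True, False, 0.0))  # 1.0
```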

moderation_hate
moderation_hate(text: str) -> float

Uses OpenAI's Moderation API. A function that checks if text is hate speech.

Example:

```python
from trulens.core import Feedback
from trulens.providers.openai import OpenAI
openai_provider = OpenAI()

feedback = Feedback(
    openai_provider.moderation_hate, higher_is_better=False
).on_output()
```
PARAMETER DESCRIPTION
text

Text to evaluate.

TYPE: str

RETURNS DESCRIPTION
float

A value between 0.0 (not hate) and 1.0 (hate).

TYPE: float

moderation_hatethreatening
moderation_hatethreatening(text: str) -> float

Uses OpenAI's Moderation API. A function that checks if text is threatening speech.

Example:

```python
from trulens.core import Feedback
from trulens.providers.openai import OpenAI
openai_provider = OpenAI()

feedback = Feedback(
    openai_provider.moderation_hatethreatening, higher_is_better=False
).on_output()
```
PARAMETER DESCRIPTION
text

Text to evaluate.

TYPE: str

RETURNS DESCRIPTION
float

A value between 0.0 (not threatening) and 1.0 (threatening).

TYPE: float

moderation_selfharm
moderation_selfharm(text: str) -> float

Uses OpenAI's Moderation API. A function that checks if text is about self harm.

Example:

```python
from trulens.core import Feedback
from trulens.providers.openai import OpenAI
openai_provider = OpenAI()

feedback = Feedback(
    openai_provider.moderation_selfharm, higher_is_better=False
).on_output()
```
PARAMETER DESCRIPTION
text

Text to evaluate.

TYPE: str

RETURNS DESCRIPTION
float

A value between 0.0 (not self harm) and 1.0 (self harm).

TYPE: float

moderation_sexual
moderation_sexual(text: str) -> float

Uses OpenAI's Moderation API. A function that checks if text is sexual speech.

Example:

```python
from trulens.core import Feedback
from trulens.providers.openai import OpenAI
openai_provider = OpenAI()

feedback = Feedback(
    openai_provider.moderation_sexual, higher_is_better=False
).on_output()
```
PARAMETER DESCRIPTION
text

Text to evaluate.

TYPE: str

RETURNS DESCRIPTION
float

A value between 0.0 (not sexual) and 1.0 (sexual).

TYPE: float

moderation_sexualminors
moderation_sexualminors(text: str) -> float

Uses OpenAI's Moderation API. A function that checks if text contains sexual content involving minors.

Example:

```python
from trulens.core import Feedback
from trulens.providers.openai import OpenAI
openai_provider = OpenAI()

feedback = Feedback(
    openai_provider.moderation_sexualminors, higher_is_better=False
).on_output()
```
PARAMETER DESCRIPTION
text

Text to evaluate.

TYPE: str

RETURNS DESCRIPTION
float

A value between 0.0 (not sexual minors) and 1.0 (sexual minors).

TYPE: float

moderation_violence
moderation_violence(text: str) -> float

Uses OpenAI's Moderation API. A function that checks if text is about violence.

Example:

```python
from trulens.core import Feedback
from trulens.providers.openai import OpenAI
openai_provider = OpenAI()

feedback = Feedback(
    openai_provider.moderation_violence, higher_is_better=False
).on_output()
```
PARAMETER DESCRIPTION
text

Text to evaluate.

TYPE: str

RETURNS DESCRIPTION
float

A value between 0.0 (not violence) and 1.0 (violence).

TYPE: float

moderation_violencegraphic
moderation_violencegraphic(text: str) -> float

Uses OpenAI's Moderation API. A function that checks if text is about graphic violence.

Example:

```python
from trulens.core import Feedback
from trulens.providers.openai import OpenAI
openai_provider = OpenAI()

feedback = Feedback(
    openai_provider.moderation_violencegraphic, higher_is_better=False
).on_output()
```
PARAMETER DESCRIPTION
text

Text to evaluate.

TYPE: str

RETURNS DESCRIPTION
float

A value between 0.0 (not graphic violence) and 1.0 (graphic violence).

TYPE: float

moderation_harassment
moderation_harassment(text: str) -> float

Uses OpenAI's Moderation API. A function that checks if text is harassment.

Example:

```python
from trulens.core import Feedback
from trulens.providers.openai import OpenAI
openai_provider = OpenAI()

feedback = Feedback(
    openai_provider.moderation_harassment, higher_is_better=False
).on_output()
```
PARAMETER DESCRIPTION
text

Text to evaluate.

TYPE: str

RETURNS DESCRIPTION
float

A value between 0.0 (not harassment) and 1.0 (harassment).

TYPE: float

moderation_harassment_threatening
moderation_harassment_threatening(text: str) -> float

Uses OpenAI's Moderation API. A function that checks if text is threatening harassment.

Example:

```python
from trulens.core import Feedback
from trulens.providers.openai import OpenAI
openai_provider = OpenAI()

feedback = Feedback(
    openai_provider.moderation_harassment_threatening, higher_is_better=False
).on_output()
```
PARAMETER DESCRIPTION
text

Text to evaluate.

TYPE: str

RETURNS DESCRIPTION
float

A value between 0.0 (not harassment/threatening) and 1.0 (harassment/threatening).

TYPE: float