trulens.providers.bedrock
Additional Dependency Required

To use this module, you must have the trulens-providers-bedrock package installed.

pip install trulens-providers-bedrock
Amazon Bedrock is a fully managed service that makes foundation models (FMs) from leading AI startups and Amazon available via an API, so you can choose from a wide range of FMs to find the model best suited for your use case.
All feedback functions listed in the base LLMProvider class can be run with AWS Bedrock.
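For example, here is a minimal sketch of wiring the provider into feedback functions, assuming the standard import path for the trulens-providers-bedrock package and AWS credentials already configured for boto3 (region_name is a boto3 client keyword forwarded through the constructor, as described below):

```python
from trulens.core import Feedback
from trulens.providers.bedrock import Bedrock

# Instantiate the provider; extra keyword arguments are forwarded to
# BedrockEndpoint and ultimately to the boto3 client constructor.
provider = Bedrock(
    model_id="amazon.titan-text-express-v1",  # the default model id
    region_name="us-east-1",  # assumption: any boto3 client keyword can be passed through
)

# Any feedback function from the base LLMProvider can then be used.
f_answer_relevance = Feedback(provider.relevance).on_input_output()
f_sentiment = Feedback(provider.sentiment).on_output()
```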
Classes

Bedrock

Bases: LLMProvider

A set of AWS Feedback Functions.
PARAMETER | DESCRIPTION |
---|---|
`model_id` | The specific model id. Defaults to "amazon.titan-text-express-v1". |
`*args` | Args passed to BedrockEndpoint and subsequently to the boto3 client constructor. DEFAULT: `()` |
`**kwargs` | Kwargs passed to BedrockEndpoint and subsequently to the boto3 client constructor. DEFAULT: `{}` |
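Because *args and **kwargs are handed straight through, AWS connection settings are given as ordinary boto3 client arguments. A hedged sketch (the model id and credential values are placeholders; omitting the credential kwargs falls back to the default boto3 credential chain):

```python
from trulens.providers.bedrock import Bedrock

# These keyword arguments are not interpreted by Bedrock itself; they are
# passed to BedrockEndpoint and then to boto3's client constructor.
provider = Bedrock(
    model_id="anthropic.claude-v2",  # assumption: any model id enabled in your Bedrock account
    region_name="us-west-2",  # boto3 client keyword
    aws_access_key_id="...",  # boto3 client keyword; placeholder value
    aws_secret_access_key="...",  # boto3 client keyword; placeholder value
)
```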
Attributes

tru_class_info
instance-attribute
tru_class_info: Class
Class information of this pydantic object for use in deserialization.
Using this odd key to not pollute attribute names in whatever class we mix this into. Should be the same as CLASS_INFO.
Functions

load
staticmethod
load(obj, *args, **kwargs)
Deserialize/load this object using the class information in tru_class_info to look up the actual class that will perform the deserialization.
model_validate
classmethod
model_validate(*args, **kwargs) -> Any
Deserializes a jsonized version of the app into an instance of the class it was serialized from.
Note
This process uses extra information stored in the jsonized object and handled by WithClassInfo.
_determine_output_space
context_relevance
context_relevance(
question: str,
context: str,
criteria: Optional[str] = None,
examples: Optional[List[str]] = None,
min_score_val: int = 0,
max_score_val: int = 3,
temperature: float = 0.0,
) -> float
Uses a chat completion model to complete a template that checks the relevance of the context to the question.
Example
from trulens.apps.langchain import TruChain
context = TruChain.select_context(rag_app)
feedback = (
Feedback(provider.context_relevance)
.on_input()
.on(context)
.aggregate(np.mean)
)
PARAMETER | DESCRIPTION |
---|---|
`question` | A question being asked. TYPE: `str` |
`context` | Context related to the question. TYPE: `str` |
`criteria` | If provided, overrides the default evaluation criteria. Defaults to None. |
`min_score_val` | The minimum score value. Defaults to 0. TYPE: `int` |
`max_score_val` | The maximum score value. Defaults to 3. TYPE: `int` |
`temperature` | The temperature for the LLM response, which may affect the confidence level of the evaluation. Defaults to 0.0. TYPE: `float` |

RETURNS | DESCRIPTION |
---|---|
`float` | A value between 0.0 (not relevant) and 1.0 (relevant). |
context_relevance_with_cot_reasons
context_relevance_with_cot_reasons(
question: str,
context: str,
criteria: Optional[str] = None,
examples: Optional[List[str]] = None,
min_score_val: int = 0,
max_score_val: int = 3,
temperature: float = 0.0,
) -> Tuple[float, Dict]
Uses a chat completion model to complete a template that checks the relevance of the context to the question. Also uses chain-of-thought methodology and emits the reasons.
Example
from trulens.apps.langchain import TruChain
context = TruChain.select_context(rag_app)
feedback = (
Feedback(provider.context_relevance_with_cot_reasons)
.on_input()
.on(context)
.aggregate(np.mean)
)
PARAMETER | DESCRIPTION |
---|---|
`question` | A question being asked. TYPE: `str` |
`context` | Context related to the question. TYPE: `str` |
`criteria` | If provided, overrides the default evaluation criteria. Defaults to None. |
`min_score_val` | The minimum score value. Defaults to 0. TYPE: `int` |
`max_score_val` | The maximum score value. Defaults to 3. TYPE: `int` |
`temperature` | The temperature for the LLM response, which may affect the confidence level of the evaluation. Defaults to 0.0. TYPE: `float` |
relevance
relevance(
prompt: str,
response: str,
criteria: Optional[str] = None,
examples: Optional[List[str]] = None,
min_score_val: int = 0,
max_score_val: int = 3,
temperature: float = 0.0,
) -> float
Uses a chat completion model to complete a template that checks the relevance of the response to a prompt.
Example
feedback = Feedback(provider.relevance).on_input_output()
Usage on RAG Contexts
feedback = Feedback(provider.relevance).on_input().on(
TruLlama.select_source_nodes().node.text # See note below
).aggregate(np.mean)
PARAMETER | DESCRIPTION |
---|---|
`prompt` | A text prompt to an agent. TYPE: `str` |
`response` | The agent's response to the prompt. TYPE: `str` |
`criteria` | If provided, overrides the default evaluation criteria. Defaults to None. |
`min_score_val` | The minimum score value used by the LLM before normalization. Defaults to 0. TYPE: `int` |
`max_score_val` | The maximum score value used by the LLM before normalization. Defaults to 3. TYPE: `int` |
`temperature` | The temperature for the LLM response, which may affect the confidence level of the evaluation. Defaults to 0.0. TYPE: `float` |

RETURNS | DESCRIPTION |
---|---|
`float` | A value between 0 and 1, with 0 being "not relevant" and 1 being "relevant". |
relevance_with_cot_reasons
relevance_with_cot_reasons(
prompt: str,
response: str,
criteria: Optional[str] = None,
examples: Optional[List[str]] = None,
min_score_val: int = 0,
max_score_val: int = 3,
temperature: float = 0.0,
) -> Tuple[float, Dict]
Uses a chat completion model to complete a template that checks the relevance of the response to a prompt. Also uses chain-of-thought methodology and emits the reasons.
Example
feedback = (
Feedback(provider.relevance_with_cot_reasons)
.on_input()
.on_output()
)

PARAMETER | DESCRIPTION |
---|---|
`prompt` | A text prompt to an agent. TYPE: `str` |
`response` | The agent's response to the prompt. TYPE: `str` |
`criteria` | If provided, overrides the default evaluation criteria. Defaults to None. |
`min_score_val` | The minimum score value used by the LLM before normalization. Defaults to 0. TYPE: `int` |
`max_score_val` | The maximum score value used by the LLM before normalization. Defaults to 3. TYPE: `int` |
`temperature` | The temperature for the LLM response, which may affect the confidence level of the evaluation. Defaults to 0.0. TYPE: `float` |
sentiment
sentiment(
text: str,
criteria: str = None,
examples: Optional[List[str]] = None,
min_score_val: int = 0,
max_score_val: int = 3,
temperature: float = 0.0,
) -> float
Uses a chat completion model to complete a template that checks the sentiment of some text.
Example
feedback = Feedback(provider.sentiment).on_output()
RETURNS | DESCRIPTION |
---|---|
`float` | A value between 0 and 1, with 0 being "negative sentiment" and 1 being "positive sentiment". |
sentiment_with_cot_reasons
sentiment_with_cot_reasons(
text: str,
criteria: str = None,
examples: Optional[List[str]] = None,
min_score_val: int = 0,
max_score_val: int = 3,
temperature: float = 0.0,
) -> Tuple[float, Dict]
Uses a chat completion model to complete a template that checks the sentiment of some text. Also uses chain-of-thought methodology and emits the reasons.
Example
feedback = Feedback(provider.sentiment_with_cot_reasons).on_output()
PARAMETER | DESCRIPTION |
---|---|
`text` | Text to evaluate. TYPE: `str` |
`min_score_val` | The minimum score value used by the LLM before normalization. Defaults to 0. TYPE: `int` |
`max_score_val` | The maximum score value used by the LLM before normalization. Defaults to 3. TYPE: `int` |
`temperature` | The temperature for the LLM response, which may affect the confidence level of the evaluation. Defaults to 0.0. TYPE: `float` |
model_agreement

Uses a chat completion model. A function that gives a chat completion model the same prompt and gets a response, encouraging truthfulness. A second template is then given to the model, with a prompt asserting that the original response is correct, and the function measures whether the previous chat completion response is similar.
Example
feedback = Feedback(provider.model_agreement).on_input_output()
RETURNS | DESCRIPTION |
---|---|
`float` | A value between 0.0 (not in agreement) and 1.0 (in agreement). |
_langchain_evaluate
_langchain_evaluate(
text: str,
criteria: Optional[str] = None,
min_score_val: Optional[int] = 0,
max_score_val: Optional[int] = 3,
temperature: Optional[float] = 0.0,
) -> float
Uses a chat completion model. A general function that completes a template to evaluate different aspects of some text. Prompt credit to LangChain.
PARAMETER | DESCRIPTION |
---|---|
`text` | A prompt to an agent. TYPE: `str` |
`criteria` | The specific criteria for evaluation. TYPE: `Optional[str]` |
`min_score_val` | The minimum score value used by the LLM before normalization. Defaults to 0. TYPE: `Optional[int]` |
`max_score_val` | The maximum score value used by the LLM before normalization. Defaults to 3. TYPE: `Optional[int]` |

RETURNS | DESCRIPTION |
---|---|
`float` | A value between 0.0 and 1.0, representing the specified evaluation. |
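A hedged sketch of calling this general helper directly with a custom criterion. Note that it is a private method, so the public wrappers below are the usual entry points; the criterion text is hypothetical and provider is assumed to be a Bedrock instance:

```python
# Hypothetical custom criterion, scored on the 0-3 scale and normalized to 0-1.
score = provider._langchain_evaluate(
    text="The mitochondria is the powerhouse of the cell.",
    criteria="Does the submission avoid unnecessary jargon?",
    min_score_val=0,
    max_score_val=3,
)
```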
_langchain_evaluate_with_cot_reasons
_langchain_evaluate_with_cot_reasons(
text: str,
criteria: str,
min_score_val: Optional[int] = 0,
max_score_val: Optional[int] = 3,
temperature: Optional[float] = 0.0,
) -> Tuple[float, Dict]
Uses a chat completion model. A general function that completes a template to evaluate different aspects of some text. Prompt credit to LangChain.
PARAMETER | DESCRIPTION |
---|---|
`text` | A prompt to an agent. TYPE: `str` |
`criteria` | The specific criteria for evaluation. TYPE: `str` |
`min_score_val` | The minimum score value used by the LLM before normalization. Defaults to 0. TYPE: `Optional[int]` |
`max_score_val` | The maximum score value used by the LLM before normalization. Defaults to 3. TYPE: `Optional[int]` |
`temperature` | The temperature for the LLM response, which may affect the confidence level of the evaluation. Defaults to 0.0. TYPE: `Optional[float]` |
conciseness
conciseness(
text: str,
criteria: Optional[str] = None,
min_score_val: Optional[int] = 0,
max_score_val: Optional[int] = 3,
temperature: Optional[float] = 0.0,
) -> float
Uses a chat completion model to complete a template that checks the conciseness of some text. Prompt credit to LangChain Eval.
Example
feedback = Feedback(provider.conciseness).on_output()
PARAMETER | DESCRIPTION |
---|---|
`text` | The text to evaluate the conciseness of. TYPE: `str` |

RETURNS | DESCRIPTION |
---|---|
`float` | A value between 0.0 (not concise) and 1.0 (concise). |
conciseness_with_cot_reasons
conciseness_with_cot_reasons(
text: str,
criteria: Optional[str] = None,
min_score_val: Optional[int] = 0,
max_score_val: Optional[int] = 3,
temperature: Optional[float] = 0.0,
) -> Tuple[float, Dict]
Uses a chat completion model to complete a template that checks the conciseness of some text. Prompt credit to LangChain Eval.
Example
feedback = Feedback(provider.conciseness_with_cot_reasons).on_output()
PARAMETER | DESCRIPTION |
---|---|
`text` | The text to evaluate the conciseness of. TYPE: `str` |
correctness
correctness(
text: str,
criteria: Optional[str] = None,
min_score_val: Optional[int] = 0,
max_score_val: Optional[int] = 3,
temperature: Optional[float] = 0.0,
) -> float
Uses a chat completion model to complete a template that checks the correctness of some text. Prompt credit to LangChain Eval.
Example
feedback = Feedback(provider.correctness).on_output()
PARAMETER | DESCRIPTION |
---|---|
`text` | A prompt to an agent. TYPE: `str` |

RETURNS | DESCRIPTION |
---|---|
`float` | A value between 0.0 (not correct) and 1.0 (correct). |
correctness_with_cot_reasons
correctness_with_cot_reasons(
text: str,
criteria: Optional[str] = None,
min_score_val: Optional[int] = 0,
max_score_val: Optional[int] = 3,
temperature: Optional[float] = 0.0,
) -> Tuple[float, Dict]
Uses a chat completion model to complete a template that checks the correctness of some text. Prompt credit to LangChain Eval. Also uses chain-of-thought methodology and emits the reasons.
Example
feedback = Feedback(provider.correctness_with_cot_reasons).on_output()
PARAMETER | DESCRIPTION |
---|---|
`text` | Text to evaluate. TYPE: `str` |
coherence
coherence(
text: str,
criteria: Optional[str] = None,
min_score_val: Optional[int] = 0,
max_score_val: Optional[int] = 3,
temperature: Optional[float] = 0.0,
) -> float
Uses a chat completion model to complete a template that checks the coherence of some text. Prompt credit to LangChain Eval.
Example
feedback = Feedback(provider.coherence).on_output()
PARAMETER | DESCRIPTION |
---|---|
`text` | The text to evaluate. TYPE: `str` |

RETURNS | DESCRIPTION |
---|---|
`float` | A value between 0.0 (not coherent) and 1.0 (coherent). |
coherence_with_cot_reasons
coherence_with_cot_reasons(
text: str,
criteria: Optional[str] = None,
min_score_val: Optional[int] = 0,
max_score_val: Optional[int] = 3,
temperature: Optional[float] = 0.0,
) -> Tuple[float, Dict]
Uses a chat completion model to complete a template that checks the coherence of some text. Prompt credit to LangChain Eval. Also uses chain-of-thought methodology and emits the reasons.
Example
feedback = Feedback(provider.coherence_with_cot_reasons).on_output()
PARAMETER | DESCRIPTION |
---|---|
`text` | The text to evaluate. TYPE: `str` |
harmfulness
harmfulness(
text: str,
criteria: Optional[str] = None,
min_score_val: Optional[int] = 0,
max_score_val: Optional[int] = 3,
temperature: Optional[float] = 0.0,
) -> float
Uses a chat completion model to complete a template that checks the harmfulness of some text. Prompt credit to LangChain Eval.
Example
feedback = Feedback(provider.harmfulness).on_output()
PARAMETER | DESCRIPTION |
---|---|
`text` | The text to evaluate. TYPE: `str` |

RETURNS | DESCRIPTION |
---|---|
`float` | A value between 0.0 (not harmful) and 1.0 (harmful). |
harmfulness_with_cot_reasons
harmfulness_with_cot_reasons(
text: str,
criteria: Optional[str] = None,
min_score_val: Optional[int] = 0,
max_score_val: Optional[int] = 3,
temperature: Optional[float] = 0.0,
) -> Tuple[float, Dict]
Uses a chat completion model to complete a template that checks the harmfulness of some text. Prompt credit to LangChain Eval. Also uses chain-of-thought methodology and emits the reasons.
Example
feedback = Feedback(provider.harmfulness_with_cot_reasons).on_output()
PARAMETER | DESCRIPTION |
---|---|
`text` | The text to evaluate. TYPE: `str` |
maliciousness
maliciousness(
text: str,
criteria: Optional[str] = None,
min_score_val: Optional[int] = 0,
max_score_val: Optional[int] = 3,
temperature: Optional[float] = 0.0,
) -> float
Uses a chat completion model to complete a template that checks the maliciousness of some text. Prompt credit to LangChain Eval.
Example
feedback = Feedback(provider.maliciousness).on_output()
PARAMETER | DESCRIPTION |
---|---|
`text` | The text to evaluate. TYPE: `str` |

RETURNS | DESCRIPTION |
---|---|
`float` | A value between 0.0 (not malicious) and 1.0 (malicious). |
maliciousness_with_cot_reasons
maliciousness_with_cot_reasons(
text: str,
criteria: Optional[str] = None,
min_score_val: Optional[int] = 0,
max_score_val: Optional[int] = 3,
temperature: Optional[float] = 0.0,
) -> Tuple[float, Dict]
Uses a chat completion model to complete a template that checks the maliciousness of some text. Prompt credit to LangChain Eval. Also uses chain-of-thought methodology and emits the reasons.
Example
feedback = Feedback(provider.maliciousness_with_cot_reasons).on_output()
PARAMETER | DESCRIPTION |
---|---|
`text` | The text to evaluate. TYPE: `str` |
helpfulness
helpfulness(
text: str,
criteria: Optional[str] = None,
min_score_val: Optional[int] = 0,
max_score_val: Optional[int] = 3,
temperature: Optional[float] = 0.0,
) -> float
Uses a chat completion model to complete a template that checks the helpfulness of some text. Prompt credit to LangChain Eval.
Example
feedback = Feedback(provider.helpfulness).on_output()
PARAMETER | DESCRIPTION |
---|---|
`text` | The text to evaluate. TYPE: `str` |

RETURNS | DESCRIPTION |
---|---|
`float` | A value between 0.0 (not helpful) and 1.0 (helpful). |
helpfulness_with_cot_reasons
helpfulness_with_cot_reasons(
text: str,
criteria: Optional[str] = None,
min_score_val: Optional[int] = 0,
max_score_val: Optional[int] = 3,
temperature: Optional[float] = 0.0,
) -> Tuple[float, Dict]
Uses a chat completion model to complete a template that checks the helpfulness of some text. Prompt credit to LangChain Eval. Also uses chain-of-thought methodology and emits the reasons.
Example
feedback = Feedback(provider.helpfulness_with_cot_reasons).on_output()
PARAMETER | DESCRIPTION |
---|---|
`text` | The text to evaluate. TYPE: `str` |
controversiality
controversiality(
text: str,
criteria: Optional[str] = None,
min_score_val: Optional[int] = 0,
max_score_val: Optional[int] = 3,
temperature: Optional[float] = 0.0,
) -> float
Uses a chat completion model to complete a template that checks the controversiality of some text. Prompt credit to LangChain Eval.
Example
feedback = Feedback(provider.controversiality).on_output()
PARAMETER | DESCRIPTION |
---|---|
`text` | The text to evaluate. TYPE: `str` |

RETURNS | DESCRIPTION |
---|---|
`float` | A value between 0.0 (not controversial) and 1.0 (controversial). |
controversiality_with_cot_reasons
controversiality_with_cot_reasons(
text: str,
criteria: Optional[str] = None,
min_score_val: Optional[int] = 0,
max_score_val: Optional[int] = 3,
temperature: Optional[float] = 0.0,
) -> Tuple[float, Dict]
Uses a chat completion model to complete a template that checks the controversiality of some text. Prompt credit to LangChain Eval. Also uses chain-of-thought methodology and emits the reasons.
Example
feedback = Feedback(provider.controversiality_with_cot_reasons).on_output()
PARAMETER | DESCRIPTION |
---|---|
`text` | The text to evaluate. TYPE: `str` |
misogyny
misogyny(
text: str,
criteria: Optional[str] = None,
min_score_val: Optional[int] = 0,
max_score_val: Optional[int] = 3,
temperature: Optional[float] = 0.0,
) -> float
Uses a chat completion model to complete a template that checks the misogyny of some text. Prompt credit to LangChain Eval.
Example
feedback = Feedback(provider.misogyny).on_output()
PARAMETER | DESCRIPTION |
---|---|
`text` | The text to evaluate. TYPE: `str` |

RETURNS | DESCRIPTION |
---|---|
`float` | A value between 0.0 (not misogynistic) and 1.0 (misogynistic). |
misogyny_with_cot_reasons
misogyny_with_cot_reasons(
text: str,
criteria: Optional[str] = None,
min_score_val: Optional[int] = 0,
max_score_val: Optional[int] = 3,
temperature: Optional[float] = 0.0,
) -> Tuple[float, Dict]
Uses a chat completion model to complete a template that checks the misogyny of some text. Prompt credit to LangChain Eval. Also uses chain-of-thought methodology and emits the reasons.
Example
feedback = Feedback(provider.misogyny_with_cot_reasons).on_output()
PARAMETER | DESCRIPTION |
---|---|
`text` | The text to evaluate. TYPE: `str` |
criminality
criminality(
text: str,
criteria: Optional[str] = None,
min_score_val: Optional[int] = 0,
max_score_val: Optional[int] = 3,
temperature: Optional[float] = 0.0,
) -> float
Uses a chat completion model to complete a template that checks the criminality of some text. Prompt credit to LangChain Eval.
Example
feedback = Feedback(provider.criminality).on_output()
PARAMETER | DESCRIPTION |
---|---|
`text` | The text to evaluate. TYPE: `str` |

RETURNS | DESCRIPTION |
---|---|
`float` | A value between 0.0 (not criminal) and 1.0 (criminal). |
criminality_with_cot_reasons
criminality_with_cot_reasons(
text: str,
criteria: Optional[str] = None,
min_score_val: Optional[int] = 0,
max_score_val: Optional[int] = 3,
temperature: Optional[float] = 0.0,
) -> Tuple[float, Dict]
Uses a chat completion model to complete a template that checks the criminality of some text. Prompt credit to LangChain Eval. Also uses chain-of-thought methodology and emits the reasons.
Example
feedback = Feedback(provider.criminality_with_cot_reasons).on_output()
PARAMETER | DESCRIPTION |
---|---|
`text` | The text to evaluate. TYPE: `str` |
insensitivity
insensitivity(
text: str,
criteria: Optional[str] = None,
min_score_val: Optional[int] = 0,
max_score_val: Optional[int] = 3,
temperature: Optional[float] = 0.0,
) -> float
Uses a chat completion model to complete a template that checks the insensitivity of some text. Prompt credit to LangChain Eval.
Example
feedback = Feedback(provider.insensitivity).on_output()
PARAMETER | DESCRIPTION |
---|---|
`text` | The text to evaluate. TYPE: `str` |

RETURNS | DESCRIPTION |
---|---|
`float` | A value between 0.0 (not insensitive) and 1.0 (insensitive). |
insensitivity_with_cot_reasons
insensitivity_with_cot_reasons(
text: str,
criteria: Optional[str] = None,
min_score_val: Optional[int] = 0,
max_score_val: Optional[int] = 3,
temperature: Optional[float] = 0.0,
) -> Tuple[float, Dict]
Uses a chat completion model to complete a template that checks the insensitivity of some text. Prompt credit to LangChain Eval. Also uses chain-of-thought methodology and emits the reasons.
Example
feedback = Feedback(provider.insensitivity_with_cot_reasons).on_output()
PARAMETER | DESCRIPTION |
---|---|
`text` | The text to evaluate. TYPE: `str` |
_get_answer_agreement
_generate_key_points

Uses a chat completion model to distill the main points to be used by the comprehensiveness feedback function.

PARAMETER | DESCRIPTION |
---|---|
`source` | Text corresponding to source material. TYPE: `str` |
RETURNS | DESCRIPTION |
---|---|
`str` | Key points of the source text. |
_assess_key_point_inclusion
_assess_key_point_inclusion(
key_points: str,
summary: str,
min_score_val: int = 0,
max_score_val: int = 3,
criteria: Optional[str] = None,
temperature: float = 0.0,
) -> List
Splits key points by newlines and assesses if each one is included in the summary.
RETURNS | DESCRIPTION |
---|---|
`List[str]` | A list of strings indicating whether each key point is included in the summary. |
comprehensiveness_with_cot_reasons
comprehensiveness_with_cot_reasons(
source: str,
summary: str,
criteria: Optional[str] = None,
min_score_val: Optional[int] = 0,
max_score_val: Optional[int] = 3,
temperature: Optional[float] = 0.0,
) -> Tuple[float, Dict]
Uses a chat completion model to distill the main points of a source text and compare a summary against those main points. This feedback function only has a chain-of-thought implementation, as the reasoning is extremely important for assessing comprehensiveness.
Example
feedback = Feedback(provider.comprehensiveness_with_cot_reasons).on_input_output()
summarization_with_cot_reasons

Summarization is deprecated in favor of comprehensiveness. This function is no longer implemented.
stereotypes
stereotypes(
prompt: str,
response: str,
criteria: Optional[str] = None,
min_score_val: Optional[int] = 0,
max_score_val: Optional[int] = 3,
temperature: Optional[float] = 0.0,
) -> float
Uses a chat completion model to complete a template that checks whether the response adds assumed stereotypes that are not present in the prompt.
Example
feedback = Feedback(provider.stereotypes).on_input_output()
PARAMETER | DESCRIPTION |
---|---|
`prompt` | A text prompt to an agent. TYPE: `str` |
`response` | The agent's response to the prompt. TYPE: `str` |
`min_score_val` | The minimum score value used by the LLM before normalization. Defaults to 0. TYPE: `Optional[int]` |
`max_score_val` | The maximum score value used by the LLM before normalization. Defaults to 3. TYPE: `Optional[int]` |

RETURNS | DESCRIPTION |
---|---|
`float` | A value between 0.0 (no stereotypes assumed) and 1.0 (stereotypes assumed). |
stereotypes_with_cot_reasons
stereotypes_with_cot_reasons(
prompt: str,
response: str,
criteria: Optional[str] = None,
min_score_val: int = 0,
max_score_val: int = 3,
temperature: float = 0.0,
) -> Tuple[float, Dict]
Uses a chat completion model to complete a template that checks whether the response adds assumed stereotypes that are not present in the prompt. Also uses chain-of-thought methodology and emits the reasons.
Example
feedback = Feedback(provider.stereotypes_with_cot_reasons).on_input_output()
PARAMETER | DESCRIPTION |
---|---|
`prompt` | A text prompt to an agent. TYPE: `str` |
`response` | The agent's response to the prompt. TYPE: `str` |
`min_score_val` | The minimum score value used by the LLM before normalization. Defaults to 0. TYPE: `int` |
`max_score_val` | The maximum score value used by the LLM before normalization. Defaults to 3. TYPE: `int` |
`temperature` | The temperature for the LLM response, which may affect the confidence level of the evaluation. Defaults to 0.0. TYPE: `float` |
_remove_trivial_statements
groundedness_measure_with_cot_reasons
groundedness_measure_with_cot_reasons(
source: str,
statement: str,
criteria: Optional[str] = None,
examples: Optional[str] = None,
groundedness_configs: Optional[
GroundednessConfigs
] = None,
min_score_val: int = 0,
max_score_val: int = 3,
temperature: float = 0.0,
) -> Tuple[float, dict]
A measure to track if the source material supports each sentence in the statement using an LLM provider.
The statement will first be split by a tokenizer into its component sentences.
Then, trivial statements are eliminated so as to not dilute the evaluation.
The LLM will process each statement, using chain of thought methodology to emit the reasons.
Abstentions will be considered as grounded.
Example
from trulens.core import Feedback
from trulens.providers.openai import OpenAI
provider = OpenAI()
f_groundedness = (
Feedback(provider.groundedness_measure_with_cot_reasons)
.on(context.collect())
.on_output()
)
To further explain how the function works under the hood, consider the statement:
"Hi. I'm here to help. The university of Washington is a public research university. UW's connections to major corporations in Seattle contribute to its reputation as a hub for innovation and technology"
The function will split the statement into its component sentences:
- "Hi."
- "I'm here to help."
- "The university of Washington is a public research university."
- "UW's connections to major corporations in Seattle contribute to its reputation as a hub for innovation and technology"
Next, trivial statements are removed, leaving only:
- "The university of Washington is a public research university."
- "UW's connections to major corporations in Seattle contribute to its reputation as a hub for innovation and technology"
The LLM will then process each remaining statement to assess its groundedness.
For the sake of this example, the LLM will grade the groundedness of one statement as 10, and the other as 0.
Then, the scores are normalized, and averaged to give a final groundedness score of 0.5.
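A small sketch of the arithmetic in this walkthrough, assuming the 0-10 grading scale used in the example:

```python
# Hypothetical per-sentence grades from the example above, on a 0-10 scale.
raw_scores = [10, 0]

# Normalize each grade to the 0-1 range, then average for the final score.
normalized = [score / 10 for score in raw_scores]  # -> [1.0, 0.0]
groundedness_score = sum(normalized) / len(normalized)  # -> 0.5
```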
PARAMETER | DESCRIPTION |
---|---|
`source` | The source that should support the statement. TYPE: `str` |
`statement` | The statement to check groundedness. TYPE: `str` |
`criteria` | The specific criteria for evaluation. Defaults to None. TYPE: `Optional[str]` |
`use_sent_tokenize` | Whether to split the statement into sentences using the punkt sentence tokenizer. |
`min_score_val` | The minimum score value used by the LLM before normalization. Defaults to 0. TYPE: `int` |
`max_score_val` | The maximum score value used by the LLM before normalization. Defaults to 3. TYPE: `int` |
`temperature` | The temperature for the LLM response, which may affect the confidence level of the evaluation. Defaults to 0.0. TYPE: `float` |
qs_relevance_with_cot_reasons

qs_relevance_with_cot_reasons(*args, **kwargs)

Deprecated. Use relevance_with_cot_reasons instead.
groundedness_measure_with_cot_reasons_consider_answerability
groundedness_measure_with_cot_reasons_consider_answerability(
source: str,
statement: str,
question: str,
criteria: Optional[str] = None,
examples: Optional[List[str]] = None,
groundedness_configs: Optional[
GroundednessConfigs
] = None,
min_score_val: int = 0,
max_score_val: int = 3,
temperature: float = 0.0,
) -> Tuple[float, dict]
A measure to track if the source material supports each sentence in the statement using an LLM provider.
The statement will first be split by a tokenizer into its component sentences.
Then, trivial statements are eliminated so as to not dilute the evaluation.
The LLM will process each statement, using chain of thought methodology to emit the reasons.
In the case of abstentions, such as 'I do not know', the LLM will be asked to consider the answerability of the question given the source material.
If the question is considered answerable, abstentions will be considered as not grounded and punished with low scores. Otherwise, unanswerable abstentions will be considered grounded.
Example
from trulens.core import Feedback
from trulens.providers.openai import OpenAI
provider = OpenAI()
f_groundedness = (
Feedback(provider.groundedness_measure_with_cot_reasons_consider_answerability)
.on(context.collect())
.on_output()
.on_input()
)
PARAMETER | DESCRIPTION |
---|---|
`source` | The source that should support the statement. TYPE: `str` |
`statement` | The statement to check groundedness. TYPE: `str` |
`question` | The question to check answerability. TYPE: `str` |
`criteria` | The specific criteria for evaluation. Defaults to None. TYPE: `Optional[str]` |
`use_sent_tokenize` | Whether to split the statement into sentences using the punkt sentence tokenizer. |
`min_score_val` | The minimum score value used by the LLM before normalization. Defaults to 0. TYPE: `int` |
`max_score_val` | The maximum score value used by the LLM before normalization. Defaults to 3. TYPE: `int` |
`temperature` | The temperature for the LLM response, which may affect the confidence level of the evaluation. Defaults to 0.0. TYPE: `float` |
generate_score
generate_score(
system_prompt: str,
user_prompt: Optional[str] = None,
min_score_val: int = 0,
max_score_val: int = 3,
temperature: float = 0.0,
) -> float
Base method to generate a score only, used for evaluation.
PARAMETER | DESCRIPTION |
---|---|
`system_prompt` | A pre-formatted system prompt. TYPE: `str` |
`user_prompt` | An optional user prompt. |
`min_score_val` | The minimum score value. Default is 0. TYPE: `int` |
`max_score_val` | The maximum score value. Default is 3. TYPE: `int` |
`temperature` | The temperature value for LLM score generation. Default is 0.0. TYPE: `float` |

RETURNS | DESCRIPTION |
---|---|
`float` | The score on a 0-1 scale. |
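A minimal sketch of calling this method directly, assuming provider is a Bedrock instance; the rubric prompt is hypothetical, since in normal use the higher-level feedback functions supply it:

```python
# Hypothetical rubric asking for a score on the 0-3 scale; generate_score
# normalizes the model's answer to the 0-1 range.
system_prompt = (
    "You are a grader. Rate the politeness of the user's text on a scale "
    "from 0 (rude) to 3 (very polite). Respond with only the number."
)

score = provider.generate_score(
    system_prompt=system_prompt,
    user_prompt="Thanks so much for your help today!",
    min_score_val=0,
    max_score_val=3,
)
print(score)  # a float on the 0-1 scale
```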
generate_score_and_reasons
generate_score_and_reasons(
system_prompt: str,
user_prompt: Optional[str] = None,
min_score_val: int = 0,
max_score_val: int = 3,
temperature: float = 0.0,
) -> Union[float, Tuple[float, Dict]]
Base method to generate a score and reason, used for evaluation.
PARAMETER | DESCRIPTION |
---|---|
`system_prompt` | A pre-formatted system prompt. TYPE: `str` |
`user_prompt` | An optional user prompt. |
`min_score_val` | The minimum score value. Default is 0. TYPE: `int` |
`max_score_val` | The maximum score value. Default is 3. TYPE: `int` |
`temperature` | The temperature value for LLM score generation. Default is 0.0. TYPE: `float` |
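A corresponding sketch for generate_score_and_reasons, with the same assumptions as the previous example; per the signature, the result may be either a bare score or a (score, reasons) tuple:

```python
result = provider.generate_score_and_reasons(
    system_prompt=system_prompt,  # hypothetical rubric prompt from the previous example
    user_prompt="Thanks so much for your help today!",
    min_score_val=0,
    max_score_val=3,
)

# Unpack defensively, since the return type is Union[float, Tuple[float, Dict]].
if isinstance(result, tuple):
    score, reasons = result
    print(score, reasons)  # reasons is a dict carrying the model's explanation
else:
    print(result)  # just the normalized score
```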