
trulens.benchmark.generate.generate_test_set


Classes

GenerateTestSet

This class is responsible for generating a test set using the provided application callable.

Functions
__init__
__init__(app_callable: Callable)

Initialize the GenerateTestSet class.

PARAMETER DESCRIPTION
app_callable

The application callable to be used for generating the test set.

TYPE: Callable
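
The callable just needs to accept a query and return a response. A minimal sketch of a stand-in callable (the function name and behavior here are illustrative, not part of TruLens):

```python
# Hypothetical stand-in for a RAG app: app_callable can be any function
# that accepts a query string and returns a response, e.g. rag_chain.invoke.
def my_rag_app(query: str) -> str:
    # A real application would retrieve context and call an LLM here.
    return f"Answer to: {query}"

# GenerateTestSet would then be constructed as:
# test = GenerateTestSet(app_callable=my_rag_app)
```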

_generate_themes
_generate_themes(test_breadth: int) -> str

Generates themes from the context available to a RAG application. These themes, which define the test breadth, are used as categories for test set generation.

PARAMETER DESCRIPTION
test_breadth

The breadth of the test.

TYPE: int

RETURNS DESCRIPTION
str

A string containing the generated themes, which `_format_themes` parses into a list of test categories.

TYPE: str

_format_themes
_format_themes(themes: str, test_breadth: int) -> list

Formats the themes into a Python list using an LLM.

PARAMETER DESCRIPTION
themes

The themes to be formatted.

TYPE: str

test_breadth

The breadth of the test.

TYPE: int

RETURNS DESCRIPTION
list

A list of formatted themes.

TYPE: list

_generate_test_prompts
_generate_test_prompts(
    test_category: str,
    test_depth: int,
    examples: Optional[list] = None,
) -> str

Generate raw test prompts for a given category, optionally using few-shot examples.

PARAMETER DESCRIPTION
test_category

The category for which to generate test prompts.

TYPE: str

test_depth

The depth of the test prompts.

TYPE: int

examples

An optional list of examples to guide the style of the questions.

TYPE: Optional[list] DEFAULT: None

RETURNS DESCRIPTION
str

A string containing test prompts.

TYPE: str

_format_test_prompts
_format_test_prompts(raw_test_prompts: str) -> list

Format the raw test prompts into a Python list using an LLM.

PARAMETER DESCRIPTION
raw_test_prompts

The raw test prompts to be formatted.

TYPE: str

RETURNS DESCRIPTION
list

A list of formatted test prompts.

TYPE: list

_generate_and_format_test_prompts
_generate_and_format_test_prompts(
    test_category: str,
    test_depth: int,
    examples: Optional[list] = None,
) -> list

Generate test prompts for a given category, optionally using few-shot examples.

PARAMETER DESCRIPTION
test_category

The category for which to generate test prompts.

TYPE: str

test_depth

The depth of the test prompts.

TYPE: int

examples

An optional list of examples to guide the style of the questions.

TYPE: Optional[list] DEFAULT: None

RETURNS DESCRIPTION
list

A list of test prompts.

TYPE: list

generate_test_set
generate_test_set(
    test_breadth: int,
    test_depth: int,
    examples: Optional[list] = None,
) -> dict

Generate a test set, optionally using the few-shot examples provided.

PARAMETER DESCRIPTION
test_breadth

The breadth of the test set.

TYPE: int

test_depth

The depth of the test set.

TYPE: int

examples

An optional list of examples to guide the style of the questions.

TYPE: Optional[list] DEFAULT: None

RETURNS DESCRIPTION
dict

A dictionary containing the test set.

TYPE: dict
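
As a sketch of the shape of the result (assuming, as the methods above suggest, that the dictionary maps each generated theme to a list of test prompts; the helper below is illustrative, not TruLens code):

```python
# Illustrative sketch of the returned structure: one key per theme
# (test_breadth themes), each mapping to a list of test_depth prompts.
def sketch_test_set(themes: list, test_depth: int) -> dict:
    return {
        theme: [f"prompt {i + 1} about {theme}" for i in range(test_depth)]
        for theme in themes
    }

# With test_breadth = 3 themes and test_depth = 2:
test_set = sketch_test_set(["planning", "reflection", "memory"], test_depth=2)
```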

Example
# Instantiate GenerateTestSet with your app callable, in this case: rag_chain.invoke
test = GenerateTestSet(app_callable=rag_chain.invoke)

# Generate a test set of the specified breadth and depth without examples
test_set = test.generate_test_set(test_breadth=3, test_depth=2)

# Generate a test set of the specified breadth and depth with few-shot examples
examples = [
    "Why is it hard for AI to plan very far into the future?",
    "How could letting AI reflect on what went wrong help it improve in the future?",
]
test_set_with_examples = test.generate_test_set(
    test_breadth=3, test_depth=2, examples=examples
)