Evaluation using Feedback Functions
Why do you need feedback functions?
Measuring the performance of LLM apps is a critical step in the path from development to production. You would not move a traditional ML system to production without first gaining confidence by measuring its accuracy on a representative test set.
However, unlike in traditional machine learning, ground truth for LLM apps is sparse and often entirely unavailable.
When there is no ground truth against which to compute metrics, feedback functions offer another way to measure the performance of LLM applications.
What is a feedback function?
Feedback functions, analogous to labeling functions, provide a programmatic method for generating evaluations on an application run. In our view, this method of evaluation is far more useful than general benchmarks because it measures the performance of your app, on your data, for your users.
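To make the idea concrete, here is a minimal sketch of a feedback function as a plain Python function that scores a single application run without any ground-truth label. The name `context_overlap` and its word-overlap heuristic are assumptions made for illustration and are not part of TruLens; feedback functions in practice usually rely on a model rather than string matching.

```python
import re


def context_overlap(context: str, response: str) -> float:
    """Toy feedback function (hypothetical, not a TruLens API).

    Returns the fraction of distinct words in the response that also
    appear in the retrieved context, as a crude proxy for groundedness.
    The score lies in [0, 1]; higher means more of the response is
    supported by the context.
    """

    def tokenize(text: str) -> set:
        # Lowercase and keep only alphanumeric runs as words.
        return set(re.findall(r"[a-z0-9]+", text.lower()))

    response_words = tokenize(response)
    if not response_words:
        return 0.0
    return len(response_words & tokenize(context)) / len(response_words)


# Score one run of the app: no reference answer is needed.
score = context_overlap(
    context="TruLens evaluates LLM apps with feedback functions.",
    response="TruLens evaluates apps.",
)
```

The function takes only what the app itself produced (the retrieved context and the response), so it can run on every record, including production traffic for which no labels exist.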
Important Concept
TruLens constructs a feedback function by combining a more general model, known as the feedback provider, with a feedback implementation: carefully constructed prompts and custom logic tailored to a particular evaluation task.
This construction is composable and extensible.
Composable means that the user can combine any feedback provider with any feedback implementation.
Extensible means that the user can extend a feedback provider with custom feedback implementations of their own choosing.
Example
In a high-stakes domain that requires evaluating long chunks of context, the user may choose a more expensive state-of-the-art (SOTA) model as the provider.
In lower-stakes, higher-volume scenarios, the user may choose a smaller, cheaper model as the provider.
In either case, any feedback provider can be combined with a TruLens feedback implementation to ultimately compose the feedback function.
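The split between provider and implementation can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the TruLens API: `Provider`, `StubProvider`, `make_feedback`, `relevance`, and `politeness` are hypothetical names, and the stub returns canned replies so the example runs without calling any model.

```python
from dataclasses import dataclass
from typing import Callable, Protocol


class Provider(Protocol):
    """Anything that can turn a prompt into text (hypothetical interface)."""

    def complete(self, prompt: str) -> str: ...


@dataclass
class StubProvider:
    """Stand-in for a real model so the sketch runs offline."""

    name: str
    canned_reply: str

    def complete(self, prompt: str) -> str:
        return self.canned_reply


# A feedback implementation: a prompt plus parsing logic, independent of provider.
Implementation = Callable[[Provider, str, str], float]


def relevance(provider: Provider, question: str, answer: str) -> float:
    prompt = (
        "Rate from 0 to 10 how relevant the answer is to the question.\n"
        f"Question: {question}\nAnswer: {answer}\nRating:"
    )
    rating = float(provider.complete(prompt).strip())
    return min(max(rating, 0.0), 10.0) / 10.0  # clamp, then normalize to [0, 1]


def make_feedback(provider: Provider, implementation: Implementation):
    """Compose any provider with any implementation into a feedback function."""

    def feedback(question: str, answer: str) -> float:
        return implementation(provider, question, answer)

    return feedback


# Composable: the same implementation runs on either provider.
large = StubProvider(name="large-model", canned_reply="9")
small = StubProvider(name="small-model", canned_reply="7")
high_stakes_relevance = make_feedback(large, relevance)
high_volume_relevance = make_feedback(small, relevance)


# Extensible: a user-defined implementation plugs in the same way.
def politeness(provider: Provider, question: str, answer: str) -> float:
    reply = provider.complete(f"Is this answer polite? Reply yes or no.\n{answer}")
    return 1.0 if reply.strip().lower().startswith("yes") else 0.0
```

Because the implementation only depends on the `complete` method, swapping the expensive provider for the cheap one changes cost and quality but not the evaluation logic, which is the trade-off the example above describes.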