Monitoring and Evaluating NeMo Guardrails apps¶
This notebook demonstrates how to instrument NeMo Guardrails apps to monitor their invocations and run feedback functions on their final or intermediate results. The reverse integration, using trulens from within rails apps, is shown in the other notebook in this folder.
# Install NeMo Guardrails if not already installed.
# !pip install trulens trulens-apps-nemo trulens-providers-openai trulens-providers-huggingface nemoguardrails
Set up keys and trulens¶
# This notebook uses openai and huggingface providers which need some keys set.
# You can set them here:
from trulens.core import TruSession
from trulens.core.utils.keys import check_or_set_keys
check_or_set_keys(OPENAI_API_KEY="to fill in", HUGGINGFACE_API_KEY="to fill in")
# Load trulens, reset the database:
session = TruSession()
session.reset_database()
Rails app setup¶
The files created below define the configuration of a rails app, adapted from examples in the NeMo-Guardrails repository. The only notable difference is that the knowledge base here is the trulens documentation, so you should be able to ask the resulting bot questions about trulens instead of the fictional company handbook used in the originating example.
%%writefile config.yaml
# Adapted from NeMo-Guardrails/nemoguardrails/examples/bots/abc/config.yml
instructions:
  - type: general
    content: |
      Below is a conversation between a user and a bot called the trulens Bot.
      The bot is designed to answer questions about the trulens python library.
      The bot is knowledgeable about python.
      If the bot does not know the answer to a question, it truthfully says it does not know.

sample_conversation: |
  user "Hi there. Can you help me with some questions I have about trulens?"
    express greeting and ask for assistance
  bot express greeting and confirm and offer assistance
    "Hi there! I'm here to help answer any questions you may have about the trulens. What would you like to know?"

models:
  - type: main
    engine: openai
    model: gpt-3.5-turbo-instruct
%%writefile config.co
# Adapted from NeMo-Guardrails/tests/test_configs/with_kb_openai_embeddings/config.co
define user ask capabilities
  "What can you do?"
  "What can you help me with?"
  "tell me what you can do"
  "tell me about you"

define bot inform capabilities
  "I am an AI bot that helps answer questions about trulens."

define flow
  user ask capabilities
  bot inform capabilities
Rails app instantiation¶
Instantiating the app does not differ from the steps presented in the NeMo Guardrails documentation.
from nemoguardrails import LLMRails
from nemoguardrails import RailsConfig
config = RailsConfig.from_path(".")
rails = LLMRails(config)
assert (
rails.kb is not None
), "Knowledge base not loaded. You might be using the wrong nemo release or branch."
Feedback functions setup¶
Let's consider some feedback functions. We will define two types: a simple language match that checks whether the output of the app is in the same language as the input, and a set of three feedback functions for evaluating context retrieval. The setup for these is similar to that for other app types such as langchain, except we provide a rag_triad utility that creates the three context-retrieval functions for you instead of having to create them separately.
from pprint import pprint
from trulens.core import Feedback
from trulens.core import Select
from trulens.feedback.feedback import rag_triad
from trulens.apps.nemo import TruRails
from trulens.providers.huggingface import Huggingface
from trulens.providers.openai import OpenAI
# Initialize provider classes
openai = OpenAI()
hugs = Huggingface()
# select context to be used in feedback. the location of context is app specific.
context = TruRails.select_context(rails)
question = Select.RecordInput
answer = Select.RecordOutput
f_language_match = (
Feedback(hugs.language_match, if_exists=answer).on(question).on(answer)
)
fs_triad = rag_triad(
provider=openai, question=question, answer=answer, context=context
)
# Overview of the 4 feedback functions defined.
pprint(f_language_match)
pprint(fs_triad)
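If you want to see where these feedback functions will look inside a record, you can print the selectors defined above. This is an optional inspection step; the exact path shown for the context selector is app specific.
# Optional: inspect the selectors used by the feedback functions.
# `context` is a trulens Lens pointing into the rails app's internal calls.
print(context)
print(question)
print(answer)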
TruRails recorder instantiation¶
TruRails recorder construction is identical to that for other app types.
tru_rails = TruRails(
rails,
app_name="my first trurails app", # optional
feedbacks=[f_language_match, *fs_triad.values()], # optional
)
Logged app invocation¶
Using tru_rails as a context manager means the invocations of the rails app will be logged and feedback will be evaluated on the results.
with tru_rails as recorder:
res = rails.generate(
messages=[
{
"role": "user",
"content": "Can I use AzureOpenAI to define a provider?",
}
]
)
print(res["content"])
Dashboard¶
You should be able to view the above invocation in the dashboard. It can be started with the following code.
from trulens.dashboard import run_dashboard
run_dashboard(session)
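As an alternative to the dashboard, you can inspect aggregate results directly in the notebook. The sketch below assumes the get_leaderboard helper on TruSession is available in your trulens version.
# Optional: view aggregate feedback results without starting the dashboard.
# Assumes TruSession.get_leaderboard is available in your trulens version.
print(session.get_leaderboard())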
Feedback retrieval¶
While feedback can be inspected on the dashboard, you can also retrieve its results in the notebook.
# Get the record from the above context manager.
record = recorder.get()
# Wait for the result futures to be completed and print them.
for feedback, result in record.wait_for_feedback_results().items():
print(feedback.name, result.result)
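You can also pull all logged records and feedback results into a dataframe. This sketch assumes the get_records_and_feedback method on TruSession, which returns a dataframe of records along with the list of feedback column names.
# Optional: retrieve all records and feedback results as a dataframe.
# Assumes TruSession.get_records_and_feedback in your trulens version.
records_df, feedback_cols = session.get_records_and_feedback()
print(feedback_cols)
records_df.head()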
App testing with Feedback¶
Try out various other interactions to show off the capabilities of the feedback functions. For example, we can try to make the model answer in a different language than our prompt.
# Intended to produce a low score on language match, though results may vary:
with tru_rails as recorder:
res = rails.generate(
messages=[
{
"role": "user",
"content": "Please answer in Spanish: can I use AzureOpenAI to define a provider?",
}
]
)
print(res["content"])
for feedback, result in recorder.get().wait_for_feedback_results().items():
print(feedback.name, result.result)
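If you only care about the language match score here, you can filter the results by feedback name. The substring match below is an illustrative assumption; adjust it to the feedback name printed by the loop above.
# Optional: pull out only the language match result by name.
# The "language" substring match is an assumption about how the feedback
# is named; adjust as needed.
results = recorder.get().wait_for_feedback_results()
for feedback, result in results.items():
    if "language" in feedback.name:
        print(feedback.name, result.result)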