Monitoring and Evaluating NeMo Guardrails apps¶
This notebook demonstrates how to instrument NeMo Guardrails apps to monitor their invocations and run feedback functions on their final or intermediate results. The reverse integration, using trulens from within rails apps, is shown in the other notebook in this folder.
# Install NeMo Guardrails if not already installed.
# !pip install trulens trulens-apps-nemo trulens-providers-openai trulens-providers-huggingface nemoguardrails
Set up keys and trulens¶
# This notebook uses openai and huggingface providers which need some keys set.
# You can set them here:
from trulens.core import TruSession
from trulens.core.utils.keys import check_or_set_keys
check_or_set_keys(OPENAI_API_KEY="to fill in", HUGGINGFACE_API_KEY="to fill in")
# Load trulens, reset the database:
session = TruSession()
session.reset_database()
Rails app setup¶
The files created below define the configuration of a rails app, adapted from examples in the NeMo-Guardrails repository. The only notable difference is that the knowledge base here is the trulens documentation, so you should be able to ask the resulting bot questions about trulens instead of the fictional company handbook used in the originating example.
%%writefile config.yaml
# Adapted from NeMo-Guardrails/nemoguardrails/examples/bots/abc/config.yml
instructions:
  - type: general
    content: |
      Below is a conversation between a user and a bot called the trulens Bot.
      The bot is designed to answer questions about the trulens python library.
      The bot is knowledgeable about python.
      If the bot does not know the answer to a question, it truthfully says it does not know.

sample_conversation: |
  user "Hi there. Can you help me with some questions I have about trulens?"
    express greeting and ask for assistance
  bot express greeting and confirm and offer assistance
    "Hi there! I'm here to help answer any questions you may have about the trulens. What would you like to know?"

models:
  - type: main
    engine: openai
    model: gpt-3.5-turbo-instruct
%%writefile config.co
# Adapted from NeMo-Guardrails/tests/test_configs/with_kb_openai_embeddings/config.co
define user ask capabilities
  "What can you do?"
  "What can you help me with?"
  "tell me what you can do"
  "tell me about you"

define bot inform capabilities
  "I am an AI bot that helps answer questions about trulens."

define flow
  user ask capabilities
  bot inform capabilities
Rails app instantiation¶
Instantiating the app does not differ from the steps presented in the NeMo Guardrails documentation.
from nemoguardrails import LLMRails
from nemoguardrails import RailsConfig
config = RailsConfig.from_path(".")
rails = LLMRails(config)
assert (
rails.kb is not None
), "Knowledge base not loaded. You might be using the wrong nemo release or branch."
Feedback functions setup¶
Let's consider some feedback functions. We will define two types: a simple language match that checks whether the output of the app is in the same language as the input, and a set of three feedback functions for evaluating context retrieval. The setup for these is similar to that for other app types such as langchain, except we provide a rag_triad utility that creates the three context-retrieval functions for you instead of having to create them separately.
from pprint import pprint
from trulens.core import Feedback
from trulens.core import Select
from trulens.feedback.feedback import rag_triad
from trulens.apps.nemo import TruRails
from trulens.providers.huggingface import Huggingface
from trulens.providers.openai import OpenAI
# Initialize provider classes
openai = OpenAI()
hugs = Huggingface()
# select context to be used in feedback. the location of context is app specific.
context = TruRails.select_context(rails)
question = Select.RecordInput
answer = Select.RecordOutput
f_language_match = (
Feedback(hugs.language_match, if_exists=answer).on(question).on(answer)
)
fs_triad = rag_triad(
provider=openai, question=question, answer=answer, context=context
)
# Overview of the 4 feedback functions defined.
pprint(f_language_match)
pprint(fs_triad)
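If you want to see where these feedback functions will look inside a record, you can print the selectors defined above. This is an optional inspection step; the exact path shown for the context selector is app specific.
# Optional: inspect the selectors used by the feedback functions.
# `context` is a trulens Lens pointing into the rails app's internal calls.
print(context)
print(question)
print(answer)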
TruRails recorder instantiation¶
TruRails recorder construction is identical to that for other app types.
tru_rails = TruRails(
rails,
app_name="my first trurails app", # optional
feedbacks=[f_language_match, *fs_triad.values()], # optional
)
Logged app invocation¶
Using tru_rails as a context manager means the invocations of the rails app will be logged and feedback will be evaluated on the results.
with tru_rails as recorder:
res = rails.generate(
messages=[
{
"role": "user",
"content": "Can I use AzureOpenAI to define a provider?",
}
]
)
print(res["content"])
Dashboard¶
You should be able to view the above invocation in the dashboard. It can be started with the following code.
from trulens.dashboard import run_dashboard
run_dashboard(session)
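As an alternative to the dashboard, you can inspect aggregate results directly in the notebook. The sketch below assumes the get_leaderboard helper on TruSession is available in your trulens version.
# Optional: view aggregate feedback results without starting the dashboard.
# Assumes TruSession.get_leaderboard is available in your trulens version.
print(session.get_leaderboard())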
Feedback retrieval¶
While feedback can be inspected on the dashboard, you can also retrieve its results in the notebook.
# Get the record from the above context manager.
record = recorder.get()
# Wait for the result futures to be completed and print them.
for feedback, result in record.wait_for_feedback_results().items():
print(feedback.name, result.result)
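You can also pull all logged records and feedback results into a dataframe. This sketch assumes the get_records_and_feedback method on TruSession, which returns a dataframe of records along with the list of feedback column names.
# Optional: retrieve all records and feedback results as a dataframe.
# Assumes TruSession.get_records_and_feedback in your trulens version.
records_df, feedback_cols = session.get_records_and_feedback()
print(feedback_cols)
records_df.head()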
App testing with Feedback¶
Try out various other interactions to show off the capabilities of the feedback functions. For example, we can try to make the model answer in a different language than our prompt.
# Intended to produce a low score on language match, though results may vary:
with tru_rails as recorder:
res = rails.generate(
messages=[
{
"role": "user",
"content": "Please answer in Spanish: can I use AzureOpenAI to define a provider?",
}
]
)
print(res["content"])
for feedback, result in recorder.get().wait_for_feedback_results().items():
print(feedback.name, result.result)
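If you only care about the language match score here, you can filter the results by feedback name. The substring match below is an illustrative assumption; adjust it to the feedback name printed by the loop above.
# Optional: pull out only the language match result by name.
# The "language" substring match is an assumption about how the feedback
# is named; adjust as needed.
results = recorder.get().wait_for_feedback_results()
for feedback, result in results.items():
    if "language" in feedback.name:
        print(feedback.name, result.result)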