Log guardrail activation for Relevance and Jailbreak #19

Open: vvilella wants to merge 1 commit into main

Enhancement: Log Guardrail Activation for Relevance and Jailbreak

Description

This PR adds explicit logging when either the Relevance Guardrail or the Jailbreak Guardrail is triggered. The log messages let developers and operators see when an input validation guardrail trips at runtime, which helps with debugging and auditing.

Changes

  • Added print statements to relevance_guardrail and jailbreak_guardrail functions in main.py
  • Logs the tripwire reason and whether it was triggered

Example Log Output

[Guardrail] Relevance guardrail triggered: is_relevant=False, reason='Message unrelated to airline services'
[Guardrail] Jailbreak guardrail triggered: is_safe=False, reason='User attempted to access system prompt'
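
For reviewers who want to see the shape of the change without opening the diff, here is a minimal sketch that produces lines in the format shown above. It is not the code in main.py: `format_guardrail_log`, `log_if_tripped`, and the `RelevanceOutput` dataclass are hypothetical stand-ins for illustration, and the real guardrail functions may be structured differently.

```python
from dataclasses import dataclass


@dataclass
class RelevanceOutput:
    # Hypothetical stand-in for the guardrail's structured result.
    reasoning: str
    is_relevant: bool


def format_guardrail_log(name: str, flag_name: str, flag_value: bool, reason: str) -> str:
    """Build one log line in the format shown in the example output."""
    # !r adds the quotes around the reason string.
    return f"[Guardrail] {name} guardrail triggered: {flag_name}={flag_value}, reason={reason!r}"


def log_if_tripped(result: RelevanceOutput) -> bool:
    """Print a log line only when the relevance check fails; return the tripwire state."""
    tripped = not result.is_relevant
    if tripped:
        print(format_guardrail_log("Relevance", "is_relevant", result.is_relevant, result.reasoning))
    return tripped


if __name__ == "__main__":
    log_if_tripped(RelevanceOutput("Message unrelated to airline services", False))
```

The same helper would cover the jailbreak case by passing `"Jailbreak"` and `"is_safe"` instead.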

Motivation

While the UI visually indicates when a guardrail is triggered, the backend runtime has no explicit logging, which makes backend-only testing and server-side monitoring more difficult.

This addition brings simple, useful observability with negligible runtime overhead.

Notes

  • This implementation uses simple print() logging for consistency with the rest of the file.
  • If the maintainers prefer structured or configurable logging (e.g., using Python's logging module), I’d be happy to update the PR accordingly.
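
If the logging-module route is preferred, the change might look roughly like the sketch below. This is only an illustration of the alternative mentioned above, not code from this PR: the logger name `guardrails` and the helper `log_guardrail` are assumptions, and the warning level is a guess at what would suit operators.

```python
import logging

# Hypothetical logger name; the project may prefer a module-level __name__ logger.
logger = logging.getLogger("guardrails")


def log_guardrail(name: str, flag_name: str, flag_value: bool, reason: str) -> None:
    """Emit a warning when a guardrail trips; stay quiet otherwise."""
    tripped = not flag_value  # is_relevant=False or is_safe=False means the tripwire fired
    if tripped:
        # Lazy %-style arguments let the logging module skip formatting
        # when the warning level is disabled.
        logger.warning(
            "[Guardrail] %s guardrail triggered: %s=%s, reason=%r",
            name, flag_name, flag_value, reason,
        )


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
    log_guardrail("Jailbreak", "is_safe", False, "User attempted to access system prompt")
```

Compared with print(), this lets operators set the level, format, and destination through standard logging configuration without touching the guardrail code.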
