feat(ask-a-sailor): add any-guardrail content safety layer#27

Draft
Copilot wants to merge 2 commits into main from copilot/add-any-guardrail-for-safety

Conversation

Contributor

Copilot AI commented Mar 11, 2026

Ask a Sailor serves families with children 6–18. This adds mozilla-ai/any-guardrail as a content safety layer that checks every user message before RAG and every response before return.

Changes

  • src/safety/guardrail.py — Two classes:
    • ContentGuardrail: wraps any_guardrail.AnyGuardrail, lazy-imports GPU deps only when a real backend is selected
    • GuardedAgent: drop-in wrapper around AskASailorAgent.answer() with input/output safety checks and a safe fallback
  • GUARDRAIL_BACKEND env var: llama_guard | shield_gemma | none (default, passthrough for CI)
  • requirements.txt (root + ask-a-sailor) — added any-guardrail>=0.3.0
  • tests/test_guardrail.py — 16 tests, fully mocked, no GPU needed
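
The input/output flow described for GuardedAgent can be illustrated with a minimal sketch. This is not the code in src/safety/guardrail.py: the class name SketchGuardedAgent, the is_safe callable, and the fallback text are all hypothetical stand-ins, and no any-guardrail API is called. Only the "answer" and "guardrail_blocked" result keys come from this PR.

```python
from typing import Callable, Dict

# Hypothetical fallback text; the PR only states that it includes contact info.
SAFE_FALLBACK = "Sorry, I can't help with that. Please contact the club office."


class SketchGuardedAgent:
    """Illustrative wrapper: check the input, call the agent, check the output."""

    def __init__(
        self,
        answer_fn: Callable[[str], Dict],
        is_safe: Callable[[str], bool],
    ) -> None:
        self._answer_fn = answer_fn  # stands in for AskASailorAgent.answer
        self._is_safe = is_safe      # stands in for the guardrail backend

    def _blocked(self, stage: str) -> Dict:
        # Same result shape as the Usage section: fallback text plus the stage.
        return {"answer": SAFE_FALLBACK, "guardrail_blocked": stage}

    def answer(self, question: str) -> Dict:
        # 1. Check the user message before any retrieval or generation runs.
        if not self._is_safe(question):
            return self._blocked("input")
        # 2. Run the wrapped agent only for messages that passed the check.
        result = self._answer_fn(question)
        # 3. Check the generated answer before returning it.
        if not self._is_safe(result.get("answer", "")):
            return self._blocked("output")
        return result
```

Because the checker is injected as a plain callable, a test can pass a stub instead of a GPU-backed model, which is the same idea behind the fully mocked test suite.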

Usage

from pathlib import Path

from safety.guardrail import GuardedAgent

agent = GuardedAgent(corpus_dir=Path("corpus"), club_filter="lyc")
result = agent.answer("How much does Opti Camp cost?")

# If input or output is flagged:
# result["answer"] → safe fallback with contact info
# result["guardrail_blocked"] → "input" | "output"

Set GUARDRAIL_BACKEND=llama_guard in production; omit or set to none for CI/local dev.
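
For illustration, backend selection from the environment might look like the sketch below. The function name resolve_backend and the choice to raise on unknown values are assumptions rather than the behavior of this PR. Only the variable name, the three accepted values, and the default of none are taken from the description above.

```python
import os
from typing import Mapping, Optional

# The three values listed in this PR; "none" means passthrough (no checks).
VALID_BACKENDS = ("llama_guard", "shield_gemma", "none")


def resolve_backend(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the configured backend name, defaulting to "none" when unset."""
    env = os.environ if env is None else env
    raw = env.get("GUARDRAIL_BACKEND", "none").strip().lower()
    if raw == "":
        # Treat an empty value the same as an omitted variable.
        return "none"
    if raw not in VALID_BACKENDS:
        # Assumption: fail loudly so a typo in production does not
        # silently disable the safety checks.
        raise ValueError(
            f"GUARDRAIL_BACKEND must be one of {VALID_BACKENDS}, got {raw!r}"
        )
    return raw
```

Raising on an unrecognized value is a deliberate choice in this sketch: falling back to passthrough on a misspelled backend name would turn off content checks without any visible signal.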

Original prompt

This section details the original issue you should resolve.

<issue_title>[ask-a-sailor] Add mozilla-ai/any-guardrail for youth content safety</issue_title>
<issue_description>Ask a Sailor serves families with children 6-18. Needs content safety guardrails beyond prompt engineering. mozilla-ai/any-guardrail wraps Llama Guard and ShieldGemma in one interface. Work: (1) add any-guardrail to requirements.txt, (2) create src/safety/guardrail.py wrapping AskASailorAgent.answer() — check every user message before RAG, every response before return, (3) configurable via GUARDRAIL_BACKEND env var (llama_guard|shield_gemma|none for CI), (4) tests with mocked guardrail. Acceptance: known-bad inputs blocked with safe fallback, CI passes without GPU. Ref: github.com/mozilla-ai/any-guardrail</issue_description>

Comments on the Issue (you are @copilot in this section)


- Add any-guardrail>=0.3.0 to root and ask-a-sailor requirements.txt
- Create src/safety/guardrail.py with ContentGuardrail and GuardedAgent
- GUARDRAIL_BACKEND env var: llama_guard | shield_gemma | none (CI)
- GuardedAgent wraps AskASailorAgent.answer() with input+output checks
- Safe fallback message when content is blocked
- 16 tests with mocked guardrail backends (no GPU needed)

Co-authored-by: fullharbor <237832340+fullharbor@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Add mozilla-ai/any-guardrail for youth content safety" to "feat(ask-a-sailor): add any-guardrail content safety layer" on Mar 11, 2026

Development

Successfully merging this pull request may close these issues.

[ask-a-sailor] Add mozilla-ai/any-guardrail for youth content safety