Bugfix/prompt security responses #17055

davida-ps · 2025-11-25T00:03:14Z

## Title

Relevant issues

Pre-Submission checklist

Please complete all items before asking a LiteLLM maintainer to review your PR

I have Added testing in the tests/litellm/ directory, Adding at least 1 test is a hard requirement - see details
I have added a screenshot of my new test passing locally
My PR passes all unit tests on make test-unit
My PR's scope is as isolated as possible, it only solves 1 specific problem

Type

🆕 New Feature
🐛 Bug Fix
🧹 Refactoring
✅ Test

Changes

pleas align this text

This pull request significantly improves the Prompt Security guardrail integration in prompt_security.py by introducing robust logging, clearer exception handling, more granular control over when guardrails are applied, and better support for complex response types. The changes also refactor how messages and outputs are sanitized, filtered, and updated after guardrail intervention. These improvements enhance reliability, observability, and maintainability of the guardrail logic.

Key changes include:

1. Exception Handling and Logging Enhancements

Introduced custom exceptions (PromptSecurityGuardrailAPIError, PromptSecurityBlockedMessage) for clearer error handling and more informative HTTP responses when content is blocked or API errors occur.
Added comprehensive logging of guardrail actions and failures, including timing, status, and details of each guardrail invocation, to facilitate debugging and monitoring. [1] [2]

2. Guardrail Hook Improvements and Control Flow

Updated all guardrail hooks (async_pre_call_hook, async_moderation_hook, async_post_call_success_hook, and streaming iterator hook) to ensure metadata is present, check if the guardrail should run for each event type, and consistently update applied guardrail headers. [1] [2] [3]
Improved input sanitization and message transformation, including a workaround to filter out system-generated metadata before sending messages to the Prompt Security API.

3. Output and Message Handling for Advanced Response Types

Added logic to handle and update messages for complex response types (such as ResponsesAPIResponse), including helper methods to normalize and update messages and instructions after guardrail intervention.
Implemented _scan_responses_api_output to scan and potentially modify or block outputs in batch response APIs, ensuring consistent guardrail enforcement.

4. Refactoring and Code Quality

Refactored code for clarity and maintainability, such as extracting message normalization and update logic into helper methods, and improving exception handling structure throughout. [1] [2]

5. Minor Cleanup

Removed an unused comment from the XAI Responses API tests.

…rror handling, and streamline test cases for better maintainability and clarity.

vercel · 2025-11-25T00:03:18Z

@davida-ps is attempting to deploy a commit to the CLERKIEAI Team on Vercel.

A member of the Team first needs to authorize it.

CLAassistant · 2025-11-25T00:03:21Z

All committers have signed the CLA.

litellm/proxy/guardrails/guardrail_hooks/prompt_security/prompt_security.py

…lect change

…/davida-ps/litellm into bugfix/prompt-security-responses

davida-ps added 3 commits November 25, 2025 00:12

mistakenly pushed file from old main. reverting

323f3e4

this is already in assets/logos - mistakenly placed here

4e015bd

Refactor Prompt Security Guardrail Tests: Enhance fixtures, improve e…

bbdfa69

…rror handling, and streamline test cases for better maintainability and clarity.

davida-ps and others added 3 commits November 25, 2025 02:11

linted

64884a6

Merge branch 'BerriAI:main' into bugfix/prompt-security-responses

67e8671

mypy-fix

d5e6a36

davida-ps mentioned this pull request Nov 25, 2025

Build UI prompt security #17058

Closed

krrishdholakia reviewed Nov 25, 2025

View reviewed changes

litellm/proxy/guardrails/guardrail_hooks/prompt_security/prompt_security.py Outdated Show resolved Hide resolved

davida-ps and others added 4 commits November 25, 2025 11:47

apply_guardrail utilized to catch all endpoints - update tests to ref…

7d36fc9

…lect change

Merge branch 'BerriAI:main' into bugfix/prompt-security-responses

2096723

lint

25a5811

Merge branch 'bugfix/prompt-security-responses' of https://github.com…

6f82ad5

…/davida-ps/litellm into bugfix/prompt-security-responses

davida-ps requested a review from krrishdholakia November 25, 2025 11:08

clear old methods

8dfb931

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Bugfix/prompt security responses #17055

Bugfix/prompt security responses #17055

davida-ps commented Nov 25, 2025

Uh oh!

vercel bot commented Nov 25, 2025

Uh oh!

CLAassistant commented Nov 25, 2025 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

Bugfix/prompt security responses #17055

Are you sure you want to change the base?

Bugfix/prompt security responses #17055

Conversation

davida-ps commented Nov 25, 2025

Relevant issues

Pre-Submission checklist

Type

Changes

Uh oh!

vercel bot commented Nov 25, 2025

Uh oh!

CLAassistant commented Nov 25, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

CLAassistant commented Nov 25, 2025 •

edited

Loading