feat: ai-content-moderation plugin #11541
Conversation
apisix/core/request.lua (Outdated)

@@ -334,6 +335,26 @@ function _M.get_body(max_size, ctx)
end


function _M.get_body_table()
The ai-proxy PR also has this code, so later we can merge from master after ai-proxy is merged.
note that the name of the method there changed :D
The `ai-content-moderation` plugin processes the request body to check for toxicity and rejects the request if it exceeds the configured threshold.

**_This plugin must be used in routes that proxy requests to LLMs only._**
Just routes? No services?
Or do you just want to stress that the upstream should be LLM providers?
on routes
?
> Or do you just want to stress the upstream should be LLM providers.

This.
In that case, I actually think this sentence is redundant. How about just mentioning "It is used when integrating with LLMs." in the paragraph above?
"ai-proxy": {
    "auth": {
        "header": {
            "Authorization": "Bearer token"
Suggested change:
"Authorization": "Bearer token" → "Authorization": "Bearer <your-api-token>"

or

"Authorization": "Bearer '"$OPENAI_API_KEY"'"

this takes an env var.
```shell
curl http://127.0.0.1:9080/post -i -XPOST -H 'Content-Type: application/json' -d '{
  "info": "<some very seriously profane message>"
Is this a dummy? Shouldn't the format sent to OpenAI be something like this?

"messages": [
  { "role": "system", "content": "system prompt goes here" },
  { "role": "user", "content": "offensive user prompts" }
]
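To make the shape concrete, here is a minimal Python sketch of an OpenAI-style chat request body (the prompt strings are placeholders), as opposed to the ad-hoc `info` field used in the doc's example:

```python
import json

# Placeholder content; the field names follow the OpenAI chat format
# quoted in the comment above.
payload = {
    "messages": [
        {"role": "system", "content": "system prompt goes here"},
        {"role": "user", "content": "offensive user prompts"},
    ]
}

body = json.dumps(payload)
print(body)
```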
request body exceeds toxicity threshold
```

Send a request with normal request body:
Suggested change:
Send a request with normal request body: → Send a request with compliant content in the request body:
```shell
curl http://127.0.0.1:9080/post -i -XPOST -H 'Content-Type: application/json' -d 'APISIX is wonderful'
The opening paragraph says "This plugin must be used in routes that proxy requests to LLMs only",
yet the example does not involve proxying to an LLM. It feels a bit self-conflicting.
The example demonstrates exactly that the integration could be used for general-purpose checking of requests NOT proxying to LLMs.
| provider.aws_comprehend.secret_access_key | Yes | String | AWS secret access key |
| provider.aws_comprehend.region | Yes | String | AWS region |
| provider.aws_comprehend.endpoint | No | String | AWS Comprehend service endpoint. Must match the pattern `^https?://` |
| moderation_categories | No | Object | Configuration for moderation categories. Must be one of: PROFANITY, HATE_SPEECH, INSULT, HARASSMENT_OR_ABUSE, SEXUAL, VIOLENCE_OR_THREAT |
Suggested change:
| moderation_categories | No | Object | Key-value pairs of moderation categories and their scores. In each pair, the key should be one of `PROFANITY`, `HATE_SPEECH`, `INSULT`, `HARASSMENT_OR_ABUSE`, `SEXUAL`, or `VIOLENCE_OR_THREAT`; and the value should be between 0 and 1 (inclusive). |
| provider.aws_comprehend.region | Yes | String | AWS region |
| provider.aws_comprehend.endpoint | No | String | AWS Comprehend service endpoint. Must match the pattern `^https?://` |
| moderation_categories | No | Object | Configuration for moderation categories. Must be one of: PROFANITY, HATE_SPEECH, INSULT, HARASSMENT_OR_ABUSE, SEXUAL, VIOLENCE_OR_THREAT |
| toxicity_level | No | Number | Threshold for overall toxicity detection. Range: 0 - 1. Default: 0.5 |
Suggested change:
| toxicity_level | No | Number | The degree to which content is harmful, offensive, or inappropriate. A higher value allows more toxic content through. Range: 0 - 1. Default: 0.5 |
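As a rough illustration of how these two settings could interact (a hypothetical Python sketch, not the plugin's actual Lua code): per-category thresholds from `moderation_categories` are checked when configured, otherwise the overall `toxicity_level` applies.

```python
def is_toxic(scores, moderation_categories=None, toxicity_level=0.5):
    """Return True if the content should be rejected.

    scores: mapping of category name -> score in [0, 1], as a provider
    like AWS Comprehend might return (illustrative, not the real API shape).
    """
    # Per-category thresholds take effect when configured.
    if moderation_categories:
        for category, threshold in moderation_categories.items():
            if scores.get(category, 0) > threshold:
                return True
        return False
    # Otherwise fall back to the overall toxicity threshold.
    return max(scores.values(), default=0) > toxicity_level

# e.g. reject when PROFANITY exceeds its configured per-category threshold
print(is_toxic({"PROFANITY": 0.7}, {"PROFANITY": 0.3}))  # True
```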
**_This plugin must be used in routes that proxy requests to LLMs only._**

**_As of now only the AWS Comprehend service is supported for content moderation. PRs for introducing support for other service providers are welcome._**
Suggested change:
**_As of now, the plugin only supports integration with [AWS Comprehend](https://aws.amazon.com/comprehend/) for content moderation. PRs for introducing support for other service providers are welcome._**
function _M.check_schema(conf)
    return core.schema.check(schema, conf)
end
Two blank lines between functions?
fixed.
type = "object",
properties = {
    provider = {
        type = "object",
Suggested change:
        type = "object",
        maxProperties = 1,

To make sure `next(conf.provider)` always returns `aws_comprehend`.
done
    return bad_request, "messages not found in request body"
end

local provider = conf.provider[next(conf.provider)]
The current schema definition does not seem to prevent multiple properties from being entered incorrectly. It is recommended to add a `maxProperties = 1` constraint to the schema.
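The point of the `maxProperties = 1` constraint can be sketched in Python (a hypothetical helper, mirroring Lua's `next` on a one-entry table):

```python
def single_provider(provider: dict) -> str:
    """Return the sole provider key, enforcing the maxProperties = 1 idea."""
    if len(provider) != 1:
        raise ValueError("provider must contain exactly one entry")
    # With exactly one key, iteration order is irrelevant and the
    # equivalent of Lua's next(conf.provider) is deterministic.
    return next(iter(provider))

print(single_provider({"aws_comprehend": {"region": "us-east-1"}}))  # aws_comprehend
```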
LGTM
Description
The `content-moderation` plugin processes the request body to check for toxicity and rejects the request if it exceeds the configured threshold. In later PRs, other plugins like ai-prompt-decorator and ai-prompt-template can use functions from this plugin to ensure content moderation in requests proxying to LLMs.
Checklist