feat: add spam detection #33
Walkthrough
The changes include updates to the
Changes
Sequence Diagram(s)
sequenceDiagram
participant Client
participant WebhookHandler
participant Validator
participant Normalizer
participant ResponseHandler
Client->>WebhookHandler: Sends GitHub webhook event
WebhookHandler->>Validator: Validate incoming data
Validator-->>WebhookHandler: Validated data
WebhookHandler->>Normalizer: Normalize issue content
Normalizer-->>WebhookHandler: Normalized content
WebhookHandler->>ResponseHandler: Process and respond to event
ResponseHandler-->>Client: Send response
# Conflicts:
#   package.json
#   pnpm-lock.yaml
Deploying carpenter-uh62 with
| Latest commit: | 39816c5 |
| Status: | ✅ Deploy successful! |
| Preview URL: | https://dc7877ed.carpenter-uh62.pages.dev |
| Branch Preview URL: | https://feat-spam-detection.carpenter-uh62.pages.dev |
enum IssueLabel {
  NeedsReproduction = 'needs reproduction',
  PossibleRegression = 'possible regression',
  Nitro = 'nitro',
  Documentation = 'documentation',
  PossibleSpam = 'spam',
}

enum IssueType {
  Bug = 'bug',
  Feature = 'feature',
  Documentation = 'documentation',
  Spam = 'spam',
}
I'm not a massive fan of enums but willing to give it a go
Why not? 👀
  })).optional(),
})

// TODO: generate AI model schema from this?
or maybe do it the other way - generate the zod schema from the json schema above?
That's a good idea, but I'm not sure how we would create the transforms then. 🤔 They're there to provide some recovery from a non-optimal AI response.
Sometimes, it would respond with null for the boolean fields and also invalid language codes, which would then cause an error in the translation prompt.
I liked the idea of transforming the data using zod so that we don't have to think about it further down in the code.
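To make the recovery idea concrete, here is a rough sketch of the kind of normalization the transforms do, written as plain TypeScript rather than zod so it runs without dependencies. The field names (`spamLikely`, `language`), the two-letter language check, and the `'en'` fallback are assumptions for illustration, not the schema from this PR.

```typescript
// Hypothetical shape of a raw model response; field names are illustrative only.
interface RawAnalysis {
  spamLikely?: boolean | null
  language?: string | null
}

// Shape the rest of the code can rely on after recovery.
interface Analysis {
  spamLikely: boolean
  language: string
}

// Assumed fallback and format check, not taken from the PR.
const DEFAULT_LANGUAGE = 'en'
const LANGUAGE_CODE = /^[a-z]{2}$/

// Coerce a non-optimal response into safe values:
// null or missing booleans become false, invalid language codes fall back to the default.
function recoverAnalysis(raw: RawAnalysis): Analysis {
  const language = (raw.language ?? '').trim().toLowerCase()
  return {
    spamLikely: raw.spamLikely === true,
    language: LANGUAGE_CODE.test(language) ? language : DEFAULT_LANGUAGE,
  }
}
```

In zod the same logic would presumably live in the schema's transforms, so downstream code such as the translation prompt only ever sees the recovered values.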
now I think I just need to check that the token carpenter has is able to transfer issues ... it probably only has permissions right now to read/write issues ...
beautiful work - thank you ❤️
now I think it's time to try in production
This PR brings a small refactor to the original code and implements spam detection.
It was necessary to downgrade nuxt hub to 0.7.26 because of an issue explained here: #32
I optimized how the issue body is sent to the AI model - previously, the AI wouldn't process anything if the issue had more than 200 words. Now, unnecessary information like the environment, logs, etc., is stripped away, and the total content length is capped at 5000 characters. (This might be too much, not sure)
I've tested it on a couple of issues, and it seems to work reasonably well.
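For reference, the stripping and capping described above could look roughly like the sketch below. It assumes the noisy parts of an issue sit under markdown headings named "Environment" and "Logs"; that heading list and the function name `normalizeIssueBody` are illustrative guesses, and only the 5000-character cap comes from this description.

```typescript
// Cap taken from the PR description.
const MAX_CONTENT_LENGTH = 5000

// Assumed heading patterns; the real issue templates may use different section names.
const ANY_HEADING = /^#{1,6}\s/
const NOISY_HEADING = /^#{1,6}\s+(environment|logs)\s*$/i

// Drop every line that belongs to a noisy section, then cap the total length.
function normalizeIssueBody(body: string): string {
  const kept: string[] = []
  let skipping = false
  for (const line of body.split('\n')) {
    const trimmed = line.trim()
    // Each heading starts a new section and decides whether that section is kept.
    if (ANY_HEADING.test(trimmed)) {
      skipping = NOISY_HEADING.test(trimmed)
    }
    if (!skipping) {
      kept.push(line)
    }
  }
  return kept.join('\n').trim().slice(0, MAX_CONTENT_LENGTH)
}
```

Capping after stripping means the limit is spent on the description and reproduction text rather than on log output.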
One potential enhancement to optimize AI requests could be to store the number of spam issues created by a certain user in a KV. If that count hits a certain threshold within 10 minutes or so, we could automatically consider every subsequent issue as spam without sending the contents to the LLM (with an expiry date, of course).
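A minimal sketch of that counter, using an in-memory record as a stand-in for the KV store. The threshold of 3, the helper names, and the fixed 10-minute window that starts at the first spam issue are all assumptions; a real KV would handle expiry itself through a TTL on the key.

```typescript
// Assumed values; the idea above only says "a certain threshold within 10 minutes or so".
const SPAM_THRESHOLD = 3
const WINDOW_MS = 10 * 60 * 1000

interface SpamCounter {
  count: number
  expiresAt: number // epoch milliseconds after which the counter no longer applies
}

// In-memory stand-in for a KV namespace keyed by user login.
const counters: Record<string, SpamCounter | undefined> = {}

// Call this whenever the model has classified an issue from `login` as spam.
function recordSpam(login: string, now: number): void {
  const current = counters[login]
  if (current && current.expiresAt > now) {
    current.count += 1
  } else {
    // First spam issue, or the previous window expired: start a fresh window.
    counters[login] = { count: 1, expiresAt: now + WINDOW_MS }
  }
}

// True when the next issue from `login` can be treated as spam without calling the model.
function shouldSkipModel(login: string, now: number): boolean {
  const current = counters[login]
  return current !== undefined && current.expiresAt > now && current.count >= SPAM_THRESHOLD
}
```

The webhook handler would check `shouldSkipModel(login, Date.now())` before building the prompt and call `recordSpam` after each spam verdict.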
Btw. Would it be worth maybe generating the json schema from the zod schema using something like https://www.npmjs.com/package/zod-to-json-schema, for example?