
Conversation

@dwisiswant0 (Member) commented on Mar 15, 2025

Scale the number of inputs to process based on the limit or max-size options to generate additional payloads.

Probably fixes #270

Summary by CodeRabbit

  • New Features
    • Payload enrichment now scales the amount of extracted data to the configured limit and max-size options.
    • Filtering keeps only substantial words and removes duplicate entries from the results.

@coderabbitai bot commented on Mar 15, 2025

Walkthrough

The changes update the payload enrichment function in the mutator. New variables are introduced to control the number of inputs processed and the extent of word and number extraction based on configurable limits. The method now slices the inputs array appropriately, filters words based on a minimum length, limits the number of extracted numbers, and deduplicates payload entries. Additionally, a debug log statement has been added to capture the count of words and numbers processed.

Changes

File: mutator.go
Change Summary: Modified the enrichPayloads method to introduce variables controlling input, word, and number extraction; added input slicing, word filtering (min 3 chars), truncation of number lists, deduplication of payload entries, and a debug log for processing counts.
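
For orientation, a rough, self-contained reconstruction of the described flow is sketched below. The variable names (maxInputsToProcess, maxWordsToExtract, maxNumbersToExtract) follow the review comments further down; the Options/Mutator types, the regex-based extraction, and the dedupe helper are illustrative assumptions, not the actual mutator.go implementation:

package main

import (
    "fmt"
    "regexp"
)

// Minimal stand-ins for the real types; illustrative only.
type Options struct {
    Limit    int
    MaxSize  int
    Payloads map[string][]string
}

type Mutator struct {
    Options *Options
}

var (
    wordRe = regexp.MustCompile(`[a-zA-Z]+`)
    numRe  = regexp.MustCompile(`[0-9]+`)
)

// dedupe removes duplicate entries while preserving order.
func dedupe(in []string) []string {
    seen := make(map[string]struct{}, len(in))
    out := make([]string, 0, len(in))
    for _, v := range in {
        if _, ok := seen[v]; !ok {
            seen[v] = struct{}{}
            out = append(out, v)
        }
    }
    return out
}

func (m *Mutator) enrichPayloads(inputs []string) {
    maxInputsToProcess := len(inputs)
    maxWordsToExtract, maxNumbersToExtract := -1, -1 // -1 = unbounded

    // Scale the work to the configured limits; the exact scaling
    // formula in the PR may differ from this sketch.
    if m.Options.Limit > 0 && m.Options.Limit < maxInputsToProcess {
        maxInputsToProcess = m.Options.Limit
    }
    if m.Options.MaxSize > 0 && m.Options.MaxSize <= len(inputs) {
        maxInputsToProcess = m.Options.MaxSize
        maxWordsToExtract = m.Options.MaxSize
        maxNumbersToExtract = m.Options.MaxSize
    }

    // Slice the inputs down to the processing budget.
    inputs = inputs[:maxInputsToProcess]

    var extraWords, numbers []string
    for _, in := range inputs {
        for _, w := range wordRe.FindAllString(in, -1) {
            if len(w) >= 3 { // keep only substantial words
                extraWords = append(extraWords, w)
            }
        }
        numbers = append(numbers, numRe.FindAllString(in, -1)...)
    }

    // Truncate to the extraction budgets (mirrors the diffs in the
    // nitpick comments below).
    if maxNumbersToExtract > 0 && len(numbers) > maxNumbersToExtract {
        numbers = numbers[:maxNumbersToExtract]
    }
    if maxWordsToExtract > 0 && len(extraWords) > maxWordsToExtract {
        extraWords = extraWords[:maxWordsToExtract]
    }

    // Merge with existing payloads, deduplicating entries.
    m.Options.Payloads["word"] = dedupe(append(m.Options.Payloads["word"], extraWords...))
    m.Options.Payloads["number"] = dedupe(append(m.Options.Payloads["number"], numbers...))

    fmt.Printf("[DBG] enriched payloads with %d words and %d numbers\n", len(extraWords), len(numbers))
}

func main() {
    m := &Mutator{Options: &Options{MaxSize: 2, Payloads: map[string][]string{}}}
    m.enrichPayloads([]string{"api-dev01.example.com", "staging2.example.com", "cdn3.example.com"})
    fmt.Println(m.Options.Payloads)
}

Under this reading, MaxSize caps both how many inputs are enriched and how many words/numbers are kept, which is exactly the behavior the nitpick comments below examine.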

Sequence Diagram(s)

sequenceDiagram
    participant Caller
    participant Enricher as enrichPayloads
    participant Options
    participant Logger

    Caller->>Enricher: Invoke enrichPayloads with inputs and m.Options
    Enricher->>Enricher: Check m.Options.Limit and m.Options.MaxSize
    Enricher->>Enricher: Slice inputs based on limit (maxInputsToProcess)
    Enricher->>Enricher: Filter words (min length 3) & limit extraction of numbers
    Enricher->>Options: Update m.Options.Payloads with deduplicated words and numbers
    Enricher->>Logger: Emit debug log with counts of words and numbers added

Poem

I hopped through lines of clever code,
Setting limits on each payload load.
Words and numbers neatly refined,
In my burrow, perfection I find.
With a debug log to cheer the day,
Hoppy coding keeps bugs at bay!
🐰🐇

Tip

⚡🧪 Multi-step agentic review comment chat (experimental)
  • We're introducing multi-step agentic chat in review comments. This experimental feature enhances review discussions with the CodeRabbit agentic chat by enabling advanced interactions, including the ability to create pull requests directly from comments.
    - To enable this feature, set early_access to true in the settings.

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Examples:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai resolve to resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json
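
As a concrete (illustrative) starting point, a minimal .coderabbit.yaml combining that schema hint with the early_access flag from the tip above might look like this; any other keys are omitted here and should be taken from the configuration documentation:

# yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json
early_access: true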

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (3)
mutator.go (3)

305-309: Consider clarifying the MaxSize condition logic

The condition only applies MaxSize if it's less than or equal to the number of inputs. This means larger MaxSize values won't be applied, which might not be intuitive.

Consider revising the condition to apply MaxSize whenever it's greater than zero:

-if m.Options.MaxSize > 0 && m.Options.MaxSize <= len(inputs) {
+if m.Options.MaxSize > 0 {
    maxInputsToProcess = m.Options.MaxSize
    maxWordsToExtract = m.Options.MaxSize
    maxNumbersToExtract = m.Options.MaxSize
}

335-337: Ensure number extraction limits handle zero value correctly

The implementation correctly limits the number of extracted numbers, but it should verify that maxNumbersToExtract is greater than zero before evaluating the slicing condition.

Consider adding an additional check to ensure maxNumbersToExtract is greater than zero:

-if len(numbers) > maxNumbersToExtract && maxNumbersToExtract > 0 {
+if maxNumbersToExtract > 0 && len(numbers) > maxNumbersToExtract {
    numbers = numbers[:maxNumbersToExtract]
}

344-346: Similar consideration for word extraction limits

The implementation has the same pattern as number extraction. Consider consistent ordering of conditions.

-if len(extraWords) > maxWordsToExtract && maxWordsToExtract > 0 {
+if maxWordsToExtract > 0 && len(extraWords) > maxWordsToExtract {
    extraWords = extraWords[:maxWordsToExtract]
}
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR, between commits 65a13bb and f9ec0d3.

📒 Files selected for processing (1)
  • mutator.go (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: Lint Test
🔇 Additional comments (7)
mutator.go (7)

292-295: Good addition of scaling variables for input processing control

These new variables provide fine-grained control over the enrichment process, which aligns well with the PR objective of implementing input limits.


298-304: Proper implementation of scaling based on limit option

The implementation correctly scales the number of inputs to process based on the limit option, which should improve performance when generating additional payloads.


311-313: Efficient slicing of inputs based on processing limits

This slicing operation optimizes performance by limiting the inputs to process, which is especially valuable when dealing with large input sets.


326-334: Good implementation of word filtering with minimum length

The filtering of words based on minimum length is a good practice to improve the quality of enriched payloads by excluding very short, potentially meaningless words.


348-353: Robust handling of word payloads

The revised approach properly handles both existing and new words, ensuring that existing values are not lost but enhanced with the new values.


355-360: Consistent implementation for number payloads

The same logical structure is correctly applied to the number payloads, maintaining consistency in the implementation.


362-363: Valuable debug logging addition

The debug log statement is a good practice as it provides visibility into the enrichment process, which will be helpful for troubleshooting and understanding the behavior of the code.

@tarunKoyalwar (Member) left a comment

I wanted to share some thoughts on this approach:

What this PR does well:

  • Reduces the number of inputs processed during enrichment phase
  • Adds filtering to limit extracted words/numbers
  • Code is clean and well-commented

My concerns (IMO):
I'm not entirely sure this will provide noticeable performance improvements in practice. The issue is that --limit typically refers to the output permutations (which can be millions), while the number of inputs is usually much smaller (hundreds to thousands). For example:

  • Input: 1000 domains
  • Limit: 350,000 permutations
  • This PR would still process all 1000 inputs during enrichment

So the scaling still happens in the permutation phase, and the early exit is never addressed.

Alternative approach:
Looking at the codebase, I think the root issue is architectural - we have an async channel-based design that requires draining (as noted in the ExecuteWithWriter comment).

What if we modified the Execute() goroutine and clusterBomb() function to accept a context? Then in ExecuteWithWriter(), we could cancel the context once we hit the limit. This would stop generating new permutations without needing to drain everything.

Something like:

ctx, cancel := context.WithCancel(context.Background())
defer cancel()

// In ExecuteWithWriter loop:
if limitReached {
    cancel() // Stops permutation generation
}
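
For concreteness, a minimal, self-contained sketch of that idea follows; the clusterBomb/executeWithWriter signatures and the fabricated permutations here are stand-ins, not the real mutator.go APIs:

package main

import (
    "context"
    "fmt"
)

// Stand-in producer: the real clusterBomb() builds permutations from
// payloads; this one just fabricates an endless stream.
func clusterBomb(ctx context.Context, out chan<- string) {
    defer close(out)
    for i := 0; ; i++ {
        perm := fmt.Sprintf("sub%d.example.com", i)
        select {
        case out <- perm:
        case <-ctx.Done():
            return // stop generating as soon as the consumer cancels
        }
    }
}

// Stand-in consumer: cancels the producer once the limit is hit,
// instead of draining the channel.
func executeWithWriter(limit int) {
    ctx, cancel := context.WithCancel(context.Background())
    defer cancel()

    results := make(chan string)
    go clusterBomb(ctx, results)

    count := 0
    for perm := range results {
        fmt.Println(perm)
        count++
        if limit > 0 && count >= limit {
            cancel() // stops permutation generation early
            break
        }
    }
}

func main() {
    executeWithWriter(5)
}

The key property is that the producer selects on ctx.Done() at every send, so cancellation propagates immediately and the consumer never has to drain the remaining permutations.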

Recommendation:
Given that this doesn't fully address the performance issue reported in #270, I'd suggest we close this PR and open a new one with the context-based approach. But I'm definitely open to other perspectives - what do you think?


Development

Successfully merging this pull request may close this issue:

Adding a limit does not improve processing time