
Conversation

@dwisiswant0 (Member) commented on Mar 15, 2025

Scale the number of inputs to process based on the limit or max-size options to generate additional payloads.

Probably fixes #270

Summary by CodeRabbit

  • New Features
    • Payload enrichment now scales the amount of extracted data to the configured limit and max-size options.
    • Filtering keeps only substantial words and removes duplicate entries from the results.

@coderabbitai bot commented on Mar 15, 2025

Walkthrough

The changes update the payload enrichment function in the mutator. New variables are introduced to control the number of inputs processed and the extent of word and number extraction based on configurable limits. The method now slices the inputs array appropriately, filters words based on a minimum length, limits the number of extracted numbers, and deduplicates payload entries. Additionally, a debug log statement has been added to capture the count of words and numbers processed.

Changes

File: mutator.go
Change Summary: Modified the enrichPayloads method to introduce variables controlling input, word, and number extraction; added input slicing, word filtering (min 3 chars), truncation of number lists, deduplication of payload entries, and a debug log for processing counts.
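
For orientation, a rough, self-contained reconstruction of the described flow is sketched below. The variable names (maxInputsToProcess, maxWordsToExtract, maxNumbersToExtract) follow the review comments further down; the Options/Mutator types, the regex-based extraction, and the dedupe helper are illustrative assumptions, not the actual mutator.go implementation:

package main

import (
    "fmt"
    "regexp"
)

// Minimal stand-ins for the real types; illustrative only.
type Options struct {
    Limit    int
    MaxSize  int
    Payloads map[string][]string
}

type Mutator struct {
    Options *Options
}

var (
    wordRe = regexp.MustCompile(`[a-zA-Z]+`)
    numRe  = regexp.MustCompile(`[0-9]+`)
)

// dedupe removes duplicate entries while preserving order.
func dedupe(in []string) []string {
    seen := make(map[string]struct{}, len(in))
    out := make([]string, 0, len(in))
    for _, v := range in {
        if _, ok := seen[v]; !ok {
            seen[v] = struct{}{}
            out = append(out, v)
        }
    }
    return out
}

func (m *Mutator) enrichPayloads(inputs []string) {
    maxInputsToProcess := len(inputs)
    maxWordsToExtract, maxNumbersToExtract := -1, -1 // -1 = unbounded

    // Scale the work to the configured limits; the exact scaling
    // formula in the PR may differ from this sketch.
    if m.Options.Limit > 0 && m.Options.Limit < maxInputsToProcess {
        maxInputsToProcess = m.Options.Limit
    }
    if m.Options.MaxSize > 0 && m.Options.MaxSize <= len(inputs) {
        maxInputsToProcess = m.Options.MaxSize
        maxWordsToExtract = m.Options.MaxSize
        maxNumbersToExtract = m.Options.MaxSize
    }

    // Slice the inputs down to the processing budget.
    inputs = inputs[:maxInputsToProcess]

    var extraWords, numbers []string
    for _, in := range inputs {
        for _, w := range wordRe.FindAllString(in, -1) {
            if len(w) >= 3 { // keep only substantial words
                extraWords = append(extraWords, w)
            }
        }
        numbers = append(numbers, numRe.FindAllString(in, -1)...)
    }

    // Truncate to the extraction budgets (mirrors the diffs in the
    // nitpick comments below).
    if maxNumbersToExtract > 0 && len(numbers) > maxNumbersToExtract {
        numbers = numbers[:maxNumbersToExtract]
    }
    if maxWordsToExtract > 0 && len(extraWords) > maxWordsToExtract {
        extraWords = extraWords[:maxWordsToExtract]
    }

    // Merge with existing payloads, deduplicating entries.
    m.Options.Payloads["word"] = dedupe(append(m.Options.Payloads["word"], extraWords...))
    m.Options.Payloads["number"] = dedupe(append(m.Options.Payloads["number"], numbers...))

    fmt.Printf("[DBG] enriched payloads with %d words and %d numbers\n", len(extraWords), len(numbers))
}

func main() {
    m := &Mutator{Options: &Options{MaxSize: 2, Payloads: map[string][]string{}}}
    m.enrichPayloads([]string{"api-dev01.example.com", "staging2.example.com", "cdn3.example.com"})
    fmt.Println(m.Options.Payloads)
}

Under this reading, MaxSize caps both how many inputs are enriched and how many words/numbers are kept, which is exactly the behavior the nitpick comments below examine.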

Sequence Diagram(s)

sequenceDiagram
    participant Caller
    participant Enricher as enrichPayloads
    participant Options
    participant Logger

    Caller->>Enricher: Invoke enrichPayloads with inputs and m.Options
    Enricher->>Enricher: Check m.Options.Limit and m.Options.MaxSize
    Enricher->>Enricher: Slice inputs based on limit (maxInputsToProcess)
    Enricher->>Enricher: Filter words (min length 3) & limit extraction of numbers
    Enricher->>Options: Update m.Options.Payloads with deduplicated words and numbers
    Enricher->>Logger: Emit debug log with counts of words and numbers added

Poem

I hopped through lines of clever code,
Setting limits on each payload load.
Words and numbers neatly refined,
In my burrow, perfection I find.
With a debug log to cheer the day,
Hoppy coding keeps bugs at bay!
🐰🐇

Tip

⚡🧪 Multi-step agentic review comment chat (experimental)
  • We're introducing multi-step agentic chat in review comments. This experimental feature enhances review discussions with the CodeRabbit agentic chat by enabling advanced interactions, including the ability to create pull requests directly from comments.
    - To enable this feature, set early_access to true in the settings.

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Examples:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai resolve to resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json
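
As a concrete (illustrative) starting point, a minimal .coderabbit.yaml combining that schema hint with the early_access flag from the tip above might look like this; any other keys are omitted here and should be taken from the configuration documentation:

# yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json
early_access: true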

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (3)
mutator.go (3)

305-309: Consider clarifying the MaxSize condition logic

The condition only applies MaxSize if it's less than or equal to the number of inputs. This means larger MaxSize values won't be applied, which might not be intuitive.

Consider revising the condition to apply MaxSize whenever it's greater than zero:

-if m.Options.MaxSize > 0 && m.Options.MaxSize <= len(inputs) {
+if m.Options.MaxSize > 0 {
    maxInputsToProcess = m.Options.MaxSize
    maxWordsToExtract = m.Options.MaxSize
    maxNumbersToExtract = m.Options.MaxSize
}

335-337: Ensure number extraction limits handle zero value correctly

The implementation correctly limits the number of extracted numbers, but it should verify that maxNumbersToExtract is greater than zero before evaluating the slicing condition.

Consider adding an additional check to ensure maxNumbersToExtract is greater than zero:

-if len(numbers) > maxNumbersToExtract && maxNumbersToExtract > 0 {
+if maxNumbersToExtract > 0 && len(numbers) > maxNumbersToExtract {
    numbers = numbers[:maxNumbersToExtract]
}

344-346: Similar consideration for word extraction limits

The implementation has the same pattern as number extraction. Consider consistent ordering of conditions.

-if len(extraWords) > maxWordsToExtract && maxWordsToExtract > 0 {
+if maxWordsToExtract > 0 && len(extraWords) > maxWordsToExtract {
    extraWords = extraWords[:maxWordsToExtract]
}
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR, between commits 65a13bb and f9ec0d3.

📒 Files selected for processing (1)
  • mutator.go (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: Lint Test
🔇 Additional comments (7)
mutator.go (7)

292-295: Good addition of scaling variables for input processing control

These new variables provide fine-grained control over the enrichment process, which aligns well with the PR objective of implementing input limits.


298-304: Proper implementation of scaling based on limit option

The implementation correctly scales the number of inputs to process based on the limit option, which should improve performance when generating additional payloads.


311-313: Efficient slicing of inputs based on processing limits

This slicing operation optimizes performance by limiting the inputs to process, which is especially valuable when dealing with large input sets.


326-334: Good implementation of word filtering with minimum length

The filtering of words based on minimum length is a good practice to improve the quality of enriched payloads by excluding very short, potentially meaningless words.


348-353: Robust handling of word payloads

The revised approach properly handles both existing and new words, ensuring that existing values are not lost but enhanced with the new values.


355-360: Consistent implementation for number payloads

The same logical structure is correctly applied to the number payloads, maintaining consistency in the implementation.


362-363: Valuable debug logging addition

The debug log statement is a good practice as it provides visibility into the enrichment process, which will be helpful for troubleshooting and understanding the behavior of the code.

@tarunKoyalwar (Member) left a comment

I wanted to share some thoughts on this approach:

What this PR does well:

  • Reduces the number of inputs processed during enrichment phase
  • Adds filtering to limit extracted words/numbers
  • Code is clean and well-commented

My concerns (IMO):
I'm not entirely sure this will provide noticeable performance improvements in practice. The issue is that --limit typically refers to the output permutations (which can be millions), while the number of inputs is usually much smaller (hundreds to thousands). For example:

  • Input: 1000 domains
  • Limit: 350,000 permutations
  • This PR would still process all 1000 inputs during enrichment

So the scaling still happens in the permutation phase, and the early exit is never addressed.

Alternative approach:
Looking at the codebase, I think the root issue is architectural - we have an async channel-based design that requires draining (as noted in the ExecuteWithWriter comment).

What if we modified the Execute() goroutine and clusterBomb() function to accept a context? Then in ExecuteWithWriter(), we could cancel the context once we hit the limit. This would stop generating new permutations without needing to drain everything.

Something like:

ctx, cancel := context.WithCancel(context.Background())
defer cancel()

// In ExecuteWithWriter loop:
if limitReached {
    cancel() // Stops permutation generation
}
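
For concreteness, a minimal, self-contained sketch of that idea follows; the clusterBomb/executeWithWriter signatures and the fabricated permutations here are stand-ins, not the real mutator.go APIs:

package main

import (
    "context"
    "fmt"
)

// Stand-in producer: the real clusterBomb() builds permutations from
// payloads; this one just fabricates an endless stream.
func clusterBomb(ctx context.Context, out chan<- string) {
    defer close(out)
    for i := 0; ; i++ {
        perm := fmt.Sprintf("sub%d.example.com", i)
        select {
        case out <- perm:
        case <-ctx.Done():
            return // stop generating as soon as the consumer cancels
        }
    }
}

// Stand-in consumer: cancels the producer once the limit is hit,
// instead of draining the channel.
func executeWithWriter(limit int) {
    ctx, cancel := context.WithCancel(context.Background())
    defer cancel()

    results := make(chan string)
    go clusterBomb(ctx, results)

    count := 0
    for perm := range results {
        fmt.Println(perm)
        count++
        if limit > 0 && count >= limit {
            cancel() // stops permutation generation early
            break
        }
    }
}

func main() {
    executeWithWriter(5)
}

The key property is that the producer selects on ctx.Done() at every send, so cancellation propagates immediately and the consumer never has to drain the remaining permutations.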

Recommendation:
Given that this doesn't fully address the performance issue reported in #270, I'd suggest we close this PR and open a new one with the context-based approach. But I'm definitely open to other perspectives - what do you think?


Development

Successfully merging this pull request may close this issue:

Adding a limit does not improve processing time