feat: update payload enrichment logic with input limits and filtering #273
Conversation
Scale the number of inputs to process based on limit or max-size options to generate additional payloads.
Signed-off-by: Dwi Siswanto <git@dw1.io>
Walkthrough
The changes update the payload enrichment function in the mutator. New variables control the number of inputs processed and the extent of word and number extraction based on configurable limits. The method now slices the inputs array accordingly, filters words by a minimum length, caps the number of extracted numbers, and deduplicates payload entries. A debug log statement has also been added to report the count of words and numbers processed.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Caller
    participant Enricher as enrichPayloads
    participant Options
    participant Logger
    Caller->>Enricher: Invoke enrichPayloads with inputs and m.Options
    Enricher->>Enricher: Check m.Options.Limit and m.Options.MaxSize
    Enricher->>Enricher: Slice inputs based on limit (maxInputsToProcess)
    Enricher->>Enricher: Filter words (min length 3) & limit extraction of numbers
    Enricher->>Options: Update m.Options.Payloads with deduplicated words and numbers
    Enricher->>Logger: Emit debug log with counts of words and numbers added
```
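For readers who want to see the shape of this logic in code, here is a minimal, self-contained Go sketch of the flow described above. It is an approximation that borrows the names used in the review (`maxInputsToProcess`, `Options.Limit`, `Options.MaxSize`, the minimum word length of 3); it is not the PR's actual mutator.go code.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// Options mirrors the configurable knobs discussed in the review.
type Options struct {
	Limit   int // cap on generated permutations
	MaxSize int // cap on inputs/words/numbers considered during enrichment
}

// enrichPayloads sketches the scaling logic: derive processing caps from
// the options, slice the inputs, then filter and limit extracted tokens.
func enrichPayloads(inputs []string, opts Options) (words, numbers []string) {
	maxInputsToProcess := len(inputs)
	maxWordsToExtract := -1
	maxNumbersToExtract := -1

	// Limit, when set, scales how many inputs feed the enrichment phase
	// (an assumed scaling rule, for illustration only).
	if opts.Limit > 0 && opts.Limit < maxInputsToProcess {
		maxInputsToProcess = opts.Limit
	}
	// MaxSize, when set, bounds inputs and extracted words/numbers alike.
	if opts.MaxSize > 0 && opts.MaxSize <= len(inputs) {
		maxInputsToProcess = opts.MaxSize
		maxWordsToExtract = opts.MaxSize
		maxNumbersToExtract = opts.MaxSize
	}

	inputs = inputs[:maxInputsToProcess]

	seen := map[string]bool{}
	for _, in := range inputs {
		for _, tok := range strings.FieldsFunc(in, func(r rune) bool {
			return !unicode.IsLetter(r) && !unicode.IsDigit(r)
		}) {
			switch {
			case seen[tok]:
				continue // deduplicate payload entries
			case isNumber(tok):
				numbers = append(numbers, tok)
			case len(tok) >= 3: // drop very short, low-signal words
				words = append(words, tok)
			default:
				continue
			}
			seen[tok] = true
		}
	}

	if maxWordsToExtract > 0 && len(words) > maxWordsToExtract {
		words = words[:maxWordsToExtract]
	}
	if maxNumbersToExtract > 0 && len(numbers) > maxNumbersToExtract {
		numbers = numbers[:maxNumbersToExtract]
	}

	fmt.Printf("[debug] enrichment added %d words and %d numbers\n", len(words), len(numbers))
	return words, numbers
}

func isNumber(s string) bool {
	for _, r := range s {
		if !unicode.IsDigit(r) {
			return false
		}
	}
	return len(s) > 0
}

func main() {
	words, numbers := enrichPayloads(
		[]string{"api-dev-01.example.com", "cdn2.example.com"},
		Options{MaxSize: 2},
	)
	fmt.Println(words, numbers)
}
```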
Actionable comments posted: 0
🧹 Nitpick comments (3)
mutator.go (3)
305-309: Consider clarifying the MaxSize condition logic

The condition only applies MaxSize if it's less than or equal to the number of inputs, so larger MaxSize values are silently ignored, which might not be intuitive.

Consider revising the condition to apply MaxSize whenever it's greater than zero:

```diff
-if m.Options.MaxSize > 0 && m.Options.MaxSize <= len(inputs) {
+if m.Options.MaxSize > 0 {
 	maxInputsToProcess = m.Options.MaxSize
 	maxWordsToExtract = m.Options.MaxSize
 	maxNumbersToExtract = m.Options.MaxSize
 }
```
335-337: Ensure number extraction limits handle zero value correctly

The implementation correctly limits the number of extracted numbers, but the `maxNumbersToExtract > 0` guard should be evaluated before the length comparison rather than after it.

Consider reordering the conditions:

```diff
-if len(numbers) > maxNumbersToExtract && maxNumbersToExtract > 0 {
+if maxNumbersToExtract > 0 && len(numbers) > maxNumbersToExtract {
 	numbers = numbers[:maxNumbersToExtract]
 }
```
344-346: Similar consideration for word extraction limits

The implementation has the same pattern as the number extraction; use the same ordering of conditions for consistency:

```diff
-if len(extraWords) > maxWordsToExtract && maxWordsToExtract > 0 {
+if maxWordsToExtract > 0 && len(extraWords) > maxWordsToExtract {
 	extraWords = extraWords[:maxWordsToExtract]
 }
```
📒 Files selected for processing (1)
mutator.go (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: Lint Test
🔇 Additional comments (7)
mutator.go (7)
292-295: Good addition of scaling variables for input processing control

These new variables provide fine-grained control over the enrichment process, which aligns well with the PR objective of implementing input limits.

298-304: Proper implementation of scaling based on the limit option

The implementation correctly scales the number of inputs to process based on the limit option, which should improve performance when generating additional payloads.

311-313: Efficient slicing of inputs based on processing limits

This slicing operation optimizes performance by limiting the inputs to process, which is especially valuable when dealing with large input sets.

326-334: Good implementation of word filtering with minimum length

Filtering words by a minimum length is a good practice that improves the quality of enriched payloads by excluding very short, potentially meaningless words.

348-353: Robust handling of word payloads

The revised approach properly handles both existing and new words, ensuring that existing values are not lost but enhanced with the new values (see the sketch after these comments).

355-360: Consistent implementation for number payloads

The same logical structure is correctly applied to the number payloads, maintaining consistency in the implementation.

362-363: Valuable debug logging addition

The debug log statement is a good practice as it provides visibility into the enrichment process, which will be helpful for troubleshooting and understanding the behavior of the code.
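As a concrete illustration of the merge-then-deduplicate pattern that the comments on lines 348-360 describe, here is a small Go sketch. The helper name and the payload-map usage in the comment are assumptions for illustration, not the PR's exact code:

```go
package main

import "fmt"

// mergeUnique appends newVals to existing, preserving order and dropping
// duplicates, so previously configured payload values are kept and only
// enhanced. A sketch of the pattern; not the PR's actual helper.
func mergeUnique(existing, newVals []string) []string {
	seen := make(map[string]bool, len(existing)+len(newVals))
	out := make([]string, 0, len(existing)+len(newVals))
	for _, v := range append(existing, newVals...) {
		if !seen[v] {
			seen[v] = true
			out = append(out, v)
		}
	}
	return out
}

func main() {
	existing := []string{"dev", "stage"}
	extraWords := []string{"stage", "api", "cdn"}
	// hypothetically: m.Options.Payloads["word"] = mergeUnique(m.Options.Payloads["word"], extraWords)
	fmt.Println(mergeUnique(existing, extraWords)) // [dev stage api cdn]
}
```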
tarunKoyalwar
left a comment
I wanted to share some thoughts on this approach:
What this PR does well:
- Reduces the number of inputs processed during enrichment phase
- Adds filtering to limit extracted words/numbers
- Code is clean and well-commented
My concerns (IMO):
I'm not entirely sure this will provide noticeable performance
improvements in practice. The issue is that --limit typically refers to
the output permutations (which can be millions), while the number of
inputs is usually much smaller (hundreds to thousands). For example:
- Input: 1000 domains
- Limit: 350,000 permutations
- This PR would still process all 1000 inputs during enrichment
So the heavy scaling still happens in the permutation phase, and this
change doesn't give us an early exit there.
Alternative approach:
Looking at the codebase, I think the root issue is architectural - we have
an async channel-based design that requires draining (as noted in the
ExecuteWithWriter comment).
What if we modified the Execute() goroutine and clusterBomb() function to
accept a context? Then in ExecuteWithWriter(), we could cancel the context
once we hit the limit. This would stop generating new permutations
without needing to drain everything.
Something like:
```go
ctx, cancel := context.WithCancel(context.Background())
defer cancel()

// In ExecuteWithWriter loop:
if limitReached {
	cancel() // Stops permutation generation
}
```
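To make the idea concrete, here is a minimal, self-contained sketch of the cancellation pattern; the `generate`/`consume` names are stand-ins for clusterBomb/ExecuteWithWriter and do not reflect the project's actual signatures:

```go
package main

import (
	"context"
	"fmt"
)

// generate stands in for clusterBomb: it emits permutations until its
// context is cancelled, instead of draining the whole search space.
func generate(ctx context.Context, out chan<- string) {
	defer close(out)
	for i := 0; ; i++ {
		select {
		case <-ctx.Done():
			return // stop producing as soon as the consumer cancels
		case out <- fmt.Sprintf("perm-%d", i):
		}
	}
}

// consume stands in for ExecuteWithWriter: it cancels the producer once
// the configured limit is reached, then simply stops reading.
func consume(limit int) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	out := make(chan string)
	go generate(ctx, out)

	count := 0
	for perm := range out {
		fmt.Println(perm)
		count++
		if count >= limit {
			cancel() // stops permutation generation early
			break
		}
	}
}

func main() {
	consume(5)
}
```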
Recommendation:
Given that this doesn't fully address the performance issue reported in
#270, I'd suggest we close this PR and open a new one with the
context-based approach. But I'm definitely open to other perspectives -
what do you think?
Prolly fixes #270