Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move detectors.IsKnownFalsePositive from the detectors and into the engine #2643

Merged
merged 8 commits into from
Apr 22, 2024

Conversation

dustin-decker
Copy link
Contributor

@dustin-decker dustin-decker commented Mar 29, 2024

Description:

Move detectors.IsKnownFalsePositive from the detectors and into the engine.

Refactor-robot was used but result was reviewed and altered in a few cases.

Checklist:

  • Tests passing (make test-community)?
  • Lint passing (make lint this requires golangci-lint)?

@rgmz
Copy link
Contributor

rgmz commented Mar 29, 2024

Imo, it would be ideal for this to still be configurable to some degree. Certain secret patterns may not play nicely with the default list and may even have their own separate list (e.g., #1953.)

@dustin-decker
Copy link
Contributor Author

We're working on centralizing the filtering and can make it more configurable in the future. I have left the detectors that I saw that had custom lists in place.

The wordlists were also filtered to 4 char minimum a while back so that resolved the case you identified in #1953, but I recognize that it's just of an example and still can occur in other cases.

@dustin-decker dustin-decker marked this pull request as ready for review March 29, 2024 03:36
@dustin-decker dustin-decker requested review from a team as code owners March 29, 2024 03:36
Comment on lines 142 to 144
if !IsKnownFalsePositive(string(result.Raw), falsePositives, wordCheck) {
filteredResults = append(filteredResults, result)
}
Copy link
Contributor

@rgmz rgmz Apr 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think it would be useful to trace log that something was filtered? There have been a few times — most recently #2620 — where the cause of something not being detected wasn't immediately obvious, and required custom logging to see that the FP check was responsible.

(Incidentally, there's a lot of noise from this specific log)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rgmz in internal discussions we've actually been leaning towards expanding --results to include these in order to solve that problem. We're not quite there yet but it seems like a potentially good way to go.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like that. It would be an intuitive expansion of the flag.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've extended the results flag with an additional filtered_unverified option. It just logs that it was omitted, it doesn't actually 'notify' on it though.

switch result.DetectorType {
case detectorspb.DetectorType_CustomRegex:
filteredResults = append(filteredResults, result)
break
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we breaking for these detector types? My understanding is that the effect will be that when a custom regex (or gcp) detector returns multiple unverified results, we'll ignore all but the first one. Is that intended? And even if it is, is this function the place for it? (It doesn't sound like it has anything to do with false positives.)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The lint action is actually flagging this as a "redundant break" because it is breaking out of the switch, not the for.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤦 thanks for correcting me!

Copy link
Collaborator

@rosecodym rosecodym left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I went through the detector changes and I made this list of detectors that had their false positive check expanded, i.e. these detectors now perform wordlist-based false positive checking on material that they did not before:

  • azurebatch used to filter on account key, but now filters on endpoint
  • azurecontainerregistry used to filter on password, but now filters on endpoint
  • column used to filter on nothing, but now filters on Raw
  • facebookoauth used to filter on api id, but now filters on api secret
  • githubapp used to filter on nothing, but now filters on private key
  • paypaloauth used to filter on key id, but now filters on key secret
  • postgres used to filter on the password only, but now filters on the entire connection string
  • pusherchannelkey used to filter on the secret, but now filters on the app id
  • shopify used to filter on the key, but now filters on the key+domain
  • trufflehogenterprise used to filter on the secret, but now filters on the key

I don't think that any of these are necessarily wrong but we should ensure that they're all acceptable. (E.g. shopify now runs word lists against domains.)

It also looks like the url detector lost all of its custom logic (specifically it no longer checks the allowKnownTestSites flag, and no longer skips the word list). This seems less likely to have been intentional.

@dustin-decker
Copy link
Contributor Author

Thank you for your analysis, Cody.

I've made the following changes, the rest seemed okay to me:

[x] azurebatch used to filter on account key, but now filters on endpoint - excluded
[x] azurecontainerregistry used to filter on password, but now filters on endpoint - excluded
[x] postgres used to filter on the password only, but now filters on the entire connection string - excluded, pattern is specific
[x] shopify used to filter on the key, but now filters on the key+domain - excluded, pattern is specific
[x] uri lost all of its custom logic (specifically it no longer checks the allowKnownTestSites flag, and no longer skips the word list). This seems less likely to have been intentional - excluded, it does still use the allowKnownTestSites flag

I've also extended the results flag with an additional filtered_unverified option. It just logs that it was omitted, it doesn't actually 'notify' on it though.

Copy link
Collaborator

@rosecodym rosecodym left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks pretty good! I'm happy to have been wrong about the work level. I noticed that my previous detector list was missing one: dotmailer used to run the FP check against the password, but now it runs it against the key ID, which looks like an email address. That might be another one to exclude.

It also looks like you've got some new lint errors, but nothing show-stopping. Thanks for doing this!

@dustin-decker dustin-decker merged commit 14e44db into main Apr 22, 2024
9 of 10 checks passed
@dustin-decker dustin-decker deleted the remove-fp-filtering branch April 22, 2024 22:18
rosecodym added a commit that referenced this pull request Apr 23, 2024
This PR adds false positive information to the Result protobuf message in anticipation of us tracking it as first-class secret metadata. We're not doing that yet (it's blocked behind #2643) but setting up the messages now means we'll be able to do it later with less of a code delta.
@rgmz rgmz mentioned this pull request Apr 24, 2024
2 tasks
rosecodym added a commit that referenced this pull request Apr 30, 2024
This PR:

Creates an optional interface that detectors can use to customize their false positive detection
Implements this interface on detectors that have custom logic
In most cases this "custom logic" is simply a no-op because the detector does not participate in false positive detection
Eliminates inline (old-style) false positive exclusion in a few detectors that #2643 missed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

5 participants