Secrets are reported on the wrong line #1876

Open
det opened this issue Oct 8, 2023 · 8 comments

det commented Oct 8, 2023

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

TruffleHog Version

3.59.0 (and older versions)

Trace Output

https://gist.github.com/det/080c98039750a5296c6856efaaed8b5c

Expected Behavior

Secret should be reported on line 557

Actual Behavior

Secret is reported on line 287 (and a different wrong line number on older versions of trufflehog)

Steps to Reproduce

  1. wget https://gist.githubusercontent.com/det/1526b4c16d0e07ac023d75c912a68658/raw/c3061c14a811205a65cbdcf0065bd3c11d88bfcb/test.txt
  2. trufflehog filesystem test.txt
  3. The wrong line number is reported

Environment

  • OS: Linux

References

May be related to #1537

@det det added the bug label Oct 8, 2023

det commented Oct 18, 2023

I just tested, and this problem persists even with #1891 merged.

sxlijin commented Oct 18, 2023

#1891 appears to be fixing an off-by-one error - not whatever's causing this.

@shreyas-sriram
Contributor

This appears to be coming from the Chunker logic. As a quick test, changing ChunkSize to 10 * 10 * 1024 (100KiB) returns the correct line number:

Found unverified result 🐷🔑❓
Detector Type: Github
Decoder Type: PLAIN
Raw result: ghs_012345678901234567890123456789012345
Rotation_guide: https://howtorotate.com/docs/tutorials/github/
File: a.txt
Line: 557

sxlijin commented Oct 19, 2023

If bumping ChunkSize from 10KiB to 100KiB fixes the issue, then that implies to me that:

  • trufflehog is not reporting the line number within the file, it is reporting the line number within a given chunk
  • bumping from 10KiB to 100KiB would only solve the problem for smaller files, and line numbers for secrets after the first 100KiB of a file will still be wrong

Also, presumably the Chunker was implemented for performance reasons (I'm guessing because there are so many detectors that are each running their own regex matching per chunk?) - what implications does bumping from 10KiB to 100KiB have for that?
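The hypothesis in the two bullets above can be illustrated with a small, self-contained sketch. It is written in Python for brevity (trufflehog itself is written in Go) and does not use any trufflehog code: `chunk_relative_line` and `file_relative_line` are hypothetical helpers, and the fixed-size splitting is only an assumption about how a naive chunker might behave.

```python
def file_relative_line(data: bytes, needle: bytes) -> int:
    """1-based line number of `needle`, counted from the start of the file."""
    pos = data.find(needle)
    if pos == -1:
        raise ValueError("needle not found")
    return data.count(b"\n", 0, pos) + 1


def chunk_relative_line(data: bytes, needle: bytes, chunk_size: int) -> int:
    """1-based line number of `needle`, counted from the start of its chunk.

    This mimics the suspected bug: newlines in earlier chunks are ignored.
    A needle that straddles a chunk boundary is not found at all.
    """
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        pos = chunk.find(needle)
        if pos != -1:
            return chunk.count(b"\n", 0, pos) + 1
    raise ValueError("needle not found inside any single chunk")
```

With a synthetic file of 40-byte lines and a token on line 557, the chunk-relative count is wrong for a 10KiB chunk size and happens to be right for 100KiB only because the whole file then fits in one chunk. Whether this is what trufflehog actually does is not confirmed in this thread.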

sxlijin commented Oct 19, 2023

Here's another repro:

     1  // this block is xxxxxxxxxxxxxxxxx 1024KiB total xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
     2  // this line ix xxxxxxxxxxxxxxxxxxxx 128KiB total xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
     3  // this line ix xxxxxxxxxxxxxxxxxxxx 128KiB total xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
     4  // this line ix xxxxxxxxxxxxxxxxxxxx 128KiB total xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
     5  // this line ix xxxxxxxxxxxxxxxxxxxx 128KiB total xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
     6  // this line ix xxxxxxxxxxxxxxxxxxxx 128KiB total xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
     7  // this line ix xxxxxxxxxxxxxxxxxxxx 128KiB total xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
     8  // this line ix xxxxxxxxxxxxxxxxxxxx 128KiB total xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
     9  const token = "ghs_111111111111111111111111111111111111";

$ trufflehog-3.48.0 filesystem --json --fail --no-verification --no-update --exclude-detectors=PagerDutyApiKey,LaunchDarkly foo.ts
{"SourceMetadata":{"Data":{"Filesystem":{"file":"foo.ts","line":4}}},"SourceID":1,"SourceType":15,"SourceName":"trufflehog - filesystem","DetectorType":8,"DetectorName":"Github","DecoderName":"PLAIN","Verified":false,"Raw":"ghs_111111111111111111111111111111111111","RawV2":"","Redacted":"","ExtraData":null,"StructuredData":null}
{"SourceMetadata":{"Data":{"Filesystem":{"file":"foo.ts","line":8}}},"SourceID":1,"SourceType":15,"SourceName":"trufflehog - filesystem","DetectorType":8,"DetectorName":"Github","DecoderName":"PLAIN","Verified":false,"Raw":"ghs_111111111111111111111111111111111111","RawV2":"","Redacted":"","ExtraData":null,"StructuredData":null}
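For anyone who wants to recreate a file of this shape, here is a rough generator sketch in Python. The sizes are taken from the labels in the listing (one 1024KiB line, then seven 128KiB lines, then the token on line 9); the exact filler bytes, the `build_repro` and `padded_comment` names, and the dummy token are assumptions, so the result may not be byte-identical to the original foo.ts.

```python
KIB = 1024
# Dummy token with the same prefix and overall shape as the one in the listing.
TOKEN = b"ghs_" + b"1" * 36


def padded_comment(label: str, total_bytes: int) -> bytes:
    """Return one `//` comment line of exactly `total_bytes` bytes, newline included."""
    head = f"// {label} ".encode("ascii")
    return head + b"x" * (total_bytes - len(head) - 1) + b"\n"


def build_repro() -> bytes:
    """Build the 9-line file: 1024KiB line, seven 128KiB lines, then the token."""
    lines = [padded_comment("this block is 1024KiB total", 1024 * KIB)]
    lines += [padded_comment("this line is 128KiB total", 128 * KIB) for _ in range(7)]
    lines.append(b'const token = "' + TOKEN + b'";\n')
    return b"".join(lines)
```

Writing the result with `pathlib.Path("foo.ts").write_bytes(build_repro())` gives a file whose only secret sits on line 9, which makes any other reported line number easy to spot.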

@bill-rich
Collaborator

The reason for this is that the filesystem source doesn't do any special chunking. Git-based sources preserve line numbers because the git source does its own line-aware chunking. We should add that logic to the general chunker, or maybe to the util package so any source can use it.
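A minimal sketch of what "line-aware chunking" could look like, again in Python rather than Go and not based on the actual git source code: each chunk ends on a line boundary and carries the file line number of its first line, so a match inside a chunk can be mapped back to a file line. `line_aware_chunks` and `find_line` are hypothetical names.

```python
from typing import BinaryIO, Iterator, Optional, Tuple


def line_aware_chunks(stream: BinaryIO, chunk_size: int) -> Iterator[Tuple[int, bytes]]:
    """Yield (first_line_number, chunk) pairs whose chunks end on line boundaries.

    A single line longer than `chunk_size` is emitted as its own oversized chunk.
    """
    start_line = 1  # file line number of the first line in the pending chunk
    pending = []
    pending_size = 0
    for line in stream:  # binary streams iterate line by line, newline included
        if pending and pending_size + len(line) > chunk_size:
            yield start_line, b"".join(pending)
            start_line += len(pending)
            pending, pending_size = [], 0
        pending.append(line)
        pending_size += len(line)
    if pending:
        yield start_line, b"".join(pending)


def find_line(stream: BinaryIO, needle: bytes, chunk_size: int) -> Optional[int]:
    """Return the file-relative line of the first match, or None if absent."""
    for start_line, chunk in line_aware_chunks(stream, chunk_size):
        pos = chunk.find(needle)
        if pos != -1:
            # Offset the in-chunk newline count by the chunk's starting line.
            return start_line + chunk.count(b"\n", 0, pos)
    return None
```

Because chunks never split a line, a secret cannot straddle a chunk boundary unless it spans multiple lines. The trade-off is that very long lines produce chunks larger than `chunk_size`, which a real implementation would need to bound.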

sxlijin commented Nov 1, 2023

> The reason for this is that filesystem doesn't do any special chunking

What is "this" in the context of your reply?

The question I'm looking to answer is "why does trufflehog filesystem reproducibly report the wrong line numbers in the described situations?" and so far the only answer suggested (that points the finger at chunking) doesn't make sense.

Yullia commented May 7, 2024

I have the same issue in filesystem mode: the line number calculation is wrong. Version 3.75.1.


5 participants