Skip to content

feat: prevent uploading PyPI tokens in common places#19994

Open
kam193 wants to merge 2 commits into
pypi:mainfrom
kam193:prevent_token_uploading
Open

feat: prevent uploading PyPI tokens in common places#19994
kam193 wants to merge 2 commits into
pypi:mainfrom
kam193:prevent_token_uploading

Conversation

@kam193
Copy link
Copy Markdown

@kam193 kam193 commented May 1, 2026

The minimal implementation of #19985 to prevent users from uploading tokens in distribution archives. As the current scanner requires selecting extensions to scan, the list was extended with common places where they are left behind.

This change does not perform automatic token revoking. Only tokens for pypi.org are detected.

Minimal implementaiton detecting PyPI API tokens in the commonly
used places using the Yara scanner.
@kam193 kam193 force-pushed the prevent_token_uploading branch from 19f8bf3 to 530edc6 Compare May 1, 2026 19:14
@miketheman miketheman added security Security-related issues and pull requests tokens Issues relating to API tokens labels May 1, 2026
Copy link
Copy Markdown
Member

@miketheman miketheman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent start, thanks for taking a pass at it! I've dropped some comments inline.

Comment thread warehouse/utils/scanner_rules/pypi_token.yar Outdated
Comment thread warehouse/utils/scanner.py Outdated
Comment thread warehouse/utils/scanner_rules/pypi_token.yar Outdated
return compiled


def _generate_token(domain="pypi.org", projects_scope=False):
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can imagine a future factory-style generator to create tokens for testing, so this paves that future nicely.

Comment thread tests/unit/utils/test_scanner.py Outdated
@kam193 kam193 marked this pull request as ready for review May 3, 2026 22:07
@kam193 kam193 requested a review from a team as a code owner May 3, 2026 22:07
@kam193 kam193 force-pushed the prevent_token_uploading branch from e386b2b to e210ba3 Compare May 3, 2026 22:09
@kam193
Copy link
Copy Markdown
Author

kam193 commented May 3, 2026

From looking for test tokens to exclude, the closest seem to be https://github.com/mongodb/kingfisher/blob/b2811107a814c117f6a54fd8d334aef52409bf8f/src/validation.rs#L1924 or https://github.com/cider-security-research/cicd-goat/blob/0ed10925f3983857cf219b2ac1c327b861fcccca/solutions/duchess.md?plain=1#L35 but the first won't land in any scanned file, and the second is not a PyPI project.

However, I've discovered how people present a placeholder for PyPI tokens, and for regression prevention, I've added them to the test base. This is handled well just with the length in the Yara rule, but I've left the guidelines to exclude other sample tokens specifically, in case one needs it in the future.

# rules (e.g. pyarmor), and .pye for SourceDefender-encrypted files.
_SCAN_EXTENSIONS = {".py", ".pye"}
# Extensions to scan inside archives.
_SCAN_TARGETS = {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: there might be a better way to express these targets? Maybe as globs, to make it clearer which ones are filenames versus suffixes?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking about globs (also to cover your second comment), but I decided it would be better to leave it for a separate change to better research what would be the best way for that. My personal preference would be scan every file under a size limit, but I think it requires more considerations

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, thanks. I'll defer to whatever others think makes sense for an initial merge here.

# Different textual files for common places where PyPI tokens are accidentally left.
".md",
".rst",
".env",
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: it might be good to match anything with *env* in the filename (including suffixes), since .env.prod, .prod.env, env.py, etc. are all conceivable.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes - see the previous comment. As a bit of context, the current list is based on where I saw tokens recently.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

security Security-related issues and pull requests tokens Issues relating to API tokens

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants