Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hash API keys #3842

Merged
merged 3 commits into from
Dec 10, 2024
Merged

Hash API keys #3842

merged 3 commits into from
Dec 10, 2024

Conversation

tillprochaska
Copy link
Contributor

@tillprochaska tillprochaska commented Aug 5, 2024

This is based on #3094. Review and merge #3094 first!


Aleph used to store user API keys as plaintext in the database. This commit changes that to store only a hash of the API key.

API keys are generated using the built-in secrets.token_urlsafe method which returns a random 256 bit token. In contrast to passwords, API keys are not provided by users, have a high entropy, and need to be validated on every request. It seems to be generally accepted that, given 256 bit tokens, salting or using an expensive key derivation functions isn't necessary. (But please challenge this!) For this reason, we’re storing an unsalted SHA-256 hash of the API key which also makes it easy to look up and verify a given API key.

I've added a separate column for the hashed API key rather than reusing the existing column. This allows us to batch-hash all existing plaintext keys without having to differentiate between keys that have already been hashed and those that haven't. Once all existing plaintext API keys have been hashed, the old api_key column can simply be dropped.

This is a breaking change. After deployment, admins need to run the aleph hash-plaintext-api-keys CLI command to hash legacy plaintext API keys. Alternatively, users can regenerate their API key.

@tillprochaska tillprochaska force-pushed the feature/hash-api-keys branch from 01ad6ca to 05df12a Compare August 5, 2024 09:39
@tillprochaska tillprochaska force-pushed the feature/1027-reset-api-key branch from cb47a2f to fd5a17e Compare August 5, 2024 13:27
@tillprochaska tillprochaska force-pushed the feature/hash-api-keys branch from 05df12a to 5f94f9b Compare August 5, 2024 13:28
@stchris stchris modified the milestones: 4.0.1, 4.1.0 Sep 24, 2024
Copy link
Contributor

@stchris stchris left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice and straightforward, left a minor comment but this looks good to go!

for index, partition in enumerate(results.partitions()):
for role in partition:
role.api_key_digest = hash_api_key(role.api_key)
role.api_key = None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minior nitpick, but I think

Suggested change
role.api_key = None
del role.api_key

would make sure that key is no longer in memory (as soon as the garbage collector runs)

@tillprochaska tillprochaska force-pushed the feature/1027-reset-api-key branch 4 times, most recently from cc19c2b to 1cb1e20 Compare October 23, 2024 18:31
@tillprochaska tillprochaska force-pushed the feature/1027-reset-api-key branch from 7bd4b1e to 50b689e Compare November 4, 2024 14:00
@tillprochaska tillprochaska changed the base branch from feature/1027-reset-api-key to develop November 22, 2024 16:03
Aleph used to store user API keys as plaintext in the database. This commit changes that to store only a hash of the API key.

API keys are generated using the built-in `secrets.token_urlsafe` method which returns a random 256 bit token. In contrast to passwords, API keys are not provided by users, have a high entropy, and need to be validated on every request. It seems to be generally accepted that, given 256 bit tokens, salting or using an expensive key derivation functions isn't necessary. For this reason, we’re storing an unsalted SHA-256 hash of the API key which also makes it easy to look up and verify a given API key.

I've added a separate column for the hashed API key rather than reusing the existing column. This allows us to batch-hash all existing plaintext keys without having to differentiate between keys that have already been hashed and those that haven't. Once all existing plaintext API keys have been hashed, the old `api_key` column can simply be dropped.
Required as we do not store plaintext API keys anymore. Also, we want to remove the option to pass API keys via URL parameters in the future.

This makes it impossible to use OpenRefine with non-public collections. This was never documented, and most users weren't aware that they can indeed use OpenRefine with non-public collections anyway.
@tillprochaska tillprochaska merged commit 96c0492 into develop Dec 10, 2024
2 checks passed
tillprochaska added a commit that referenced this pull request Dec 10, 2024
* Store hashed API keys

Aleph used to store user API keys as plaintext in the database. This commit changes that to store only a hash of the API key.

API keys are generated using the built-in `secrets.token_urlsafe` method which returns a random 256 bit token. In contrast to passwords, API keys are not provided by users, have a high entropy, and need to be validated on every request. It seems to be generally accepted that, given 256 bit tokens, salting or using an expensive key derivation functions isn't necessary. For this reason, we’re storing an unsalted SHA-256 hash of the API key which also makes it easy to look up and verify a given API key.

I've added a separate column for the hashed API key rather than reusing the existing column. This allows us to batch-hash all existing plaintext keys without having to differentiate between keys that have already been hashed and those that haven't. Once all existing plaintext API keys have been hashed, the old `api_key` column can simply be dropped.

* Add CLI command to store legacy plaintext API keys

* Remove prefilled API key from OpenRefine endpoints

Required as we do not store plaintext API keys anymore. Also, we want to remove the option to pass API keys via URL parameters in the future.

This makes it impossible to use OpenRefine with non-public collections. This was never documented, and most users weren't aware that they can indeed use OpenRefine with non-public collections anyway.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants