-
Notifications
You must be signed in to change notification settings - Fork 11
introduce hash-based redaction #33
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
b201372 to
777b8ba
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR introduces hash-based redaction as an opt-in feature that allows correlating log entries while preserving privacy. When enabled, sensitive values marked with the HashValue interface are hashed (using SHA-1/HMAC-SHA1) instead of being fully redacted, with the hash output denoted by a dagger (†) prefix in redaction markers.
Key Changes:
- Adds
EnableHashing(salt)API to enable hash-based redaction with optional HMAC salt - Introduces
HashValueinterface and wrapper types (HashString,HashInt, etc.) for marking values as hashable - Implements hash computation logic using SHA-1/HMAC-SHA1 with truncation to 8 hex characters
Reviewed Changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| api.go | Exports new HashValue types and EnableHashing function to public API |
| interfaces/interfaces.go | Defines HashValue marker interface and concrete wrapper types for hashable values |
| internal/markers/constants.go | Adds HashPrefix constant (†) and updates regex patterns to handle hash markers |
| internal/markers/hash.go | Implements hash computation logic with SHA-1/HMAC-SHA1 and global state management |
| internal/markers/hash_test.go | Adds unit tests for low-level hash functions with various inputs and salt configurations |
| internal/markers/markers.go | Updates Redact() methods to detect and process hash markers (‹†value›) |
| internal/rfmt/print.go | Adds HashValue interface detection in three locations to wrap values with hash markers during formatting |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
53b3995 to
365e09b
Compare
dhartunian
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great! I will review more thoroughly next week. Thanks for working on this.
| // It MUST be called during program initialization before any concurrent redaction | ||
| // operations begin. The caller must not modify the salt slice after | ||
| // passing it to this function. | ||
| func EnableHashing(salt []byte) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
any chance we can make this configurable concurrently? would be nice to put behind a cluster setting. but also, log redaction is something that's configured via logging config which is not hot reloadable so maybe it's fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The cost of this lock/unlock isn't reflected in the benchmark because it will only manifest when there are many concurrent redactions happening. On its own it's not very costly.
In this scenario when there's just a single boolean that needs to be accessed concurrently, a lock is too costly and complex. Can you just use a single atomic boolean here instead? Then there's no locking, no defer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The salt variable can be a string stored in an atomic.Value. I don't see a need to keep the values of enabled and salt locked together.
| // hashString computes a truncated hash of the input string. | ||
| // Uses HMAC-SHA1 if salt is set, otherwise plain SHA1. | ||
| // Must only be called when hashing is enabled (IsHashingEnabled() == true). | ||
| func hashString(value string) string { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would be nice to know via benchmarks what the relative cost of hashing is vs just redacting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a screenshot in the PR description
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. do you know what the 4 additional allocations are here?
This commit introduces the ability to hash sensitive information instead
of fully redacting it. This let's the user correlate entries while still
preserving privacy. This is an opt-in feature and requires the user to explicitly mark
values as "hashable".
Changes:
* EnableHashing(salt) is used to turn on hash-based redaction. If this is
not called, redaction behaves as before (full redaction).
* If both SafeValue and HashValue interfaces are implemented on a type,
SafeValue takes precedence.
* Uses SHA-1 / HMAC-SHA1 (if salt is provided) to hash the sensitive
parts of the string.
* A dagger (†) prefix is added within the redaction markers to denote
that it should be hashed instead of fully redacted.
* There are 2 ways to mark a value "hashable" (similar to redaction):
1) Implementing the HashValue interface directly on a type.
2) Wrapping a value with the HashString/HashBytes/Hash* functions
* Hashes are calculated at formatting time and not during string
creation. Just like redaction.
**Example usage**
* Implementing the HashValue interface:
```go
type UserID string
func (u UserID) HashValue() string {} // makes the type "hashable"
fmt.Println(redact.Sprintf("%s", UserID("alice")).Redact()) // Output: ‹522b276a›
```
* Inline with helper functions:
```go
fmt.Println(redact.Sprintf("%s", redact.HashString("alice")).Redact()) // Output: ‹522b276a›
```
365e09b to
74df909
Compare
| } | ||
|
|
||
| // hashString computes a truncated hash of the input string. | ||
| // Uses HMAC-SHA1 if salt is set, otherwise plain SHA1. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think HMAC-SHA1 should be good enough to start of with. Has a good balance of speed and "secure enough"
| // It MUST be called during program initialization before any concurrent redaction | ||
| // operations begin. The caller must not modify the salt slice after | ||
| // passing it to this function. | ||
| func EnableHashing(salt []byte) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The cost of this lock/unlock isn't reflected in the benchmark because it will only manifest when there are many concurrent redactions happening. On its own it's not very costly.
In this scenario when there's just a single boolean that needs to be accessed concurrently, a lock is too costly and complex. Can you just use a single atomic boolean here instead? Then there's no locking, no defer.
| // hashString computes a truncated hash of the input string. | ||
| // Uses HMAC-SHA1 if salt is set, otherwise plain SHA1. | ||
| // Must only be called when hashing is enabled (IsHashingEnabled() == true). | ||
| func hashString(value string) string { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. do you know what the 4 additional allocations are here?
| // It MUST be called during program initialization before any concurrent redaction | ||
| // operations begin. The caller must not modify the salt slice after | ||
| // passing it to this function. | ||
| func EnableHashing(salt []byte) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The salt variable can be a string stored in an atomic.Value. I don't see a need to keep the values of enabled and salt locked together.
|
|
||
| var h []byte | ||
| if len(salt) > 0 { | ||
| mac := hmac.New(sha1.New, salt) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like an opportunity for pooling or pre-allocating. If you can reuse hash instances from the pool and call Reset on them between calls if the salt hasn't changed.
| mac.Write([]byte(value)) | ||
| h = mac.Sum(nil) | ||
| } else { | ||
| hasher := sha1.New() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here as above. sha1 can be pooled and you can reduce allocations.
| if len(salt) > 0 { | ||
| mac := hmac.New(sha1.New, salt) | ||
| mac.Write([]byte(value)) | ||
| h = mac.Sum(nil) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you can allocate the byte array of the correct size here and below and pass in to the Sum, you can avoid an allocation here I think. If you allocate in this function and pass it in, the slice will stay on the stack and be collected immediately, whereas if the Sum function needs to allocate and return to you, that variable ends up on the heap.
| h = hasher.Sum(nil) | ||
| } | ||
|
|
||
| fullHash := hex.EncodeToString(h) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think here you can pass a pre-allocated array as well as your destination instead of making hex allocate for you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should be able to use escape analysis to see where the heap allocs are happening.
| } | ||
|
|
||
| fullHash := hex.EncodeToString(h) | ||
| if len(fullHash) > defaultHashLength { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this conditional necessary? if we can guarantee that defaultHashLength isn't greater than the hash length, we can just return fullHash[:defaultHashLength] no?
|
Ack. Will address the comments end of the week. |

This commit introduces the ability to hash sensitive information instead of fully redacting it. This let's the user correlate entries while still preserving privacy. This is an opt-in feature and requires the user to explicitly mark values as "hashable".
Changes:
EnableHashing(salt)is used to turn on hash-based redaction. If this isnot called, redaction behaves as before (full redaction).
SafeValueandHashValueinterfaces are implemented on a type,SafeValuetakes precedence.SHA-1/HMAC-SHA1(if salt is provided) to hash the sensitiveparts of the string.
†) prefix is added within the redaction markers to denotethat it should be hashed instead of fully redacted.
HashValueinterface directly on a type.HashString/HashBytes/Hash*functionscreation. Just like redaction.
Example usage
Implementing the
HashValueinterface:Inline with helper functions:
Benchmark of hashing enabled vs disabled
This change is