-
Notifications
You must be signed in to change notification settings - Fork 11
introduce hash-based redaction #33
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,124 @@ | ||
| // Copyright 2025 The Cockroach Authors. | ||
| // | ||
| // Licensed under the Apache License, Version 2.0 (the "License"); | ||
| // you may not use this file except in compliance with the License. | ||
| // You may obtain a copy of the License at | ||
| // | ||
| // http://www.apache.org/licenses/LICENSE-2.0 | ||
| // | ||
| // Unless required by applicable law or agreed to in writing, software | ||
| // distributed under the License is distributed on an "AS IS" BASIS, | ||
| // WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or | ||
| // implied. See the License for the specific language governing | ||
| // permissions and limitations under the License. | ||
|
|
||
| package markers | ||
|
|
||
| import ( | ||
| "crypto/hmac" | ||
| "crypto/sha1" | ||
| "encoding/hex" | ||
| "sync" | ||
| ) | ||
|
|
||
| /* | ||
| Hash function implementation notes: | ||
|
|
||
| We use SHA-1 because it provides a good balance of performance and output size | ||
| for the log correlation use case. SHA-1 is fast and produces compact hashes, | ||
| making it well-suited for high-throughput logging scenarios. | ||
|
|
||
| When a salt is provided via EnableHashing(), we use HMAC-SHA1 which provides | ||
| additional security properties and domain separation. | ||
|
|
||
| The hash output is truncated to 8 hex characters (32 bits) to keep log output | ||
| concise while still providing sufficient collision resistance for typical logging | ||
| workloads. | ||
| */ | ||
|
|
||
| // defaultHashLength is the number of hex characters to use from the SHA-1 hash. | ||
| // 8 hex chars = 32 bits = ~4.3 billion unique values. | ||
| // This provides a good balance between collision resistance and output brevity. | ||
| // For typical logging scenarios with fewer unique sensitive values per analysis window, | ||
| // this collision risk should be acceptable. If not, we can make this configurable in the future. | ||
| const defaultHashLength = 8 | ||
|
|
||
| var hashConfig = struct { | ||
| sync.RWMutex | ||
| enabled bool | ||
| salt []byte | ||
| }{ | ||
| enabled: false, | ||
| salt: nil, | ||
| } | ||
|
|
||
| // EnableHashing enables hash-based redaction with an optional salt. | ||
| // When salt is nil, hash markers use plain SHA1. | ||
| // When salt is provided, hash markers use HMAC-SHA1 for better security. | ||
| func EnableHashing(salt []byte) { | ||
| hashConfig.Lock() | ||
| defer hashConfig.Unlock() | ||
| hashConfig.enabled = true | ||
| hashConfig.salt = salt | ||
| } | ||
|
|
||
| // IsHashingEnabled returns true if hash-based redaction is enabled. | ||
| func IsHashingEnabled() bool { | ||
| hashConfig.RLock() | ||
| defer hashConfig.RUnlock() | ||
| return hashConfig.enabled | ||
| } | ||
|
|
||
| // hashString computes a truncated hash of the input string. | ||
| // Uses HMAC-SHA1 if salt is set, otherwise plain SHA1. | ||
|
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think HMAC-SHA1 should be good enough to start of with. Has a good balance of speed and "secure enough" |
||
| // Must only be called when hashing is enabled (IsHashingEnabled() == true). | ||
| func hashString(value string) string { | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. would be nice to know via benchmarks what the relative cost of hashing is vs just redacting.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Added a screenshot in the PR description
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thanks. do you know what the 4 additional allocations are here? |
||
| hashConfig.RLock() | ||
| salt := hashConfig.salt | ||
| hashConfig.RUnlock() | ||
|
|
||
| var h []byte | ||
| if len(salt) > 0 { | ||
| mac := hmac.New(sha1.New, salt) | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This seems like an opportunity for pooling or pre-allocating. If you can reuse hash instances from the pool and call |
||
| mac.Write([]byte(value)) | ||
| h = mac.Sum(nil) | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If you can allocate the byte array of the correct size here and below and pass in to the |
||
| } else { | ||
| hasher := sha1.New() | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. same here as above. |
||
| hasher.Write([]byte(value)) | ||
| h = hasher.Sum(nil) | ||
| } | ||
|
|
||
| fullHash := hex.EncodeToString(h) | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think here you can pass a pre-allocated array as well as your destination instead of making
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. You should be able to use escape analysis to see where the heap allocs are happening. |
||
| if len(fullHash) > defaultHashLength { | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. is this conditional necessary? if we can guarantee that |
||
| return fullHash[:defaultHashLength] | ||
| } | ||
| return fullHash | ||
| } | ||
|
|
||
| // hashBytes computes a truncated hash of the input byte slice. | ||
| // Uses HMAC-SHA1 if salt is set, otherwise plain SHA1. | ||
| // Must only be called when hashing is enabled (IsHashingEnabled() == true). | ||
| func hashBytes(value []byte) []byte { | ||
| hashConfig.RLock() | ||
| salt := hashConfig.salt | ||
| hashConfig.RUnlock() | ||
|
|
||
| var h []byte | ||
| if len(salt) > 0 { | ||
| mac := hmac.New(sha1.New, salt) | ||
| mac.Write(value) | ||
| h = mac.Sum(nil) | ||
| } else { | ||
| hasher := sha1.New() | ||
| hasher.Write(value) | ||
| h = hasher.Sum(nil) | ||
| } | ||
|
|
||
| fullHash := make([]byte, sha1.Size*2) | ||
| _ = hex.Encode(fullHash, h) | ||
|
|
||
| if len(fullHash) > defaultHashLength { | ||
| return fullHash[:defaultHashLength] | ||
| } | ||
| return fullHash | ||
| } | ||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
any chance we can make this configurable concurrently? would be nice to put behind a cluster setting. but also, log redaction is something that's configured via logging config which is not hot reloadable so maybe it's fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The cost of making it concurrency safe seemed reasonable. So, went ahead and made the change
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The cost of this lock/unlock isn't reflected in the benchmark because it will only manifest when there are many concurrent redactions happening. On its own it's not very costly.
In this scenario when there's just a single boolean that needs to be accessed concurrently, a lock is too costly and complex. Can you just use a single atomic boolean here instead? Then there's no locking, no
defer.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The
saltvariable can be astringstored in anatomic.Value. I don't see a need to keep the values ofenabledandsaltlocked together.