-
Notifications
You must be signed in to change notification settings - Fork 10.5k
[Foundation] Modernize hashing in Foundation's Swift-only types #23832
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…*all* components The new definition diverges from the existing hashValue: it feeds all path components to the hasher, rather than just a limited subset. Hashing all the bits that are compared in == is necessary to ensure proper operation of the hash table.
This replaces the compiler-generated variant (based on hashValue) with a better implementation.
Note: URL does not implement Hashable.
5cec1e5
to
b711ed9
Compare
return Int(bitPattern: __CFHashDouble(_time)) | ||
} else { // 10.11 and previous behavior fallback; this must allocate a date to reference the hash value and then throw away the reference | ||
return NSDate(timeIntervalSinceReferenceDate: _time).hash | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: There is no need to emulate NSDate's hashing behavior, or the behavior of any previous version of Date.hashValue
. Swift is very explicit that the values returned by hashValue
aren't to be treated as stable, and Swift trains programmers to expect randomized hashes.
Granted, in this case (like with many other hashValues removed by this PR), 5.0 did not vend properly seeded hashes, so in theory someone could be relying on the exact values returned. I believe this is still a low-risk change; and since this is not inlinable code, we have the option to re-add specific hashValue
implementations with an SDK check if it turns some of them have ABI impact.
@@ -155,15 +155,11 @@ public struct DateInterval : ReferenceConvertible, Comparable, Hashable, Codable | |||
return false | |||
} | |||
|
|||
public var hashValue: Int { | |||
var buf: (UInt, UInt) = (UInt(start.timeIntervalSinceReferenceDate), UInt(end.timeIntervalSinceReferenceDate)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using the end
date here was problematic, since ==
compares start
and duration
, not start
and end
.
end
does not always include all the information that's present in duration
, so doing this results in weaker hashes.
The new code is careful to match the implementation of ==
.
let range = startIndex ..< Swift.min(startIndex + 80, endIndex) | ||
storage.withUnsafeBytes(in: range) { | ||
// To ensure strong hashing, all bytes must be fed into the hasher. | ||
withUnsafeBytes { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is one of the cases where the old implementation deliberately neglected to feed all relevant information into the hasher. This makes it trivial to induce an arbitrary number of collisions, despite all the effort that went into implementing a high-quality universal hash function.
This is particularly a bad idea for Data
, which is the standard type for safely exchanging byte buffers across module boundaries. It is reasonable to expect Data
to implement the same high quality hashing as Array
does; silently doing otherwise is not a good idea.
We previously discussed this issue in #21754.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm still not convinced about this without Hasher
having some guarantees about not being O(n)
, but I think we should try to set up a meeting to discuss about strategy here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK! I've spun off this change into #23876. Data
is already implementing the right APIs, so this is just a matter of changing the implementation.
@@ -36,8 +36,19 @@ public struct Measurement<UnitType : Unit> : ReferenceConvertible, Comparable, E | |||
self.unit = unit | |||
} | |||
|
|||
public var hashValue: Int { | |||
return Int(bitPattern: __CFHashDouble(value)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The original implementation of Measurement.hashValue
is not a valid hash, because it is in conflict with ==
.
(A version of this change was originally submitted as #20879.)
.ascii, | ||
.utf8, | ||
] | ||
checkHashable(instances, equalityOracle: { $0 == $1 }) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
checkHashable
verifies that instances provided don't violate Equality
and Hashable
requirements, including the soft requirement for strong hashing. (I.e., that non-equal values must differ in what they feed into the hasher.)
The equalityOracle
closure provides the expected equivalence relation over the indices in the collection passed in as the first argument.
expectEqual(AffineTransform.self, type(of: anyHashables[2].base)) | ||
expectNotEqual(anyHashables[0], anyHashables[1]) | ||
expectEqual(anyHashables[1], anyHashables[2]) | ||
checkHashableGroups(values) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
checkHashableGroups
is like checkHashable
, but it represents the equivalence relation as nested arrays: each element in values
is supposed to contain values of a distinct equivalence class.
I.e., a
and bridged(a)
must compare equal, but a
must not compare equal to any item outside its own array.
I submitted a PR for swift-corelibs-foundation with matching changes: swiftlang/swift-corelibs-foundation#2118 |
@swift-ci please test |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
@swift-ci please test macOS platform |
Build failed |
@swift-ci please test macOS platform |
@swift-ci please test linux platform |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
71833f4
to
6647c50
Compare
@swift-ci please test |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
Hm; |
@swift-ci please test |
This comment has been minimized.
This comment has been minimized.
@parkera @millenomi @itaiferber are we good to go ahead with this? (with the |
Yup |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Latest changes LGTM
This PR upgrades the Foundation overlay to use the new hashing API introduced in SE-0206. This gets rid of the deprecation warnings for
hashValue
implementations and brings the overlay in sync with current Swift best practices.Background
SE-0206 radically changed how hashing works in Swift 4.2+. It is now the stdlib's responsibility to choose a hash function; types implementing
Hashable
only need to worry about selecting what they want to feed to the hasher. To make this possible,Hashable
now requires hashing to be implemented through thehash(into:)
method.The old
hashValue
property is deprecated as aHashable
requirement, and it's only kept to support source-level compatibility with existing code.Set
andDictionary
have switched over to usinghash(into:)
as their entry point for hashing; they do not directly callhashValue
at all.Beyond simplifying hashing, the intent of SE-0206 is to enable Swift to provide certain guarantees about its quality. In particular: as long as
hash(into:)
feeds enough data the hasher to unambiguously decide equality, Swift attempts to guarantee that collision attacks won't be possible. For this to work, it is critically important forHashable
implementations to include everything thatEquatable.==
looks at; and this is especially the case for the basic boundary types that come built-in with Swift, likeData
.The exact hash algorithm used is a private implementation detail of the stdlib; the
Hasher
API was carefully designed to leak as little information about it as possible, enabling future versions of the stdlib to switch to other algorithms without worrying about ABI compatibility issues.While random seeding makes the hash algorithm opaque, it is technically possible for Swift code to rely on the particular hash encoding of a stdlib- (or SDK-)provided type. This is highly unlikely to happen in practice, but the longer we wait to fix hashing issues, the more likely it will be that someone starts relying on hashing particulars. For example, I could imagine this to happen through a well-meaning desire to work around a broken hash implementation. It is therefore important to do this sooner rather than later.
Notes
I'll highlight points specific to a particular changes as self-review comments. Here is a list of general remarks:
It is neither required nor desirable for Swift's
hashValue
to return reproducible values.We don't need to ever match hash values between Swift types and the Cocoa classes that they bridge with. The act of bridging involves a full rehashing of all Set/Dictionary values.
The only hard requirement is that hashing needs to be consistent with the equivalence classes implemented by
==
within the same type.Hashing is meaningless without equality.
Equatable
only defines equality for values of a single type. Therefore,Hashable
makes no requirements or expectations about the hash values produced by values of different types.It is perfectly fine for hash values produced by different types to not match.
(0 as Int8).hashValue
and(0 as Int16).hashValue
will (typically) not be equal in Swift.It is also perfectly fine for values of two distinct types to produce matching hash values.
Measurement<UnitLength>.hash(into:)
andMeasurement<UnitDuration>.hash(into:)
are allowed to use the same hash encoding, since there is no way they can ever produce large-scale collisions within the same hash table.While this is not a hard requirement for user code, for boundary types provided in the stdlib/SDK, we require that hashing isn't just consistent with equality, but that it's equivalent to it. The Swift test suite has checks to actively enforce this -- this is possible through repeatedly salting the hash function.
"Optimizing" hashing by omitting some of the data compared by
==
is generally a mistake in Swift, because it completely breaks all guarantees about the strength of hashing, and opens the door to (accidental or deliberate) collision attacks.It's perfectly acceptable to hash a gigabyte of data if someone inserts some large value (such as a big collection) as a key in a hash table. Multi-megabyte String keys are easy to protect against; hidden hashing weaknesses aren't.
It would be nice if we could somehow improve hashing in Foundation's Objective-C classes, but this is outside the scope of this PR (or indeed, this repository). However, we should at least make sure Swift programmers can rely on the quality of hashes produced by native Swift value types.
Modernize hashing in Foundation's Swift-only types swift-corelibs-foundation#2118 is the corresponding PR for swift-corelibs-foundation. These PRs should land at the same time.
Replacing
hashValue
implementations on types withhash(into:)
has no effect on the symbols exposed as ABI -- the compiler automatically synthesizeshashValue
fromhash(into:)
, and vice versa. (This is not the case in protocol extensions, but we've fixed those in the 4.2/5.0 timeframe.) In the Swift 5.0 ABI, all of these types include definitions for bothhashValue
and `hash(into:) so we don't need to worry about availability issues.The sole exception is(Edit:URL
, which implementshashValue
but isn'tHashable
. Unfortunately we can't add the conformance in 5.1, but we should add a (versioned)hash(into:)
definition to discourage people from trying to implement it on their own.URL
is not a special case; it'sHashable
like the rest. I got mislead by an unrelated error.)rdar://problem/43394032