Description
Exposing HashMap
's iteration order can cause O(n²)
blowup even in innocent-looking code without the presence of an attacker. In the presence of an attacker, access to the order of a dictionary allows HashDoS-like attacks with only two requests in common scenarios.
Without an attacker
Consider a user with two possibly-disjoint hash maps
let first_map: HashMap<u64, _> = (0..900000).map(|i| (i, ())).collect();
let second_map: HashMap<u64, _> = (900000..1800000).map(|i| (i, ())).collect();
The user wants to merge the hash maps, and does so naïvely,
let mut merged = first_map;
merged.extend(second_map);
Time for merge when second_map
is shuffled: 0.4s
Time for merge when second_map
is not shuffled: 40.s (x100 amplification)
This effect is noticeably more pronounced when merging with a round robin strategy.
With an attacker
The threat model here is simple. The attacker is able to send JSON to the server. The server parses the JSON into a HashMap
and through whatever means - an error message including the formatted map or explicit listing of the contents of the map - may reveal the order of the map.
The attack on this model requires two requests. The first sends some large JSON to the server
let first_request: HashMap<u64, _> = (0..900000).map(|i| (i, ())).collect();
and recieves an ordering
let returned_data: Vec<_> = first_request.into_iter().collect();
The attacker then sends the first and third quarter of this list in a new JSON object.
let second_request: HashMap<u64, _> =
returned_data[..225000].iter()
.chain(&returned_data[450000..675000])
.cloned().collect();
Total time without second request: 0.1s
Total time with second request: 200s (x2000 amplification)
Solutions, near-solutions and non-solutions
These solutions should not be considered to be ordered, necessarily disjoint, nor an exhaustive list of options.
Fall back to BTreeMap
It should be clear that we cannot treat hash maps as a solved problem just because we use SipHash
. In fact, SipHash
is entirely insufficient to solve the problem. My first suggestion is thus to stop trying to make the hasher secure, and instead fall back to BTreeMap
when nonlinear behaviour is detected.
This guarantees a minimum level of performance regardless of the capabilities of the attacker, and allows usage of a faster hashing algorithm by default. Hash maps should get faster by default as a result. This does not prevent having to consider the issue, since the fallback is costly and must be rare, but this is an easier problem than entirely securing the hash map.
Use a hash map without problematic blowup, or less affected by it
Java solves this problem by using a hash map with chaining and converting large buckets to tree maps. This mitigates the impact of degradation, but does not seem to allow using contiguous hash maps by default.
As far as I am aware, the blowup cannot be resolved by moving to another common form of open addressing, although quadratic probing would be significantly less affected by some of these attacks. Chaining alone also defeats the attacks given here, but still requires a secure hash and fails with attackers with more capabilities.
Use different seeds for each hash map
Pull requests #31356 (closed) and #33318 (merged) first proposed incrementing the thread local seed for each hash map. This was later removed when no threat model was seen, but would prevent both attacks listed here.
This still allows attacks when hash maps are reused.
Randomize iteration order
I am not sure how one would randomize iteration order efficiently. However, it should solve the problem unless hashes are exposed through other means.
Ignore the problem
Given that Rust went so far as to use SipHash
, quadratic blowup on code as simple as fst.extend(snd)
seems too extreme to ignore.