Skip to content

Exposure of HashMap iteration order allows for O(n²) blowup. #36481

Open
@Veedrac

Description

@Veedrac

Exposing HashMap's iteration order can cause O(n²) blowup even in innocent-looking code without the presence of an attacker. In the presence of an attacker, access to the order of a dictionary allows HashDoS-like attacks with only two requests in common scenarios.

Without an attacker

Consider a user with two possibly-disjoint hash maps

let first_map: HashMap<u64, _> = (0..900000).map(|i| (i, ())).collect();
let second_map: HashMap<u64, _> = (900000..1800000).map(|i| (i, ())).collect();

The user wants to merge the hash maps, and does so naïvely,

let mut merged = first_map;
merged.extend(second_map);

Time for merge when second_map is shuffled: 0.4s

Time for merge when second_map is not shuffled: 40.s (x100 amplification)

This effect is noticeably more pronounced when merging with a round robin strategy.

With an attacker

The threat model here is simple. The attacker is able to send JSON to the server. The server parses the JSON into a HashMap and through whatever means - an error message including the formatted map or explicit listing of the contents of the map - may reveal the order of the map.

The attack on this model requires two requests. The first sends some large JSON to the server

let first_request: HashMap<u64, _> = (0..900000).map(|i| (i, ())).collect();

and recieves an ordering

let returned_data: Vec<_> = first_request.into_iter().collect();

The attacker then sends the first and third quarter of this list in a new JSON object.

let second_request: HashMap<u64, _> =
 returned_data[..225000].iter()
     .chain(&returned_data[450000..675000])
     .cloned().collect();

Total time without second request: 0.1s

Total time with second request: 200s (x2000 amplification)


Solutions, near-solutions and non-solutions

These solutions should not be considered to be ordered, necessarily disjoint, nor an exhaustive list of options.

Fall back to BTreeMap

It should be clear that we cannot treat hash maps as a solved problem just because we use SipHash. In fact, SipHash is entirely insufficient to solve the problem. My first suggestion is thus to stop trying to make the hasher secure, and instead fall back to BTreeMap when nonlinear behaviour is detected.

This guarantees a minimum level of performance regardless of the capabilities of the attacker, and allows usage of a faster hashing algorithm by default. Hash maps should get faster by default as a result. This does not prevent having to consider the issue, since the fallback is costly and must be rare, but this is an easier problem than entirely securing the hash map.

Use a hash map without problematic blowup, or less affected by it

Java solves this problem by using a hash map with chaining and converting large buckets to tree maps. This mitigates the impact of degradation, but does not seem to allow using contiguous hash maps by default.

As far as I am aware, the blowup cannot be resolved by moving to another common form of open addressing, although quadratic probing would be significantly less affected by some of these attacks. Chaining alone also defeats the attacks given here, but still requires a secure hash and fails with attackers with more capabilities.

Use different seeds for each hash map

Pull requests #31356 (closed) and #33318 (merged) first proposed incrementing the thread local seed for each hash map. This was later removed when no threat model was seen, but would prevent both attacks listed here.

This still allows attacks when hash maps are reused.

Randomize iteration order

I am not sure how one would randomize iteration order efficiently. However, it should solve the problem unless hashes are exposed through other means.

Ignore the problem

Given that Rust went so far as to use SipHash, quadratic blowup on code as simple as fst.extend(snd) seems too extreme to ignore.

Metadata

Metadata

Assignees

No one assigned

    Labels

    A-collectionsArea: `std::collections`C-bugCategory: This is a bug.P-mediumMedium priorityT-libs-apiRelevant to the library API team, which will review and decide on the PR/issue.

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions