Non-deterministic hashes are used by default (via hashbrown via foldhash) #89

@caspark

string-interner uses hashbrown with its default-hasher feature, which causes it to use foldhash's foldhash::fast::RandomState to hash strings.

As the name implies, RandomState relies on a one-time source of randomness (stack address, allocation, current time, etc.) to initialize a global variable that seeds all subsequent hashes. Hashes therefore vary from run to run, which gives a small measure of DoS resistance.
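To make the distinction concrete without pulling in foldhash, here is a minimal sketch using only the standard library. std's RandomState and BuildHasherDefault<DefaultHasher> are stand-ins chosen for illustration (an assumption, not the types hashbrown actually uses), and the two helper function names are hypothetical.

```rust
use std::collections::hash_map::{DefaultHasher, RandomState};
use std::hash::{BuildHasher, BuildHasherDefault};

/// Hash `s` with two independently created, randomly seeded states.
/// The two values are not guaranteed to match, because each state
/// carries its own runtime-chosen keys.
pub fn randomly_seeded_hashes(s: &str) -> (u64, u64) {
    let a = RandomState::new();
    let b = RandomState::new();
    (a.hash_one(s), b.hash_one(s))
}

/// Hash `s` with two independently created fixed states.
/// Both states start from the same constant keys, so the hashes agree
/// (within one build; std does not promise stability across releases).
pub fn fixed_hashes(s: &str) -> (u64, u64) {
    let a = BuildHasherDefault::<DefaultHasher>::default();
    let b = BuildHasherDefault::<DefaultHasher>::default();
    (a.hash_one(s), b.hash_one(s))
}
```

Calling fixed_hashes("hello") yields a pair of equal values, while randomly_seeded_hashes("hello") will in practice yield two different ones. That second behavior is harmless as long as every lookup goes through the same seed.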

Normally this is not observable when using string-interner. However, when string-interner is combined with a runtime code loading approach such as hot-lib-reloader, lookups break:

  1. Create a string-interner and intern some strings via get_or_intern()
    • foldhash will set its global variable and calculate hashes which string-interner's hashbrown hashmap will use.
  2. Load (or reload) some code that uses string-interner
  3. In that loaded code, try to get() a string that has already been interned
    • foldhash will again set its global variable (because the (re)loaded code has its own copy of that global variable) and will return different hashes, so string-interner will fail to find the string that has already been interned.
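The failure in the steps above can be reproduced in miniature. The following is a sketch under stated assumptions, not string-interner's real implementation: ToyInterner and seeded_hash are hypothetical, the hash is a seeded FNV-1a rather than foldhash, and the explicit seed argument models "whichever copy of the global seed the calling code happens to see".

```rust
use std::collections::HashMap;

/// Hypothetical seeded FNV-1a hash; stands in for foldhash in this sketch.
fn seeded_hash(seed: u64, s: &str) -> u64 {
    let mut h = 0xcbf29ce484222325u64 ^ seed;
    for b in s.bytes() {
        h ^= b as u64;
        h = h.wrapping_mul(0x100000001b3);
    }
    h
}

/// Toy interner whose index is keyed by the hash of the string,
/// so a lookup only succeeds if it recomputes the same hash.
pub struct ToyInterner {
    by_hash: HashMap<u64, u32>,
    strings: Vec<String>,
}

impl ToyInterner {
    pub fn new() -> Self {
        ToyInterner { by_hash: HashMap::new(), strings: Vec::new() }
    }

    /// Look up `s` using the caller's seed.
    pub fn get(&self, seed: u64, s: &str) -> Option<u32> {
        self.by_hash
            .get(&seeded_hash(seed, s))
            .copied()
            .filter(|&sym| self.strings[sym as usize] == s)
    }

    /// Intern `s` using the caller's seed.
    /// Hash collisions between different strings are ignored for brevity.
    pub fn get_or_intern(&mut self, seed: u64, s: &str) -> u32 {
        if let Some(sym) = self.get(seed, s) {
            return sym;
        }
        let sym = self.strings.len() as u32;
        self.strings.push(s.to_owned());
        self.by_hash.insert(seeded_hash(seed, s), sym);
        sym
    }
}
```

If the host interns "player" with seed 1, then get(1, "player") finds the symbol, while get(2, "player") from "reloaded" code with a different seed returns None even though the string is stored. Using the same constant seed on both sides, which is what a fixed-state hasher amounts to, makes the lookup succeed again.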

The workaround/fix is straightforward:

pub type Symbol = string_interner::symbol::SymbolU32;

// non-deterministic hashing, broken with hot code reloading
// (implicitly using foldhash::fast::RandomState as the hasher)
pub type AllSymbols = string_interner::StringInterner<
    string_interner::backend::BucketBackend<Symbol>
>;

// deterministic hashing, works with hot code reloading
pub type AllSymbols = string_interner::StringInterner<
    string_interner::backend::BucketBackend<Symbol>,
    foldhash::fast::FixedState,
>;

but actually figuring out what is going wrong is a bit painful.

Now, one could argue that this is a docs issue upstream: if hashbrown had documented that the hashes it uses are non-deterministic when its default-hasher feature is enabled, I might have figured the issue out a little faster.

But I think there's also a case to be made that foldhash::fast::FixedState (or some other deterministic hasher) is the right default for string-interner, rather than foldhash::fast::RandomState: the minimal DoS resistance you get from RandomState does not seem relevant to string interning.
