Skip to content

idea how to merge nt and nt-long, and implement other multi-limb hashes based on Merkle–Damgård construction #5534

Open
@AlekseyCherepanov

Description

@AlekseyCherepanov

Hashes based on Merkle–Damgård are computed in blocks (called limbs in john's sources): input is split into blocks of fixed size, then for each block compression function is applied to old state and the block producing new state, constant IV is used as old state for the first block. (Also there is length encoded into the last block. But it is not important for this idea.) It is applicable to raw-md4 and nt, raw-md5, raw-sha1, raw-sha256, and so on.

Fast and simple formats support lengths fitting into one block. Also such hashes support reversing of rounds because IV are constant. For two+ limbs, it is not possible to reverse rounds (as far as I know).

A simple format supporting lengths above one block might omit reversing. It would be slower for short hashes.

There might be two formats. Currently there are nt and nt-long for nt. User chooses the correct one. It is not easy to choose automatically because choice depends onto length of candidates that is not known beforehand in some cases.

Other approach is to allow binary to be found in db_main by both reversed hash and regular hash. It would require changes to formats interface. Also it would enlarge the db hurting cases like loading HIBP. (It is possible to have separate bitmaps for variants, so in theory using additional memory, it might be implemented without additional collisions.)

And I got the idea of hybrid approach that can be enclosed into format. Reversing is just a transformation on a full hash. We may apply it to any particular digest's value. Formats do apply it loading hashes. Moreover formats apply reversing in cmp_one on full hash computed by openssl's scalar hashing. So we may use reversing this way: short candidates would be hashed with shortened compression function, longer candidates would be hashed with full compression function and then reversing would be applied for lookup. This way short candidates would be as fast as with special format while longer candidates would be slower slightly than usual.

I think hybrid approach provides a viable trade-off between speed and usability. It would be even better with ability to disable reversing at run-time to speed up longer candidates.

125 ASCII chars for NT require 5 blocks of hashing (123 chars require 4 blocks). Everything above 1 block might be packed together for computations. So instead of storing by number of blocks (like all 2-block candidates, all 3-block candidates and so on), it is possible to have mixed sequence of limbs with marked ends. For instance, having vector size 2 it is possible to pack one 4-block candidate together with two 2-block candidates to hash 4 limbs only extracting 1 end after 2 limbs and another 2 ends after 2 more limbs. Simpler approach would be to compute 2 limbs for 2-block candidates and then 4 for lonely 4-block candidate. But the described mixed storage seems overcomplicated. Also 125 chars for raw-md5 fit into 3 blocks and mixed storage might be less profitable.

With or without reversing, hashing of candidates with mixed number of blocks requires some extra buffering to reduce number of incomplete vectors. (Better length-based queuing is not supported currently.)

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions