RFC: Future-proofed cryptographic hash values.

## Problem

As time passes, software that uses a particular hash function will often need to upgrade to a better, faster, stronger, ... one. This introduces large costs: systems may assume a particular hash size, or call `sha1` all over the place.

It's already common to see hashes prefixed with a function id: 

```
sha1-a651ec3c4cc479977777f916fcedb221f38aaba1
sha256-aec71a4d4a8f44bc0c3e1133d5544d724b857cf20fe5aaeb1bc4d6e7c1ee68f1
```

Is this the best way? Maybe it is. But there are some problems:
1. Hashes tend to be transferred/printed encoded in hex, base32, base64, base58, etc. The _name_ of the hash function may not be compatible with your encoding (e.g. the hashes above are hex values, but 's' is not a valid hex char). This introduces annoying complexity when merely encoding/decoding hashes for storing / transferring / printing out to users. (Ugh!) It gets worse when "things expecting a hex hash" that you don't control cannot be used with this scheme.
2. When storing millions of hashes, the extra byte cost of a prefix like `blake2b-` may matter, so we might want a much narrower prefix. That seems feasible, given that "widely used and accepted secure cryptographic hash functions" tend to change very little over time (as of 2014 there are fewer than 256 that you might seriously consider).
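
To make the first problem concrete, here is a minimal Python sketch showing that a stock hex decoder rejects the name-prefixed form but accepts the bare digest. The helper `is_hex` and the variable names are illustrative only.

```python
# Illustrative only: a standard hex decoder chokes on a name-prefixed hash.
prefixed = "sha1-a651ec3c4cc479977777f916fcedb221f38aaba1"
bare = prefixed.split("-", 1)[1]  # drop the "sha1-" prefix


def is_hex(value: str) -> bool:
    """Return True if `value` decodes cleanly as hexadecimal."""
    try:
        bytes.fromhex(value)
        return True
    except ValueError:
        return False


print(is_hex(prefixed))  # the 's' in "sha1-" is not a hex digit
print(is_hex(bare))
print(len("blake2b-"))   # extra characters a name prefix adds per stored hash
```

Anything that only accepts hex has to strip and re-attach the name itself, which is the complexity described in point 1.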

Is there an RFC for this? I haven't found a "Hash Function Suite" like the "Cipher Suite" in TLS ([RFC 5246/A.5](http://tools.ietf.org/html/rfc5246#appendix-A.5)).

## Potential solutions

Use a short prefix that maps into a "cryptographic hash function" suite. Some mapping already has to exist: the `sha1-` prefix is more human readable, but it is probably not a good idea to blindly dispatch to a function based on the string `sha1`. Whitelisting specific strings (a blessed table) already happens.
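
A blessed table of this kind might be sketched as follows. The table contents and the names `ALLOWED_HASHES` and `digest_for` are hypothetical; the point is only that an unknown name is rejected rather than dispatched.

```python
import hashlib

# Hypothetical blessed table: only names listed here are ever dispatched.
ALLOWED_HASHES = {
    "sha1": hashlib.sha1,
    "sha256": hashlib.sha256,
}


def digest_for(name: str, data: bytes) -> bytes:
    """Hash `data` with the whitelisted function `name`, or raise ValueError."""
    try:
        constructor = ALLOWED_HASHES[name]
    except KeyError:
        raise ValueError(f"unsupported hash function: {name!r}") from None
    return constructor(data).digest()
```

Since a lookup table is needed either way, swapping its string keys for small integer ids costs little.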

So what would this look like? For example, suppose sha1 is `0x01` and sha256 is `0x02`:

```
# name prefixed
sha1-0beec7b5ea3f0fdbc95d0dd47f3c5bc275da8a33 # hex
sha1-bpxmpnpkh4h5xsk5bxkh6pc3yj25vcrt # base32
sha1-aef1qioaay9wkladm4f7a6nubty # base58
sha1-c+7hteo/d9vjxq3ufzxbwnxaijm # base64

sha256-2c26b46b68ffc68ff99b453c1d30413413422d706483bfa0f98a5e886266e7ae # hex
sha256-fqtli23i77di76m3iu6b2mcbgqjuellqmsb37ihzrjpiqytg46xa # base32
sha256-3ymapqcucjxdwprbjfr5mjcpthqfg8pux1txqrem35jj # base58
sha256-lca0a2j/xo/5m0u8htbbnbnclxbkg7+g+ypeigjm564 # base64

# id prefixed (`0x01` for sha1, and `0x02` for sha256)
010beec7b5ea3f0fdbc95d0dd47f3c5bc275da8a33 # sha1 in hex
aef65r5v5i7q7w6jlug5i7z4lpbhlwukgm # sha1 in base32
4jvyy9wgauheuckrj7szdd2e1vqs # sha1 in base58
aqvux7xqpw/byv0n1h88w8j12ooz # sha1 in base64

022c26b46b68ffc68ff99b453c1d30413413422d706483bfa0f98a5e886266e7ae # sha256 in hex
aiwcnndlnd74nd7ztnctyhjqie2bgqrnobsihp5a7gff5cdcm3t24 # sha256 in base32
eryupqi6npzkezrsc1mgaaorxh7tkyy6v7nc8h5t4zeh # sha256 in base58
aiwmtgto/8ap+ztfpb0wqtqtqi1wzio/opmkxohizueu # sha256 in base64
```
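
The id-prefixed form can be produced with a few lines of Python. This is a sketch under the assumed ids from the examples above (`0x01` for sha1, `0x02` for sha256); `HASH_IDS` and `prefixed_digest` are hypothetical names, and the input `b"foo"` is an assumption that appears to match what the examples above hashed.

```python
import base64
import hashlib

# Assumed ids, following the examples above: 0x01 = sha1, 0x02 = sha256.
HASH_IDS = {"sha1": 0x01, "sha256": 0x02}


def prefixed_digest(name: str, data: bytes) -> bytes:
    """Return the one-byte function id followed by the raw digest."""
    return bytes([HASH_IDS[name]]) + hashlib.new(name, data).digest()


raw = prefixed_digest("sha1", b"foo")
print(raw.hex())                       # hex: id byte, then the digest
print(base64.b32encode(raw).decode())  # base32, '=' padding included
print(base64.b64encode(raw).decode())  # base64, '=' padding included
```

The id is prepended to the raw bytes before encoding, so the result is valid in whatever encoding is applied afterwards. In hex the digest is still visible after the first two characters; in base32 and base64 the extra byte shifts the alignment of every following group, so the encoded text generally looks unrelated to the unprefixed one.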

_Pros:_
- hash values are consistent with the encoding :)
- shorter

_Cons:_
- numbers are hard to human-read. This is a weaker point, as some ids would quickly become recognizable, e.g. `0x01` is sha1, `0x02` is sha256, etc.
- prefixing bytes makes encoded values change altogether. :(

---
### on varints

Ideally, for proper future proofing, we want a varint. Note, though, that varints are more annoying to parse and slower than fixed-width ints. There are so few "widely used...hash functions" that it may be okay to get away with one byte. Luckily, we can wait until we reach 127 functions before we have to decide which varint encoding to use :)
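
For reference, here is a minimal sketch of one common varint format: unsigned, 7 payload bits per byte, least significant group first, with the high bit as a "more bytes follow" flag (LEB128-style). This format is an assumption for illustration, not something this RFC has settled on, and the function names are hypothetical.

```python
from typing import Tuple


def encode_varint(value: int) -> bytes:
    """Encode a non-negative int, 7 bits per byte, least significant group first."""
    if value < 0:
        raise ValueError("varints here are unsigned")
    out = bytearray()
    while True:
        group = value & 0x7F
        value >>= 7
        if value:
            out.append(group | 0x80)  # high bit set: more bytes follow
        else:
            out.append(group)         # high bit clear: this is the last byte
            return bytes(out)


def decode_varint(buf: bytes) -> Tuple[int, int]:
    """Return (value, bytes_consumed) for the varint at the start of `buf`."""
    value = 0
    for i, byte in enumerate(buf):
        value |= (byte & 0x7F) << (7 * i)
        if not byte & 0x80:
            return value, i + 1
    raise ValueError("truncated varint")
```

In this format every id up to 127 encodes to the same single byte a fixed-width id would use, which is why the choice can be deferred.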

We may be able to repurpose UTF-8 implementations for this.

**Random UTF-8 question:** why are the subsequent bytes wasting two bits each? (See the '10' prefix below.)

```
U+0000    U+007F     1 0xxxxxxx
U+0080    U+07FF     2 110xxxxx  10xxxxxx
U+0800    U+FFFF     3 1110xxxx  10xxxxxx  10xxxxxx
U+10000   U+1FFFFF   4 11110xxx  10xxxxxx  10xxxxxx  10xxxxxx
U+200000  U+3FFFFFF  5 111110xx  10xxxxxx  10xxxxxx  10xxxxxx  10xxxxxx
U+4000000 U+7FFFFFFF 6 1111110x  10xxxxxx  10xxxxxx  10xxxxxx  10xxxxxx  10xxxxxx
```

From http://en.wikipedia.org/wiki/UTF-8#Description

Is it to keep the code point ranges nice and rounded-ish?
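
A commonly cited reason is self-synchronization rather than tidy ranges: continuation bytes always start with `10` and lead bytes never do, so a decoder dropped into the middle of a stream can find a character boundary without rescanning from the start. A minimal sketch (the helper name `char_start` is hypothetical):

```python
def char_start(buf: bytes, offset: int) -> int:
    """Back up from `offset` to the first byte of the character containing it."""
    while offset > 0 and (buf[offset] & 0xC0) == 0x80:  # 10xxxxxx: continuation
        offset -= 1
    return offset


# "a", then U+20AC (three bytes in UTF-8: e2 82 ac), then "b"
text = "a\u20acb".encode("utf-8")
print(char_start(text, 3))  # offset 3 is the last byte of the 3-byte character
```

A plain 7-bits-per-byte varint does not have this property, which may matter if prefixed hashes are ever scanned from arbitrary offsets.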

