This repository has been archived by the owner on Jul 22, 2024. It is now read-only.

Recommendation for short hash characters #32

Open
alok87 opened this issue Mar 5, 2021 · 1 comment

alok87 commented Mar 5, 2021

Like Git has 7, what is the recommendation here?

@alok87 alok87 changed the title Recommendation short hash version Recommendation for short hash characters Mar 5, 2021

KEINOS commented Aug 30, 2021

@alok87

I hope this comment helps.

I think there IS a way to shorten the hash, but unlike Git commit IDs, there is no "recommended" short-hash length for this package.

There are three reasons:

  1. Pigeonhole principle (the shorter the hash, the more often it collides).
  2. hashstructure uses a non-cryptographic hash.
  3. The package aims to detect changes in an object, not to identify an object.

Git uses a cryptographic hash algorithm (SHA-1) with a 160-bit digest, so it is far less likely to collide than non-cryptographic algorithms such as CRC, simple checksums, and FNV, which typically produce much shorter values.

Cryptographic hash algorithms are slower but good at detecting tampering, so they are useful for identifying data.

Non-cryptographic hash algorithms, on the other hand, are fast and good at detecting changes to the same object, but very bad at detecting tampering. So they are not well suited for use as identifiers.

I believe this package aims simply to detect that an object has changed, not to detect tampering with it. And the shorter the hash gets, the less confidence it gives. So I don't think there will ever be a "recommendation".
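To put rough numbers on "the shorter, the less confidence", here is a minimal sketch using the birthday-bound approximation. It assumes the hash values are uniformly distributed, which a non-cryptographic hash only approximates, and `collisionProbability` and the item count are hypothetical names and values chosen for illustration.

```go
package main

import (
	"fmt"
	"math"
)

// collisionProbability approximates the chance that at least two of n
// uniformly distributed IDs are equal when each ID keeps only `bits` bits
// (birthday-bound approximation: 1 - exp(-n(n-1) / (2 * 2^bits))).
func collisionProbability(n float64, bits int) float64 {
	space := math.Pow(2, float64(bits))
	return -math.Expm1(-n * (n - 1) / (2 * space))
}

func main() {
	const items = 10000 // hypothetical number of objects to identify

	// 7 hex characters keep 28 bits, 7 Base64 characters keep 42 bits,
	// and the full uint64 keeps 64 bits.
	for _, c := range []struct {
		label string
		bits  int
	}{
		{"7 hex chars", 28},
		{"7 Base64 chars", 42},
		{"full 64-bit hash", 64},
	} {
		fmt.Printf("%-17s (%2d bits): %.3g\n", c.label, c.bits, collisionProbability(items, c.bits))
	}
}
```

Under these assumptions, 10,000 objects with a 7-character hex prefix already collide with a probability of roughly 17%, so the acceptable length depends entirely on how many objects you expect to have.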

If you still want to use the value returned by hashstructure.Hash() as an ID, like a Git commit ID, a best-effort approach, at your own risk, would be:

  1. Encode the value returned by hashstructure to Base64.
  2. Use the first N characters.

package main

import (
	"encoding/base64"
	"encoding/binary"
	"fmt"

	"github.com/mitchellh/hashstructure/v2"
)

func main() {
	type ComplexStruct struct {
		Foo  string
		Bar  uint
		Buzz map[string]interface{}
	}

	v := ComplexStruct{
		Foo: "foo",
		Bar: 64,
		Buzz: map[string]interface{}{
			"beep":  true,
			"sound": "bell",
		},
	}

	hashRaw, err := hashstructure.Hash(v, hashstructure.FormatV2, nil)
	if err != nil {
		panic(err)
	}

	// Base16 (hex), zero-padded to 16 digits so the slice below is always in range
	hashHex := fmt.Sprintf("%016x", hashRaw)

	// Base64
	b := make([]byte, 8)
	binary.LittleEndian.PutUint64(b, hashRaw)
	hashBase64 := base64.StdEncoding.EncodeToString(b)

	fmt.Println("Raw   :", hashRaw, "(DEC, Base10)")
	fmt.Println("Base16:", hashHex, "(HEX)")
	fmt.Println("Base64:", hashBase64)
	fmt.Println("Short :", hashHex[0:7], "(Base16)")
	fmt.Println("Short :", hashBase64[0:7], "(Base64)")
}
// Output:
// Raw   : 16126471403938159312 (DEC, Base10)
// Base16: dfccbc4cd83a9ad0 (HEX)
// Base64: 0Jo62Ey8zN8=
// Short : dfccbc4 (Base16)
// Short : 0Jo62Ey (Base64)
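
If you need this in more than one place, the two steps could be wrapped in a small helper. The sketch below is only an illustration: `shortID` is a hypothetical function, not part of hashstructure, and it differs from the example above in using big-endian bytes (the same order as the hex output) and unpadded, URL-safe Base64.

```go
package main

import (
	"encoding/base64"
	"encoding/binary"
	"fmt"
)

// shortID is a hypothetical helper (not part of hashstructure). It encodes
// a 64-bit hash as unpadded, URL-safe Base64 and keeps the first n characters.
func shortID(hash uint64, n int) string {
	var b [8]byte
	binary.BigEndian.PutUint64(b[:], hash) // same byte order as the "%x" output

	s := base64.RawURLEncoding.EncodeToString(b[:]) // 8 bytes always give 11 characters

	// Clamp n so the slice below can never go out of range.
	if n < 1 {
		n = 1
	}
	if n > len(s) {
		n = len(s)
	}
	return s[:n]
}

func main() {
	// The hex value printed by the example above.
	const hashRaw uint64 = 0xdfccbc4cd83a9ad0
	fmt.Println("Short :", shortID(hashRaw, 7))
}
```

The URL-safe alphabet avoids `+` and `/`, which is convenient if the ID ends up in a path or query string.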

That said, setting aside backward compatibility of the HashOptions.Hasher field, and thinking about use cases such as handling JSON data from an API or ETag-like usage, I would agree with having cryptographic hash algorithms as an option. SHA-3 and BLAKE3 would be nice.
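
Until something like that exists, an ETag-like ID can be approximated outside the package. The sketch below is not hashstructure's API: it swaps in SHA-256 from the standard library (rather than SHA-3 or BLAKE3) and hashes the JSON encoding of the value, and `contentID` is a hypothetical helper name.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// contentID is a hypothetical helper: it serializes v to JSON and returns
// the hex-encoded SHA-256 digest of the result.
func contentID(v interface{}) (string, error) {
	// encoding/json emits map keys in sorted order, so equal maps
	// produce identical bytes regardless of insertion order.
	data, err := json.Marshal(v)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:]), nil
}

func main() {
	payload := map[string]interface{}{"beep": true, "sound": "bell"}

	id, err := contentID(payload)
	if err != nil {
		panic(err)
	}
	fmt.Println("Full :", id)
	fmt.Println("Short:", id[:7]) // a prefix of a 256-bit digest, like Git's short IDs
}
```

Note that this only covers values that encoding/json can serialize, and unexported struct fields are ignored, so it is not a drop-in replacement for hashstructure.Hash().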
