descriptor: make definition of digest consistent
After some changes to the schema to open up the character set and add
separators to the digest algorithm, this change set ensures we have a
consistent definition for the components of a digest. The specification
has been updated to clarify this decision as well as ensure the
specification matches the validation components across the board.

The portion of a digest known as `hex` is now known as `encoded` to
correspond with the wider character set allowed.

Signed-off-by: Stephen J Day <stephen.day@docker.com>
stevvooe committed May 4, 2017
1 parent 2e9f3dd commit d75e562
Showing 5 changed files with 27 additions and 20 deletions.
37 changes: 22 additions & 15 deletions descriptor.md
@@ -61,24 +61,31 @@ Extended _Descriptor_ field additions proposed in other OCI specifications SHOUL

The _digest_ property of a Descriptor acts as a content identifier, enabling [content addressability](http://en.wikipedia.org/wiki/Content-addressable_storage).
It uniquely identifies content by taking a [collision-resistant hash](https://en.wikipedia.org/wiki/Cryptographic_hash_function) of the bytes.
If the digest can be communicated in a secure manner, one can retrieve the content from an insecure source, recalculate the digest independently, and be certain that the correct content was obtained.
If the _digest_ can be communicated in a secure manner, one can verify content from an insecure source by recalculating the digest independently, ensuring the content has not been modified.

The value of the digest property is a string consisting of an _algorithm_ portion (the "algorithm identifier") and a _hex_ portion.
The algorithm identifier specifies the cryptographic hash function used to calculate the digest; the hex portion is the lowercase hex-encoded result of the hash.
The value of the `digest` property is a string consisting of an _algorithm_ portion and an _encoded_ portion.
The _algorithm_ specifies the cryptographic hash function and encoding used for the digest; the _encoded_ portion contains the encoded result of the hash function.

The digest string MUST match the following grammar:
A digest string MUST match the following grammar:

```
digest := algorithm ":" hex
algorithm := /[a-z0-9_+.-]+/
hex := /[a-f0-9]+/
digest := algorithm ":" encoded
algorithm := /[a-z0-9]+(?:[+._-][a-z0-9]+)*/
encoded := /[a-zA-Z0-9]+/
```
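
As a rough illustration of the updated grammar, the sketch below checks a digest string and splits it into its _algorithm_ and _encoded_ components. The regular expressions are copied from the new grammar lines above; the names `DIGEST_RE` and `parse_digest` are hypothetical and not defined by the specification.

```python
import re
from typing import Tuple

# Patterns taken from the updated grammar; the variable names are illustrative.
ALGORITHM = r"[a-z0-9]+(?:[+._-][a-z0-9]+)*"
ENCODED = r"[a-zA-Z0-9]+"
DIGEST_RE = re.compile(rf"({ALGORITHM}):({ENCODED})")


def parse_digest(digest: str) -> Tuple[str, str]:
    """Return (algorithm, encoded) if the string matches the grammar.

    Raises ValueError otherwise. Only the grammar is checked here: an
    unregistered algorithm such as "multihash+base58" still parses.
    """
    match = DIGEST_RE.fullmatch(digest)
    if match is None:
        raise ValueError(f"digest does not match grammar: {digest!r}")
    return match.group(1), match.group(2)
```

Under this sketch, `multihash+base58:QmRZxt2b1FVZPNqd8hsiykDL3TdBDeTSPX9Kv46HmX4Gx8` parses successfully, while an uppercase algorithm or a trailing separator such as `sha256+:abc` is rejected.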
Some example digests include the following:

Some example digest strings include the following:
digest | algorithm | Supported |
------------------------------------------------------------------------|---------------------|-----------|
sha256:6c3c624b58dbbcd3c0dd82b4c53f04194d1247c6eebdaab7c610cf7d66709b3b | [SHA-256](#sha-256) | Yes |
sha512:401b09eab3c013d4ca54922bb802bec8fd5318192b0a75f201d8b3727429080fb337591abd3e44453b954555b7a0812e1081c39b740293f765eae731f5a65ed1 | [SHA-512](#sha-512) | Yes |
multihash+base58:QmRZxt2b1FVZPNqd8hsiykDL3TdBDeTSPX9Kv46HmX4Gx8 | Multihash | No |

digest string | algorithm |
------------------------------------------------------------------------|---------------------|
sha256:6c3c624b58dbbcd3c0dd82b4c53f04194d1247c6eebdaab7c610cf7d66709b3b | [SHA-256](#sha-256) |
Please see [Registered Algorithms](#registered-identifiers) for a list of supported algorithms.

Implementations SHOULD allow digests that are unsupported to pass validation if they comply with the above grammar.
While `sha256` will only use hex-encoded digests, support for separators in _algorithm_ and alphanumeric characters in _encoded_ is included to allow for future extension of digest support.
As an example, we can parameterize the encoding and algorithm as `multihash+base58:QmRZxt2b1FVZPNqd8hsiykDL3TdBDeTSPX9Kv46HmX4Gx8`, which would be considered valid but unsupported by this specification.

* Before consuming content targeted by a descriptor from untrusted sources, the byte content SHOULD be verified against the digest string.
* Before calculating the digest, the size of the content SHOULD be verified to reduce hash collision space.
@@ -91,24 +98,24 @@ A _digest_ is calculated by the following pseudo-code, where `H` is the selected
```
let ID(C) = Descriptor.digest
let C = <bytes>
let D = '<alg>:' + EncodeHex(H(C))
let D = '<alg>:' + Encode(H(C))
let verified = ID(C) == D
```
Above, we define the content identifier as `ID(C)`, extracted from the `Descriptor.digest` field.
Content `C` is a string of bytes.
Function `H` returns the hash of `C` in bytes and is passed to function `EncodeHex` and prefixed with the algorithm to obtain the digest.
Function `H` returns the hash of `C` in bytes and is passed to function `Encode` and prefixed with the algorithm to obtain the digest.
The result `verified` is true if `ID(C)` is equal to `D`, confirming that `C` is the content identified by `D`.
After verification, the following is true:

```
D == ID(C) == '<alg>:' + EncodeHex(H(C))
D == ID(C) == '<alg>:' + Encode(H(C))
```

The _digest_ is confirmed as the content identifier by independently calculating the _digest_.
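
The pseudo-code above might be realized along the following lines. This is a minimal sketch that assumes `Encode` is lowercase hex for `sha256` and `sha512`; the `SUPPORTED` table and the `compute_digest` and `verify` helpers are hypothetical names introduced for illustration only.

```python
import hashlib

# Hypothetical registry mapping algorithm identifiers to hashlib names.
# Lowercase hex encoding is assumed for both entries.
SUPPORTED = {"sha256": "sha256", "sha512": "sha512"}


def compute_digest(algorithm: str, content: bytes) -> str:
    """Compute D = '<alg>:' + Encode(H(C)) with hex as the encoding."""
    hasher = hashlib.new(SUPPORTED[algorithm], content)
    return f"{algorithm}:{hasher.hexdigest()}"


def verify(descriptor_digest: str, content: bytes) -> bool:
    """Return True if content C hashes to ID(C), the descriptor's digest."""
    algorithm, _, _ = descriptor_digest.partition(":")
    if algorithm not in SUPPORTED:
        # The string may be grammatically valid, but it cannot be verified here.
        raise ValueError(f"unsupported algorithm: {algorithm!r}")
    return descriptor_digest == compute_digest(algorithm, content)
```

A caller would read `Descriptor.digest`, fetch the bytes, and accept them only when `verify` returns `True`.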

### Registered algorithms

While the _algorithm_ portion (the "algorithm identifier") of the digest string allows the use of a variety of cryptographic algorithms, compliant implementations SHOULD use [SHA-256](#sha-256).
While the _algorithm_ component of the digest string allows the use of a variety of cryptographic algorithms, compliant implementations SHOULD use [SHA-256](#sha-256).

The following algorithm identifiers are currently defined by this specification:

4 changes: 2 additions & 2 deletions image-layout.md
@@ -53,8 +53,8 @@ afff3924849e458c5ef237db5f89539274d5e609db5db935ed3959c90f1f2d51 ./blobs/sha256/
## Blobs

* Object names in the `blobs` subdirectories are composed of a directory for each hash algorithm, the children of which will contain the actual content.
* A blob, referenced with digest `<alg>:<hex>` (per [descriptor](descriptor.md#digests-and-verification)), MUST have its content stored in a file under `blobs/<alg>/<hex>`.
* The character set of the entry name for `<hex>` and `<alg>` MUST match the respective grammar elements described in [descriptor](descriptor.md#digests-and-verification).
* A blob, referenced with digest `<alg>:<encoded>` (per [descriptor](descriptor.md#digests-and-verification)), MUST have its content stored in a file under `blobs/<alg>/<encoded>`.
* The character set of the entry name for `<encoded>` and `<alg>` MUST match the respective grammar elements described in [descriptor](descriptor.md#digests-and-verification).
* For example `sha256:5b` will map to the layout `blobs/sha256/5b`.
* The blobs directory MAY contain blobs which are not referenced by any of the [refs](#indexjson-file).
* The blobs directory MAY be missing referenced blobs, in which case the missing blobs SHOULD be fulfilled by an external blob store.
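
The mapping from a digest to its blob location described in the list above could be sketched as follows. The function name `blob_path` is hypothetical, and the pattern is the one from the updated descriptor grammar; validating first means path separators or `..` can never reach the file system layer.

```python
import re
from pathlib import PurePosixPath

# Grammar from descriptor.md: algorithm ":" encoded
DIGEST_RE = re.compile(r"([a-z0-9]+(?:[+._-][a-z0-9]+)*):([a-zA-Z0-9]+)")


def blob_path(digest: str) -> PurePosixPath:
    """Map '<alg>:<encoded>' to the relative path 'blobs/<alg>/<encoded>'."""
    match = DIGEST_RE.fullmatch(digest)
    if match is None:
        raise ValueError(f"digest does not match grammar: {digest!r}")
    algorithm, encoded = match.groups()
    return PurePosixPath("blobs", algorithm, encoded)
```

With this sketch, `blob_path("sha256:5b")` yields `blobs/sha256/5b`, matching the example in the list.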
2 changes: 1 addition & 1 deletion schema/content-descriptor.json
@@ -13,7 +13,7 @@
"$ref": "defs.json#/definitions/int64"
},
"digest": {
"description": "the cryptographic checksum digest of the object, in the pattern '<hash>:<hexadecimal digest>'",
"description": "the cryptographic checksum digest of the object, in the pattern '<algorithm>:<encoded>'",
"$ref": "defs-descriptor.json#/definitions/digest"
},
"urls": {
2 changes: 1 addition & 1 deletion schema/defs-descriptor.json
@@ -7,7 +7,7 @@
"pattern": "^[A-Za-z0-9][A-Za-z0-9!#$&-^_.+]{0,126}/[A-Za-z0-9][A-Za-z0-9!#$&-^_.+]{0,126}$"
},
"digest": {
"description": "the cryptographic checksum digest of the object, in the pattern '<algorithm>:<digest>'",
"description": "the cryptographic checksum digest of the object, in the pattern '<algorithm>:<encoded>'",
"type": "string",
"pattern": "^[a-z0-9]+(?:[+._-][a-z0-9]+)*:[a-zA-Z0-9]+$"
},
2 changes: 1 addition & 1 deletion schema/image-index-schema.json
@@ -31,7 +31,7 @@
"$ref": "defs.json#/definitions/int64"
},
"digest": {
"description": "the cryptographic checksum digest of the object, in the pattern '<hash>:<hexadecimal digest>'",
"description": "the cryptographic checksum digest of the object, in the pattern '<algorithm>:<encoded>'",
"$ref": "defs-descriptor.json#/definitions/digest"
},
"urls": {