From d75e5629d89018632dc0e86ee7b1d42af15551ea Mon Sep 17 00:00:00 2001
From: Stephen J Day
Date: Tue, 25 Apr 2017 12:22:26 -0700
Subject: [PATCH] descriptor: make definition of digest consistent

After some changes to the schema to open up the character set and add separators to the digest algorithm, this change set ensures we have a consistent definition for the components of a digest. The specification has been updated to clarify this decision as well as ensure the specification matches the validation components across the board.

The portion of a digest known as `hex` is now known as `encoded` to correspond with the wider character set allowed.

Signed-off-by: Stephen J Day
---
 descriptor.md                  | 37 ++++++++++++++++++++++---------------
 image-layout.md                |  4 ++--
 schema/content-descriptor.json |  2 +-
 schema/defs-descriptor.json    |  2 +-
 schema/image-index-schema.json |  2 +-
 5 files changed, 27 insertions(+), 20 deletions(-)

diff --git a/descriptor.md b/descriptor.md
index 70bc4fe17..e0ef9d3a9 100644
--- a/descriptor.md
+++ b/descriptor.md
@@ -61,24 +61,31 @@ Extended _Descriptor_ field additions proposed in other OCI specifications SHOUL

 The _digest_ property of a Descriptor acts as a content identifier, enabling [content addressability](http://en.wikipedia.org/wiki/Content-addressable_storage).
 It uniquely identifies content by taking a [collision-resistant hash](https://en.wikipedia.org/wiki/Cryptographic_hash_function) of the bytes.
-If the digest can be communicated in a secure manner, one can retrieve the content from an insecure source, recalculate the digest independently, and be certain that the correct content was obtained.
+If the _digest_ can be communicated in a secure manner, one can verify content from an insecure source by recalculating the digest independently, ensuring the content has not been modified.

-The value of the digest property is a string consisting of an _algorithm_ portion (the "algorithm identifier") and a _hex_ portion.
-The algorithm identifier specifies the cryptographic hash function used to calculate the digest; the hex portion is the lowercase hex-encoded result of the hash.
+The value of the `digest` property is a string consisting of an _algorithm_ portion and an _encoded_ portion.
+The _algorithm_ specifies the cryptographic hash function and encoding used for the digest; the _encoded_ portion contains the encoded result of the hash function.

-The digest string MUST match the following grammar:
+A digest string MUST match the following grammar:

 ```
-digest := algorithm ":" hex
-algorithm := /[a-z0-9_+.-]+/
-hex := /[a-f0-9]+/
+digest := algorithm ":" encoded
+algorithm := /[a-z0-9]+(?:[+._-][a-z0-9]+)*/
+encoded := /[a-zA-Z0-9]+/
 ```
+Some example digests include the following:

-Some example digest strings include the following:
+digest | algorithm | Supported |
+------------------------------------------------------------------------|---------------------|-----------|
+sha256:6c3c624b58dbbcd3c0dd82b4c53f04194d1247c6eebdaab7c610cf7d66709b3b | [SHA-256](#sha-256) | Yes |
+sha512:401b09eab3c013d4ca54922bb802bec8fd5318192b0a75f201d8b3727429080fb337591abd3e44453b954555b7a0812e1081c39b740293f765eae731f5a65ed1 | [SHA-512](#sha-512) | Yes |
+multihash+base58:QmRZxt2b1FVZPNqd8hsiykDL3TdBDeTSPX9Kv46HmX4Gx8 | Multihash | No |

-digest string | algorithm |
-------------------------------------------------------------------------|---------------------|
-sha256:6c3c624b58dbbcd3c0dd82b4c53f04194d1247c6eebdaab7c610cf7d66709b3b | [SHA-256](#sha-256) |
+Please see [Registered Algorithms](#registered-algorithms) for a list of supported algorithms.
+
+Implementations SHOULD allow digests that are unsupported to pass validation if they comply with the above grammar.
+While `sha256` will only use hex encoded digests, support for separators in _algorithm_ and alphanumeric characters in _encoded_ is included to allow for future extension of digest support.
+As an example, we can parameterize the encoding and algorithm as `multihash+base58:QmRZxt2b1FVZPNqd8hsiykDL3TdBDeTSPX9Kv46HmX4Gx8`, which would be considered valid but unsupported by this specification.

 * Before consuming content targeted by a descriptor from untrusted sources, the byte content SHOULD be verified against the digest string.
 * Before calculating the digest, the size of the content SHOULD be verified to reduce hash collision space.
@@ -91,24 +98,24 @@ A _digest_ is calculated by the following pseudo-code, where `H` is the selected
 ```
 let ID(C) = Descriptor.digest
 let C =
-let D = ':' + EncodeHex(H(C))
+let D = ':' + Encode(H(C))
 let verified = ID(C) == D
 ```
 Above, we define the content identifier as `ID(C)`, extracted from the `Descriptor.digest` field.
 Content `C` is a string of bytes.
-Function `H` returns the hash of `C` in bytes and is passed to function `EncodeHex` and prefixed with the algorithm to obtain the digest.
+Function `H` returns the hash of `C` in bytes and is passed to function `Encode` and prefixed with the algorithm to obtain the digest.
 The result `verified` is true if `ID(C)` is equal to `D`, confirming that `C` is the content identified by `D`.
 After verification, the following is true:

 ```
-D == ID(C) == ':' + EncodeHex(H(C))
+D == ID(C) == ':' + Encode(H(C))
 ```

 The _digest_ is confirmed as the content identifier by independently calculating the _digest_.

 ### Registered algorithms

-While the _algorithm_ portion (the "algorithm identifier") of the digest string allows the use of a variety of cryptographic algorithms, compliant implementations SHOULD use [SHA-256](#sha-256).
+While the _algorithm_ component of the digest string allows the use of a variety of cryptographic algorithms, compliant implementations SHOULD use [SHA-256](#sha-256).
 The following algorithm identifiers are currently defined by this specification:

diff --git a/image-layout.md b/image-layout.md
index 267d80d80..0aa15199e 100644
--- a/image-layout.md
+++ b/image-layout.md
@@ -53,8 +53,8 @@ afff3924849e458c5ef237db5f89539274d5e609db5db935ed3959c90f1f2d51 ./blobs/sha256/
 ## Blobs

 * Object names in the `blobs` subdirectories are composed of a directory for each hash algorithm, the children of which will contain the actual content.
-* A blob, referenced with digest `:` (per [descriptor](descriptor.md#digests-and-verification)), MUST have its content stored in a file under `blobs//`.
-* The character set of the entry name for `` and `` MUST match the respective grammar elements described in [descriptor](descriptor.md#digests-and-verification).
+* A blob, referenced with digest `:` (per [descriptor](descriptor.md#digests-and-verification)), MUST have its content stored in a file under `blobs//`.
+* The character set of the entry name for `` and `` MUST match the respective grammar elements described in [descriptor](descriptor.md#digests-and-verification).
 * For example `sha256:5b` will map to the layout `blobs/sha256/5b`.
 * The blobs directory MAY contain blobs which are not referenced by any of the [refs](#indexjson-file).
 * The blobs directory MAY be missing referenced blobs, in which case the missing blobs SHOULD be fulfilled by an external blob store.
diff --git a/schema/content-descriptor.json b/schema/content-descriptor.json
index 1bc47e2a4..69fcea92e 100644
--- a/schema/content-descriptor.json
+++ b/schema/content-descriptor.json
@@ -13,7 +13,7 @@
       "$ref": "defs.json#/definitions/int64"
     },
     "digest": {
-      "description": "the cryptographic checksum digest of the object, in the pattern ':'",
+      "description": "the cryptographic checksum digest of the object, in the pattern ':'",
       "$ref": "defs-descriptor.json#/definitions/digest"
     },
     "urls": {
diff --git a/schema/defs-descriptor.json b/schema/defs-descriptor.json
index 8955c6fae..85283f222 100644
--- a/schema/defs-descriptor.json
+++ b/schema/defs-descriptor.json
@@ -7,7 +7,7 @@
       "pattern": "^[A-Za-z0-9][A-Za-z0-9!#$&-^_.+]{0,126}/[A-Za-z0-9][A-Za-z0-9!#$&-^_.+]{0,126}$"
     },
     "digest": {
-      "description": "the cryptographic checksum digest of the object, in the pattern ':'",
+      "description": "the cryptographic checksum digest of the object, in the pattern ':'",
       "type": "string",
       "pattern": "^[a-z0-9]+(?:[+._-][a-z0-9]+)*:[a-zA-Z0-9]+$"
     },
diff --git a/schema/image-index-schema.json b/schema/image-index-schema.json
index d1f6e1086..665f0051e 100644
--- a/schema/image-index-schema.json
+++ b/schema/image-index-schema.json
@@ -31,7 +31,7 @@
             "$ref": "defs.json#/definitions/int64"
           },
           "digest": {
-            "description": "the cryptographic checksum digest of the object, in the pattern ':'",
+            "description": "the cryptographic checksum digest of the object, in the pattern ':'",
             "$ref": "defs-descriptor.json#/definitions/digest"
           },
           "urls": {
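
For reviewers who want to exercise the new grammar, the following is a minimal Python sketch, not part of the patch. It checks a digest string against the `algorithm ":" encoded` grammar from the `descriptor.md` hunk, verifies content along the lines of the `Encode(H(C))` pseudo-code, and maps a digest to a blob path as in the `sha256:5b` example from `image-layout.md`. The names `parse_digest`, `is_supported`, `verify`, `blob_path`, and the `SUPPORTED` registry are hypothetical, and lowercase hex is assumed as the encoding for `sha256` and `sha512`.

```python
import hashlib
import re

# Grammar from the descriptor.md hunk: digest := algorithm ":" encoded
DIGEST_RE = re.compile(r"[a-z0-9]+(?:[+._-][a-z0-9]+)*:[a-zA-Z0-9]+")

# Hypothetical registry for this sketch: algorithm identifier -> hash constructor.
SUPPORTED = {"sha256": hashlib.sha256, "sha512": hashlib.sha512}


def parse_digest(digest):
    """Return (algorithm, encoded), or raise ValueError if the grammar is violated."""
    if DIGEST_RE.fullmatch(digest) is None:
        raise ValueError("digest does not match grammar: %r" % digest)
    algorithm, encoded = digest.split(":", 1)
    return algorithm, encoded


def is_supported(digest):
    """True if the digest is grammatically valid and its algorithm is in SUPPORTED."""
    algorithm, _ = parse_digest(digest)
    return algorithm in SUPPORTED


def verify(digest, content):
    """Recompute the digest of content and compare it with the given digest.

    Assumes Encode is lowercase hex, which holds for sha256 and sha512 here.
    """
    algorithm, encoded = parse_digest(digest)
    if algorithm not in SUPPORTED:
        raise NotImplementedError("valid but unsupported algorithm: " + algorithm)
    return SUPPORTED[algorithm](content).hexdigest() == encoded


def blob_path(digest):
    """Map a digest to its image-layout location, e.g. sha256:5b -> blobs/sha256/5b."""
    algorithm, encoded = parse_digest(digest)
    return "blobs/%s/%s" % (algorithm, encoded)
```

In this sketch a digest such as `multihash+base58:QmRZxt2b1FVZPNqd8hsiykDL3TdBDeTSPX9Kv46HmX4Gx8` passes `parse_digest` but is reported as unsupported, which mirrors the "valid but unsupported" distinction the patch introduces.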