Commit 67631c3

woodruffw, hugovk, and facutuesca authored
PEP 740: tweak JSON simple API prescriptions (#3768)
Co-authored-by: Hugo van Kemenade <1324225+hugovk@users.noreply.github.com> Co-authored-by: Facundo Tuesca <facundo.tuesca@trailofbits.com>
1 parent 764f563 commit 67631c3

1 file changed

+130
-91
lines changed

peps/pep-0740.rst

Lines changed: 130 additions & 91 deletions
Original file line number | Diff line number | Diff line change
@@ -24,8 +24,8 @@ These changes have two subcomponents:
2424

2525
* Changes to the currently unstandardized PyPI upload API, allowing clients
2626
to upload digital attestations as :ref:`attestation objects <attestation-object>`;
27-
* Changes to the :pep:`503` and :pep:`691` "simple" APIs, allowing clients
28-
to retrieve both digital attestations and
27+
* Changes to the :ref:`HTML and JSON "simple" APIs <packaging:simple-repository-api>`,
28+
allowing clients to retrieve both digital attestations and
2929
`Trusted Publishing <https://docs.pypi.org/trusted-publishers/>`_ metadata
3030
for individual release files as :ref:`provenance objects <provenance-object>`.
3131

@@ -75,7 +75,7 @@ Additionally, this proposal identifies the following motivations:
7575
of the metadata needed by the index to verify an attestation's validity.
7676

7777
This PEP proposes a generic attestation format, containing an
78-
:ref:`attestation payload for signature generation <payload-and-signature-generation>`,
78+
:ref:`attestation statement for signature generation <payload-and-signature-generation>`,
7979
with the expectation that index providers adopt the
8080
format with a suitable source of identity for signature verification, such as
8181
Trusted Publishing.
@@ -116,8 +116,9 @@ areas of Python packaging:
116116
metadata within the cryptographic envelope.
117117

118118
For example, to prevent domain separation between a distribution's name and
119-
its contents, this PEP proposes that digital attestations be performed over
120-
``HASH(name || HASH(contents))`` rather than just ``HASH(contents)``.
119+
its contents, this PEP uses '`Statements <https://github.com/in-toto/attestation/blob/v1.0/spec/v1.0/statement.md>`__'
120+
from the `in-toto project <https://in-toto.io/>`__ to bind the distribution's
121+
contents (via SHA-256 digest) to its filename.
121122

122123

123124
Previous Work
@@ -196,6 +197,9 @@ Index changes
196197
Simple Index
197198
^^^^^^^^^^^^
198199

200+
The following changes are made to the
201+
:ref:`simple repository API <packaging:simple-repository-api-base>`:
202+
199203
* When an uploaded file has one or more attestations, the index **MAY**
200204
provide a ``.provenance`` file adjacent to the hosted distribution.
201205
The format of the ``.provenance`` file **SHALL** be a JSON-encoded
@@ -208,32 +212,34 @@ Simple Index
208212

209213
* When a ``.provenance`` file is present, the index **MAY** include a
210214
``data-provenance`` attribute on its file link. The value of the
211-
``data-provenance`` attribute **SHALL** be the SHA256 digest of the
215+
``data-provenance`` attribute **SHALL** be the SHA-256 digest of the
212216
associated ``.provenance`` file.
213217

214218
* The index **MAY** choose to modify the ``.provenance`` file. For example,
215219
the index **MAY** permit adding additional attestations and verification
216220
materials, such as attestations from third-party auditors or other services.
217221
When the index modifies the ``.provenance`` file, it **MUST** also update the
218-
``data-provenance`` attribute's value to the new SHA256 digest.
222+
``data-provenance`` attribute's value to the new SHA-256 digest.
219223

220224
See :ref:`changes-to-provenance-objects` for an additional discussion of
221225
reasons why a file's provenance may change.
222226

223227
JSON-based Simple API
224228
^^^^^^^^^^^^^^^^^^^^^
225229

230+
The following changes are made to the
231+
:ref:`JSON simple API <packaging:simple-repository-api-json>`:
232+
226233
* When an uploaded file has one or more attestations, the index **MAY**
227-
include a ``provenance`` object in the ``file`` dictionary for that file.
228-
The format of the ``provenance`` object **SHALL** be a JSON-encoded
229-
:ref:`provenance object <provenance-object>`, which **SHALL** contain
230-
the file's attestations.
234+
include a ``provenance`` key in the ``file`` dictionary for that file.
231235

232-
* The index **MAY** choose to modify the ``provenance`` object, under the same
233-
conditions as the ``.provenance`` file specified above.
236+
The value of the ``provenance`` key **SHALL** be a JSON string, which
237+
**SHALL** be the SHA-256 digest of the associated ``.provenance`` file,
238+
as in the Simple Index.
234239

235-
See :ref:`changes-to-provenance-objects` for an additional discussion of
236-
reasons why a file's provenance may change.
240+
See :ref:`appendix-3` for an explanation of the technical decision to
241+
embed the SHA-256 digest in the JSON API, rather than the full
242+
:ref:`provenance object <provenance-object>`.
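
A client that downloads a ``.provenance`` file can check it against the digest advertised by the index. The following is a minimal sketch, not part of the PEP: the helper name is hypothetical, and it assumes the digest is advertised as a hexadecimal string, which the text above does not state explicitly.

```python
import hashlib
import hmac


def provenance_digest_matches(provenance_bytes: bytes, advertised_hex: str) -> bool:
    """Return True if the SHA-256 digest of the raw ``.provenance`` bytes
    equals the advertised digest.

    Assumption: the index advertises the digest as a hex string.
    """
    actual = hashlib.sha256(provenance_bytes).hexdigest()
    # Constant-time comparison; hex case is normalized before comparing.
    return hmac.compare_digest(actual, advertised_hex.lower())
```

The digest is computed over the exact bytes served by the index, so the file should not be re-serialized before hashing.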
237243

238244
These changes require a version change to the JSON API:
239245

@@ -260,13 +266,28 @@ object is provided as pseudocode below.
260266
261267
verification_material: VerificationMaterial
262268
"""
263-
Cryptographic materials used to verify `message_signature`.
269+
Cryptographic materials used to verify `envelope`.
270+
"""
271+
272+
envelope: Envelope
273+
"""
274+
The enveloped attestation statement and signature.
275+
"""
276+
277+
278+
@dataclass
279+
class Envelope:
280+
statement: bytes
281+
"""
282+
The attestation statement.
283+
284+
This is represented as opaque bytes on the wire (encoded as base64),
285+
but it MUST be a JSON in-toto v1 Statement.
264286
"""
265287
266-
message_signature: str
288+
signature: bytes
267289
"""
268-
The attestation's signature, as `base64(raw-sig)`, where `raw-sig`
269-
is the raw bytes of the signing operation over the attestation payload.
290+
A signature for the above statement, encoded as base64.
270291
"""
271292
272293
@dataclass
@@ -302,63 +323,36 @@ object) by selecting a new version number.
302323

303324
.. _payload-and-signature-generation:
304325

305-
Attestation payload and signature generation
306-
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
307-
308-
The *attestation payload* is the actual claim that is cryptographically signed
309-
over within the attestation object (as the ``message_signature``).
310-
311-
The attestation payload is encoded as an :rfc:`8785` canonicalized JSON object,
312-
with the following pseudocode layout:
313-
314-
.. code-block:: python
315-
316-
@dataclass
317-
class AttestationPayload:
318-
distribution: str
319-
"""
320-
The file name of the Python package distribution.
321-
"""
322-
323-
digest: str
324-
"""
325-
The SHA-256 digest of the distribution's contents, as a hexadecimal string.
326-
"""
327-
328-
The value of ``distribution`` is the same distribution filename that appears
329-
in the :pep:`503` and :pep:`691` APIs. For example, ``distribution`` would be
330-
``sampleproject-1.2.0-py2.py3-none-any.whl`` for the following simple index
331-
entry:
332-
333-
.. code-block:: html
334-
335-
<a href="https://example.com/...">sampleproject-1.2.0-py2.py3-none-any.whl</a><br/>
336-
337-
In practice, this means that ``distribution`` is defined by the PyPA's
338-
living specifications for
339-
:ref:`binary distributions <packaging:binary-distribution-format>` and
340-
:ref:`source distributions <packaging:source-distribution-format>`, although
341-
non-conforming distributions may be hosted by the index.
326+
Attestation statement and signature generation
327+
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
342328

343-
The following pseudocode demonstrates the construction of an attestation
344-
payload and its signature:
329+
The *attestation statement* is the actual claim that is cryptographically signed
330+
over within the attestation object (i.e., the ``envelope.statement``).
345331

346-
.. code-block:: python
332+
The attestation statement is encoded as a
333+
`v1 in-toto Statement object <https://github.com/in-toto/attestation/blob/v1.0/spec/v1.0/statement.md>`__,
334+
in JSON form. When serialized, the statement is treated as an opaque binary blob,
335+
avoiding the need for canonicalization. An example JSON-encoded statement is
336+
provided in :ref:`appendix-4`.
347337

348-
def build_payload(dist: Path) -> AttestationPayload:
349-
return AttestationPayload(
350-
distribution=dist.name,
351-
digest=sha256(dist.read_bytes()).hexdigest,
352-
)
338+
In addition to being a v1 in-toto Statement, the attestation statement is constrained
339+
in the following ways:
353340

354-
attestation_payload = build_payload("sampleproject-1.2.0-py2.py3-none-any.whl")
341+
* The in-toto ``subject`` **MUST** contain only a single subject.
342+
* ``subject[0].name`` is the distribution's filename, which **MUST** be
343+
a valid :ref:`source distribution <packaging:source-distribution-format>` or
344+
:ref:`wheel distribution <packaging:binary-distribution-format>` filename.
345+
* ``subject[0].digest`` **MUST** contain a SHA-256 digest. Other digests
346+
**MAY** be present. The digests **MUST** be represented as hexadecimal strings.
347+
* The following ``predicateType`` values are supported:
355348

356-
# canonical_json is a fictitious module that performs RFC 8785 canonical
357-
# JSON serialization.
358-
encoded_payload = canonical_json.dumps(asdict(attestation_payload))
349+
* `SLSA Provenance <https://slsa.dev/provenance/v1>`__: ``https://slsa.dev/provenance/v1``
350+
* `PyPI Publish Attestation <https://docs.pypi.org/attestations/publish/v1>`__: ``https://docs.pypi.org/attestations/publish/v1``
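
The constraints listed above can be sketched as a structural check. This is an illustrative sketch only: ``check_statement`` is a hypothetical name, the filename is not validated against the sdist or wheel formats, and rejecting any ``predicateType`` outside the two listed values is one possible reading of "supported".

```python
import json
import string

STATEMENT_TYPE = "https://in-toto.io/Statement/v1"
SUPPORTED_PREDICATE_TYPES = {
    "https://slsa.dev/provenance/v1",
    "https://docs.pypi.org/attestations/publish/v1",
}


def check_statement(raw_statement: bytes) -> dict:
    """Parse a JSON statement and enforce the structural constraints above.

    Raises ValueError on the first violated constraint.
    """
    statement = json.loads(raw_statement)
    if not isinstance(statement, dict) or statement.get("_type") != STATEMENT_TYPE:
        raise ValueError("not an in-toto v1 Statement")

    subjects = statement.get("subject")
    if not isinstance(subjects, list) or len(subjects) != 1:
        raise ValueError("expected exactly one subject")
    subject = subjects[0]
    if not isinstance(subject, dict) or not isinstance(subject.get("name"), str):
        raise ValueError("subject must carry a filename")

    digests = subject.get("digest")
    sha256 = digests.get("sha256") if isinstance(digests, dict) else None
    if not (
        isinstance(sha256, str)
        and len(sha256) == 64
        and all(c in string.hexdigits for c in sha256)
    ):
        raise ValueError("subject must carry a hexadecimal SHA-256 digest")

    # Assumption: any predicate type outside the listed ones is rejected.
    if statement.get("predicateType") not in SUPPORTED_PREDICATE_TYPES:
        raise ValueError("unsupported predicateType")
    return statement
```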
359351

360-
raw_signature = signing_key.sign(encoded_payload, ECDSA(SHA2_256()))
361-
message_signature = b64encode(raw_signature)
352+
The signature over this statement is constructed using the
353+
`v1 DSSE signature protocol <https://github.com/secure-systems-lab/dsse/blob/v1.0.0/protocol.md>`__,
354+
with a ``PAYLOAD_TYPE`` of ``application/vnd.in-toto+json`` and a ``PAYLOAD_BODY`` of the JSON-encoded
355+
statement above. No other ``PAYLOAD_TYPE`` is permitted.
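
Under the v1 DSSE protocol, the signature is not computed over the statement bytes directly but over their "pre-authentication encoding" (PAE), which binds the payload type to the payload. A minimal sketch of that encoding follows; the function name ``pae`` is chosen here for illustration, and the signing step itself is omitted.

```python
def pae(payload_type: str, payload_body: bytes) -> bytes:
    """DSSE v1 pre-authentication encoding.

    Layout: "DSSEv1" SP LEN(type) SP type SP LEN(body) SP body,
    where LEN is the byte length as ASCII decimal.
    """
    type_bytes = payload_type.encode("utf-8")
    return b" ".join(
        [
            b"DSSEv1",
            str(len(type_bytes)).encode("ascii"),
            type_bytes,
            str(len(payload_body)).encode("ascii"),
            payload_body,
        ]
    )


# The bytes that are signed (and later verified) for an attestation statement:
to_sign = pae("application/vnd.in-toto+json", b'{"_type": "..."}')
```

Because the statement is carried as opaque bytes, a verifier feeds the received bytes to ``pae`` unchanged rather than re-serializing the JSON.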
362356

363357
.. _provenance-object:
364358

@@ -368,9 +362,8 @@ Provenance objects
368362
The index will serve uploaded attestations along with metadata that can assist
369363
in verifying them in the form of JSON serialized objects.
370364

371-
These *provenance objects* will be available via both the :pep:`503` Simple Index
372-
and :pep:`691` JSON-based Simple API as described above, and will have the
373-
following layout:
365+
These *provenance objects* will be available via both the Simple Index
366+
and JSON-based Simple API as described above, and will have the following layout:
374367

375368
.. code-block:: json
376369
@@ -488,7 +481,8 @@ for changes to the provenance object include but are not limited to:
488481
Attestation verification
489482
------------------------
490483

491-
Verifying an attestation object requires verification of each of the following:
484+
Verifying an attestation object against a distribution file requires verification of each of the
485+
following:
492486

493487
* ``version`` is ``1``. The verifier **MUST** reject any other version.
494488
* ``verification_material.certificate`` is a valid signing certificate, as
@@ -497,9 +491,15 @@ Verifying an attestation object requires verification of each of the following:
497491
* ``verification_material.certificate`` identifies an appropriate signing
498492
subject, such as the machine identity of the Trusted Publisher that published
499493
the package.
500-
* ``message_signature`` can be verified by ``verification_material.certificate``,
501-
using the reconstructed attestation payload as the cleartext input. The
502-
verifier **MUST** reconstruct the attestation payload itself.
494+
* ``envelope.statement`` is a valid in-toto v1 Statement, with a subject
495+
and digest that **MUST** match the distribution's filename and contents.
496+
For the distribution's filename, matching **MUST** be performed by parsing
497+
using the appropriate source distribution or wheel filename format, as
498+
the statement's subject may be equivalent but normalized.
499+
* ``envelope.signature`` is a valid signature for ``envelope.statement``
500+
corresponding to ``verification_material.certificate``,
501+
as reconstituted via the
502+
`v1 DSSE signature protocol <https://github.com/secure-systems-lab/dsse/blob/v1.0.0/protocol.md>`__.
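
The digest-matching part of the steps above can be sketched as follows. This is an illustration under stated assumptions, not a complete verifier: the envelope is assumed to be a parsed JSON dictionary whose ``statement`` field is standard (not URL-safe) base64, the helper name is hypothetical, and certificate, signature, and filename checks are left out.

```python
import base64
import hashlib
import hmac
import json


def statement_digest_matches(envelope: dict, dist_bytes: bytes) -> bool:
    """Compare the SHA-256 digest claimed in the enveloped statement
    with the digest of the distribution's actual contents."""
    # Assumption: "statement" is standard base64 of the JSON statement bytes.
    statement = json.loads(base64.b64decode(envelope["statement"]))
    claimed = statement["subject"][0]["digest"]["sha256"]
    actual = hashlib.sha256(dist_bytes).hexdigest()
    return hmac.compare_digest(actual, claimed.lower())
```

A real verifier would run this check only after the signature over the statement has been verified, so that the claimed digest is known to be authentic.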
503503

504504
In addition to the above required steps, a verifier **MAY** additionally verify
505505
``verification_material.transparency_entries`` on a policy basis, e.g. requiring
@@ -543,19 +543,6 @@ unstated presumption with earlier mechanisms, like PGP and wheel signatures.
543543
This PEP does not preclude or exclude future index trust mechanisms, such
544544
as :pep:`458` and/or :pep:`480`.
545545

546-
Flexible attestations
547-
---------------------
548-
549-
This PEP specifies a fixed attestation payload (defined in
550-
:ref:`payload-and-signature-generation`), binding the contents of each uploaded
551-
file to its public name on the index. This payload format is fixed and
552-
inflexible to ease implementation, and to minimize additional mechanical
553-
changes to the index itself (e.g., needing to store and service detached
554-
attestation documents).
555-
556-
This PEP does not preclude or exclude future more flexible attestation payload
557-
formats, such as ones built on `in-toto <https://in-toto.io/>`__.
558-
559546
Recommendations
560547
===============
561548

@@ -628,7 +615,7 @@ of signed inclusion time, and can be verified either online or offline.
628615
629616
inclusion_proof: InclusionProof
630617
"""
631-
The actual inclusion proof the the log entry.
618+
The actual inclusion proof of the log entry.
632619
"""
633620
634621
@@ -668,6 +655,58 @@ of signed inclusion time, and can be verified either online or offline.
668655
Cosigned checkpoints from zero or more log witnesses.
669656
"""
670657
658+
.. _appendix-3:
659+
660+
Appendix 3: Simple JSON API size considerations
661+
===============================================
662+
663+
A previous draft of this PEP required embedding each
664+
:ref:`provenance object <provenance-object>` directly into its appropriate part
665+
of the JSON Simple API.
666+
667+
The current version of this PEP embeds the SHA-256 digest of the provenance
668+
object instead. This is done for size and network bandwidth
669+
reasons:
670+
671+
1. We estimate the typical size of an attestation object to be approximately
672+
5.3 KB of JSON.
673+
2. We conservatively estimate that indices eventually host around 3 attestations
674+
per release file, or approximately 15.9 KB of JSON per combined provenance
675+
object.
676+
3. As of May 2024, the average project on PyPI has approximately 21 release
677+
files. We conservatively expect this average to increase over time.
678+
4. Combined, these numbers imply that a typical project might expect to host
679+
between 60 and 70 attestations, or approximately 339 KB of additional JSON
680+
in its "project detail" endpoint.
681+
682+
These numbers are significantly worse in "pathological" cases, where projects
683+
have hundreds or thousands of releases and/or dozens of files per release.
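
The estimate above can be reproduced with a few lines of arithmetic. The sketch below simply multiplies the stated inputs (5.3 KB per attestation, 3 attestations per file, 21 files per project); the variable names are illustrative. The product is 63 attestations and roughly 334 KB, in line with the quoted range of 60 to 70 attestations; the quoted 339 KB corresponds to 64 attestations at 5.3 KB each, so the small gap may reflect rounding in the inputs.

```python
ATTESTATION_KB = 5.3          # estimated size of one attestation object
ATTESTATIONS_PER_FILE = 3     # conservative estimate per release file
FILES_PER_PROJECT = 21        # average release files per project (May 2024)

# Size of one combined provenance object (about 15.9 KB).
provenance_kb = ATTESTATION_KB * ATTESTATIONS_PER_FILE

# Totals for a typical project's "project detail" response.
total_attestations = ATTESTATIONS_PER_FILE * FILES_PER_PROJECT
total_kb = provenance_kb * FILES_PER_PROJECT

print(f"{total_attestations} attestations, about {total_kb:.0f} KB of JSON")
```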
684+
685+
.. _appendix-4:
686+
687+
Appendix 4: Example attestation statement
688+
=========================================
689+
690+
Given a source distribution ``sampleproject-1.2.3.tar.gz`` with a SHA-256
691+
digest of ``e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855``,
692+
the following is an appropriate in-toto Statement, as a JSON object:
693+
694+
.. code-block:: json
695+
696+
{
697+
"_type": "https://in-toto.io/Statement/v1",
698+
"subject": [
699+
{
700+
"name": "sampleproject-1.2.3.tar.gz",
701+
"digest": {"sha256": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"}
702+
}
703+
],
704+
"predicateType": "https://some-arbitrary-predicate.example.com/v1",
705+
"predicate": {
706+
"something-else": "foo"
707+
}
708+
}
709+
671710
Copyright
672711
=========
673712
