Inability to use CMSSignedDataParser for archiveTimestampV2 validation

Hello,

We want to implement validation of legacy signatures augmented with an id-aa-ets-archiveTimestampV2 unsigned attribute, conforming to [ETSI TS 101 733](https://www.etsi.org/deliver/etsi_ts/101700_101799/101733/02.02.01_60/ts_101733v020201p.pdf). According to the definition, the message imprint is computed over the data "as is", i.e. in its original encoding (e.g. BER or DER):

> The value of the messageImprint field within TimeStampToken shall be a hash of the concatenation of:
>
> - the encapContentInfo element of the SignedData sequence;
> - any external content being protected by the signature, if the eContent element of the encapContentInfo is omitted;
> - the Certificates and crls elements of the SignedData sequence, when present; and
> - all data elements in the SignerInfo sequence including all signed and unsigned attributes.

> NOTE 3: Unless DER is used throughout, it is recommended that the binary encoding of the ASN.1 structures being time-stamped be preserved when being archived to ensure that the recalculation of the data hash is consistent.

Thus, we need to know the original encoding, as well as the order of the elements, to validate the timestamp's message imprint correctly.
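To illustrate why the original bytes matter, here is a minimal sketch of the imprint computation using only the JDK. It assumes the caller has already obtained the original encoding of each element, in the order listed in the quote above; the class and method names (`ArchiveTimestampImprint`, `imprint`) are hypothetical and not part of Bouncy Castle.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;

public final class ArchiveTimestampImprint {
    private ArchiveTimestampImprint() {}

    /**
     * Hashes the given byte arrays in list order, which is equivalent to
     * hashing their concatenation. Each array is assumed to hold the
     * ORIGINAL encoding of one element (hypothetical input, supplied by the caller).
     */
    public static byte[] imprint(String digestAlgorithm, List<byte[]> originalEncodings)
            throws NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance(digestAlgorithm);
        for (byte[] part : originalEncodings) {
            md.update(part);
        }
        return md.digest();
    }
}
```

The same ASN.1 value encoded with a definite length and with an indefinite length yields different bytes and therefore a different imprint, which is why re-encoding the elements before hashing can break the validation.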

That is easy to achieve using a _CMSSignedData_ object, but seems impossible with [CMSSignedDataParser](https://github.com/bcgit/bc-java/blob/0ea89a4388de4f18a2cd3a1801d5bdb2a954644d/pkix/src/main/java/org/bouncycastle/cms/CMSSignedDataParser.java#L87). The problem with the _CMSSignedData_ class is that it processes everything in memory, which fails for CMS signatures encapsulating large documents (bigger than 2 GB).

Please see below the problems we face with the class:

1. Unable to identify BER or DER encoding of the SignedData.encapContentInfo.

_CMSSignedDataParser_ provides two methods for extracting encapContentInfo information, namely [#getSignedContentTypeOID](https://github.com/bcgit/bc-java/blob/0ea89a4388de4f18a2cd3a1801d5bdb2a954644d/pkix/src/main/java/org/bouncycastle/cms/CMSSignedDataParser.java#L388) and [#getSignedContent](https://github.com/bcgit/bc-java/blob/0ea89a4388de4f18a2cd3a1801d5bdb2a954644d/pkix/src/main/java/org/bouncycastle/cms/CMSSignedDataParser.java#L393). Neither of them exposes any information about the original encoding of the field (i.e. BER or DER). Unfortunately, extending the _CMSSignedDataParser_ class is not feasible, because all data related to _encapContentInfo_ is processed at instantiation time [within the constructor](https://github.com/bcgit/bc-java/blob/0ea89a4388de4f18a2cd3a1801d5bdb2a954644d/pkix/src/main/java/org/bouncycastle/cms/CMSSignedDataParser.java#L138), and it is not possible to get back to the _ContentInfoParser_ object afterwards. Therefore, we currently read the CMS document a second time with custom code to extract the required ContentInfo encoding, see below:
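```
import java.io.InputStream;
import org.bouncycastle.asn1.ASN1SequenceParser;
import org.bouncycastle.asn1.ASN1StreamParser;
import org.bouncycastle.asn1.BERSequenceParser;

// cmsInputStream: a fresh InputStream over the CMS document (placeholder)
try (InputStream is = cmsInputStream) {
    ASN1StreamParser in = new ASN1StreamParser(is);
    // Read the outermost SEQUENCE of the document
    ASN1SequenceParser seqParser = (ASN1SequenceParser) in.readObject();
    boolean isBer = seqParser instanceof BERSequenceParser; // otherwise DER-encoded
}
```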
This is not ideal either, because the [BERSequenceParser](https://github.com/bcgit/bc-java/blob/0ea89a4388de4f18a2cd3a1801d5bdb2a954644d/core/src/main/java/org/bouncycastle/asn1/BERSequenceParser.java#L10) class is deprecated and may be removed in the future.
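One possible workaround that avoids the deprecated class is to inspect the length octet of the outermost element directly: in BER, the indefinite-length form is signalled by a length octet of `0x80`, which DER never uses. The sketch below relies on the JDK only; `Asn1LengthForm` and `startsWithIndefiniteLength` are hypothetical names, not Bouncy Castle API. Note the limits of this check: a definite length does not prove that the content is DER, and it still requires a separate look at the stream, so it does not solve the single-pass problem.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

public final class Asn1LengthForm {
    private Asn1LengthForm() {}

    /**
     * Reads the identifier and first length octet of the next ASN.1 element
     * and reports whether it uses the indefinite-length form (length octet 0x80).
     * The header bytes are consumed from the stream.
     */
    public static boolean startsWithIndefiniteLength(InputStream in) throws IOException {
        int tag = readByte(in);
        // High-tag-number form: low five bits all set, followed by
        // base-128 tag number octets; the last one has bit 8 cleared.
        if ((tag & 0x1F) == 0x1F) {
            int b;
            do {
                b = readByte(in);
            } while ((b & 0x80) != 0);
        }
        return readByte(in) == 0x80;
    }

    private static int readByte(InputStream in) throws IOException {
        int b = in.read();
        if (b < 0) {
            throw new EOFException("Truncated ASN.1 header");
        }
        return b;
    }
}
```

Because the method consumes the header, a caller that wants to continue parsing the same stream would need to wrap it, for example in a `BufferedInputStream` with `mark`/`reset`, before calling it.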

Ideally, I would like to extract all the information required for validation from a CMS in a single pass over the document's InputStream. I looked into copying code from _CMSSignedDataParser_ to create my own parser, but failed because of the next issue:

2. SignerInformation has a package-private constructor.

If I create my own parser similar to _CMSSignedDataParser_ (extension is not possible due to the issue above), I need to build a _SignerInformationStore_ from the extracted information (similarly to this [method](https://github.com/bcgit/bc-java/blob/0ea89a4388de4f18a2cd3a1801d5bdb2a954644d/pkix/src/main/java/org/bouncycastle/cms/CMSSignedDataParser.java#L262C12-L262C34)). However, this is not possible because the [SignerInformation constructor](https://github.com/bcgit/bc-java/blob/0ea89a4388de4f18a2cd3a1801d5bdb2a954644d/pkix/src/main/java/org/bouncycastle/cms/SignerInformation.java#L60) is package-private. All other constructors take an existing _SignerInformation_ object, which makes it impossible to create a _SignerInformation_ object outside of the BC packages.
Therefore, I'm still forced to use _CMSSignedDataParser_, and need to parse the document at least twice to extract all the information I need.

3. Not possible to access original SignedData.certificates and SignedData.crls.

According to the definition of archiveTimestampV2, the _SignedData.certificates_ and _SignedData.crls_ fields shall also be preserved in their original representation. However, _CMSSignedDataParser_ does not provide methods to extract the data in its original representation, even though the data is stored in the object in [private variables](https://github.com/bcgit/bc-java/blob/0ea89a4388de4f18a2cd3a1801d5bdb2a954644d/pkix/src/main/java/org/bouncycastle/cms/CMSSignedDataParser.java#L100). The _CMSSignedDataParser_ class provides helper methods for data extraction, such as [#getCertificates](https://github.com/bcgit/bc-java/blob/0ea89a4388de4f18a2cd3a1801d5bdb2a954644d/pkix/src/main/java/org/bouncycastle/cms/CMSSignedDataParser.java#L310), [#getAttributeCertificates](https://github.com/bcgit/bc-java/blob/0ea89a4388de4f18a2cd3a1801d5bdb2a954644d/pkix/src/main/java/org/bouncycastle/cms/CMSSignedDataParser.java#L336) and so on. These methods neither preserve the original order of the elements (e.g. CRL and OCSP entries may be intermixed) nor provide information on the original encoding of the sets (i.e. BER or DER).
This could be solved by adding public methods that return copies of the __certSet_ and __crlSet_ variables, similar to how it is done in _CMSSignedData_.


Could you please evaluate which modifications to the _CMSSignedDataParser_ class (or other classes) would be sufficient to extract all the information required for validation from a CMS in a single parsing operation? Any other ideas you may have are welcome too.

Best regards,
Aleksandr
