
Inability to use CMSSignedDataParser for archiveTimestampV2 validation #1983

Open
@bsanchezb

Description

Hello,

We want to implement validation of legacy signatures augmented with an id-aa-ets-archiveTimestampV2 unsigned attribute, conforming to ETSI TS 101 733. According to the definition, the message imprint is computed over the data "as is", i.e. using its original encoding (e.g. BER or DER):

The value of the messageImprint field within TimeStampToken shall be a hash of the concatenation of:
• the encapContentInfo element of the SignedData sequence;
• any external content being protected by the signature, if the eContent element of the encapContentInfo is omitted;
• the Certificates and crls elements of the SignedData sequence, when present; and
• all data elements in the SignerInfo sequence including all signed and unsigned attributes.

NOTE 3: Unless DER is used throughout, it is recommended that the binary encoding of the ASN.1 structures being time-stamped be preserved when being archived to ensure that the recalculation of the data hash is consistent.

Thus, we need to know both the original encoding and the original order of the elements to validate the time-stamp's message imprint correctly.
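A minimal sketch of the hashing step, assuming the caller has already located the raw byte ranges of each part listed in the definition above (encapContentInfo, any external content, certificates and crls when present, then the SignerInfo elements) and can supply them as streams in that order. The class name `ArchiveImprint` is hypothetical, and exactly which SignerInfo elements and unsigned attributes to include must still be checked against ETSI TS 101 733. The point is that the bytes are hashed as read and never re-encoded: the BER encoding `30 80 02 01 05 00 00` and the DER encoding `30 03 02 01 05` of the same SEQUENCE produce different digests.

```java
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;

/** Hypothetical helper: digests archive-time-stamp input parts exactly as encoded. */
final class ArchiveImprint {
    private ArchiveImprint() {}

    /**
     * Hashes the parts in the given order without decoding or re-encoding them.
     * Each stream is read in fixed-size chunks, so a part larger than 2 GB never
     * has to fit into a single byte[]. The caller is responsible for closing the streams.
     */
    static byte[] digest(String algorithm, List<InputStream> partsInOrder)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance(algorithm);
        byte[] buf = new byte[8192];
        for (InputStream part : partsInOrder) {
            int n;
            while ((n = part.read(buf)) != -1) {
                md.update(buf, 0, n); // raw bytes only, original encoding preserved
            }
        }
        return md.digest();
    }
}
```

Streaming the parts keeps memory use constant regardless of content size, which is the property CMSSignedData lacks for very large documents.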

This is easy to achieve with a CMSSignedData object, but seems impossible with a CMSSignedDataParser. The problem with the CMSSignedData class is that it processes the whole document in memory, which will fail for CMS signatures encapsulating large documents (over 2 GB).

The problems we face with this class are listed below:

  1. Unable to identify BER or DER encoding of the SignedData.encapContentInfo.

CMSSignedDataParser provides two methods for extracting encapContentInfo information, #getSignedContentTypeOID and #getSignedContent, but neither of them exposes the original encoding of the field (i.e. BER or DER). Unfortunately, extending the CMSSignedDataParser class is not feasible: all data related to encapContentInfo is processed in the constructor, and the ContentInfoParser object cannot be retrieved afterwards. Therefore, we currently read the CMS document a second time with custom code to determine the ContentInfo encoding, see below:

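```java
import java.io.InputStream;
import org.bouncycastle.asn1.*;

// cmsInputStream: a fresh InputStream over the CMS document (placeholder)
try (InputStream is = cmsInputStream) {
    ASN1StreamParser in = new ASN1StreamParser(is);
    ASN1SequenceParser seqParser = (ASN1SequenceParser) in.readObject();
    // BERSequenceParser means indefinite-length (BER) encoding; otherwise definite-length, e.g. DER
    boolean isBer = seqParser instanceof BERSequenceParser;
}
```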

This is not ideal either: it parses the document a second time, and the BERSequenceParser class is deprecated and may be removed in the future.
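As an alternative to relying on `BERSequenceParser`, the distinction can be read directly from the length octets: the indefinite-length form (first length octet `0x80`) is allowed in BER but forbidden in DER. The sketch below is a hypothetical, BC-independent helper (`Asn1Header` is not a BC class) that reads a single identifier-and-length header from a stream. It assumes low-tag-number identifiers and only inspects one header; reaching encapContentInfo would still require walking past the ContentInfo and SignedData headers and the fields that precede it. A definite length alone does not prove DER, which additionally requires, for example, minimal length octets and sorted SET OF components.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical helper: reads one ASN.1 identifier and length header, independent of BC. */
final class Asn1Header {
    final int tag;      // identifier octet (low-tag-number form only)
    final long length;  // number of content octets, or -1 for indefinite length

    private Asn1Header(int tag, long length) {
        this.tag = tag;
        this.length = length;
    }

    /** Indefinite length is permitted in BER but forbidden in DER. */
    boolean isIndefiniteLength() {
        return length < 0;
    }

    /** Consumes exactly the identifier and length octets of the next element. */
    static Asn1Header read(InputStream in) throws IOException {
        int tag = readByte(in);
        if ((tag & 0x1F) == 0x1F) {
            throw new IOException("high-tag-number form is not handled in this sketch");
        }
        int first = readByte(in);
        if (first == 0x80) {
            return new Asn1Header(tag, -1);          // indefinite-length form
        }
        if (first < 0x80) {
            return new Asn1Header(tag, first);       // short definite form
        }
        int count = first & 0x7F;                    // long definite form: count length octets follow
        if (count > 7) {
            throw new IOException("length too large or reserved");
        }
        long length = 0;
        for (int i = 0; i < count; i++) {
            length = (length << 8) | readByte(in);
        }
        return new Asn1Header(tag, length);
    }

    private static int readByte(InputStream in) throws IOException {
        int b = in.read();
        if (b < 0) {
            throw new EOFException("unexpected end of ASN.1 data");
        }
        return b;
    }
}
```

Applied to the first header of the CMS stream, this yields the same indefinite-versus-definite information as the `instanceof BERSequenceParser` check in the snippet above.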

Ideally, I would like to extract all the information required for validation from a CMS document in a single pass over its InputStream. I looked into copying the code of CMSSignedDataParser to create my own parser, but this failed due to the next issue:

  2. SignerInformation has a package-private constructor.

If I create my own parser similar to CMSSignedDataParser (extending it is not possible due to the issue above), I need to build a SignerInformationStore from the extracted information, as CMSSignedDataParser itself does. However, this is not possible because the SignerInformation constructor is package-private. All other constructors also take an existing SignerInformation object, so a SignerInformation object cannot be created outside of the BC packages.
Therefore, I am still forced to use a CMSSignedDataParser, but need to parse the document at least twice to extract all the information I need.
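One possible building block for a single-pass approach is a stream wrapper that copies the bytes a parser consumes while recording is switched on, so the original encoding of a small element (for example a certificate set or a SignerInfo) can be captured during the same read. `RecordingInputStream` is a hypothetical sketch, not a BC API, and it assumes the parser reading through it does not buffer ahead past the element being recorded; whether that holds for CMSSignedDataParser or any other parser would need to be verified. Recorded bytes are kept in memory, so large encapsulated content should be digested (e.g. with `java.security.DigestInputStream`) rather than recorded.

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical helper: passes bytes through and copies them while recording is on. */
final class RecordingInputStream extends FilterInputStream {
    private ByteArrayOutputStream record; // null when not recording

    RecordingInputStream(InputStream in) {
        super(in);
    }

    void startRecording() {
        record = new ByteArrayOutputStream();
    }

    /** Stops recording and returns the bytes consumed since startRecording(). */
    byte[] stopRecording() {
        byte[] out = record == null ? new byte[0] : record.toByteArray();
        record = null;
        return out;
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        if (b >= 0 && record != null) {
            record.write(b);
        }
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        if (n > 0 && record != null) {
            record.write(buf, off, n);
        }
        return n;
    }

    @Override
    public long skip(long n) throws IOException {
        // Route skips through read() so skipped bytes are recorded as well.
        long skipped = 0;
        while (skipped < n && read() >= 0) {
            skipped++;
        }
        return skipped;
    }

    @Override
    public boolean markSupported() {
        return false; // replaying marked bytes would record them twice
    }

    @Override
    public void reset() throws IOException {
        throw new IOException("mark/reset not supported");
    }
}
```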

  3. No access to the original SignedData.certificates and SignedData.crls.

According to the definition of archiveTimestampV2, the SignedData.certificates and SignedData.crls fields shall also be preserved in their original representation. However, CMSSignedDataParser provides no methods to extract this data in its original representation, even though the parser holds it in private fields. The class does provide helper methods for data extraction, such as #getCertificates, #getAttributeCertificates and so on, but these methods neither preserve the original order of the elements (e.g. CRLs and OCSP responses may be intermixed) nor expose the original encoding of the sets (i.e. BER or DER).
This could be solved by adding public methods that return copies of the _certSet and _crlSet variables, similar to what is implemented in CMSSignedData.
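If the raw content octets of the certificates or crls field were exposed, the original order and encoding of the elements could be recovered by splitting them into complete TLVs without decoding. A minimal sketch, assuming definite-length element encodings and low-tag-number identifiers; `RawSetSplitter` is a hypothetical name. Working on a byte[] is reasonable here because these sets are small compared with the encapsulated content. The classification follows the RFC 5652 `RevocationInfoChoice` type, where a plain SEQUENCE is a `crl` and a `[1]` IMPLICIT constructed element (tag `0xA1`) is `other`, e.g. an OCSP response.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: splits SET content octets into raw element encodings, keeping order. */
final class RawSetSplitter {
    private RawSetSplitter() {}

    /** Returns each element's complete TLV bytes in original order. Definite lengths only. */
    static List<byte[]> split(byte[] setContent) throws IOException {
        List<byte[]> elements = new ArrayList<>();
        int pos = 0;
        while (pos < setContent.length) {
            int start = pos;
            int tag = setContent[pos++] & 0xFF;
            if ((tag & 0x1F) == 0x1F) {
                throw new IOException("high-tag-number form is not handled in this sketch");
            }
            if (pos >= setContent.length) {
                throw new IOException("truncated element header");
            }
            int first = setContent[pos++] & 0xFF;
            long length;
            if (first < 0x80) {
                length = first;
            } else if (first == 0x80) {
                throw new IOException("indefinite-length elements are not handled in this sketch");
            } else {
                int count = first & 0x7F;
                if (count > 4 || pos + count > setContent.length) {
                    throw new IOException("unsupported or truncated length octets");
                }
                length = 0;
                for (int i = 0; i < count; i++) {
                    length = (length << 8) | (setContent[pos++] & 0xFF);
                }
            }
            if (pos + length > setContent.length) {
                throw new IOException("truncated element content");
            }
            int end = (int) (pos + length);
            elements.add(Arrays.copyOfRange(setContent, start, end)); // full TLV, original bytes
            pos = end;
        }
        return elements;
    }

    /** Classifies a RevocationInfoChoice element by its identifier octet (RFC 5652). */
    static String revocationKind(byte[] element) {
        int tag = element[0] & 0xFF;
        if (tag == 0x30) {
            return "crl";
        }
        if (tag == 0xA1) {
            return "other";
        }
        return "unknown";
    }
}
```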

Could you please evaluate which modifications to the CMSSignedDataParser class (or other classes) would be sufficient to extract all the information required for validation from a CMS document within a single parsing pass? Any other ideas are also welcome.

Best regards,
Aleksandr
