Design Work for Unity data archive services #172

Closed
@mike-gangl

Description

#46

Design this and get feedback from U-DS team ahead of 24.3 PI Planning.

[Image: archive service component diagram]

Above is the design (so far) for the archive service components.

Note 1: the components are logical. If we can re-use things that currently exist (e.g. databases), that is fine, but I think they'll probably need a separation of concerns.

Note 2: this also assumes the auto-catalog component is in place, though minimal changes would be needed if it's not.

After a successful catalog operation, the Cumulus instance will send an SNS response message to every SNS topic listed in the Cumulus collection configuration. For archivable products, the archive service's SNS topic will be added to the Cumulus collection configuration (how?).
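As a sketch of the configuration piece, the archive topic could live under the collection's `meta` block. The field names (`meta.archive`, `sns_topic_arn`) and the topic ARN below are assumptions for illustration, not the actual Unity/Cumulus schema:

```python
# Hypothetical sketch of an "archivable" Cumulus collection configuration.
# Field names under "meta" are assumptions, not the real Cumulus schema.
collection = {
    "name": "L1B_PRODUCT",   # hypothetical collection name
    "version": "001",
    "meta": {
        "archive": {
            "archivable": True,
            # Topic the archive service listens on (illustrative ARN).
            "sns_topic_arn": "arn:aws:sns:us-west-2:123456789012:archive-service",
        },
    },
}

def archive_topics(collection: dict) -> list[str]:
    """Return the archive SNS topics configured for a collection, if any."""
    archive = collection.get("meta", {}).get("archive", {})
    return [archive["sns_topic_arn"]] if archive.get("archivable") else []
```

A deployment step (or the answer to the "how?" above) would read this block and subscribe the archive service's queue to the listed topic.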

The archive service will have an archiver function that processes CNM messages from the topic/queue. Its job is to map the fields in the CNM it receives to a DAAC CNM message. It must also map from the MDPS product type to the DAAC product type; this information needs to be provided by the project (how?). The archiver might need access to the product or STAC catalogs to properly create the CNM.
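A minimal sketch of that mapping step, assuming a project-supplied type map and a loosely CNM-shaped message (the field names and the `"UNITY"` provider string are illustrative, not a definitive CNM schema):

```python
def to_daac_cnm(mdps_cnm: dict, type_map: dict[str, str]) -> dict:
    """Map an incoming MDPS CNM message to a DAAC-facing CNM message.

    `type_map` is the project-provided MDPS-to-DAAC product type table.
    A KeyError here means the product type is not configured for archive.
    """
    product = mdps_cnm["product"]
    return {
        "version": mdps_cnm.get("version", "1.6"),
        "provider": "UNITY",                      # assumed provider name
        "collection": type_map[product["type"]],  # MDPS -> DAAC product type
        "product": {
            "name": product["name"],
            "files": product.get("files", []),
        },
    }
```

The `KeyError` on an unmapped type is one concrete place the error scenario in the open questions below would surface.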

The archiver needs to generate an identifier and use it in the CNM sent to the DAAC. It will store this identifier along with the granule/product information (e.g. URI, project, venue, anything else of value), then send the CNM to the DAAC.
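The identifier-plus-record step might look like the following; the record fields (`granule`, `uris`, `project`, `venue`, `status`) are assumptions about what the archive service would find valuable to store:

```python
import uuid

def build_archive_record(daac_cnm: dict, project: str, venue: str) -> dict:
    """Stamp a unique identifier onto the outgoing CNM and build the record
    the archive service stores for later status lookups.

    Field names are illustrative; the real schema is an open design point.
    """
    identifier = str(uuid.uuid4())
    daac_cnm["identifier"] = identifier   # echoed back by the DAAC response
    return {
        "identifier": identifier,
        "granule": daac_cnm["product"]["name"],
        "uris": [f["uri"] for f in daac_cnm["product"].get("files", [])],
        "project": project,
        "venue": venue,
        "status": "in_progress",
    }
```

After persisting the record, the archiver would publish `json.dumps(daac_cnm)` to the DAAC's topic (e.g. via an SNS client).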

The archive service will have an SNS/queue setup to retrieve messages from one or more DAACs. Upon receipt, the archive status function uses the identifier embedded in the CNM to determine the status of an archive job (success, error). It will also handle de-duplication of returned CNM messages if necessary. This function updates the database as well as the data catalog with the archive information.
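A sketch of that status handler, including the de-duplication check. Here `db` stands in for the archive service's database, and the `response`/`errorMessage` field names follow the CNM response shape loosely and are assumptions:

```python
def handle_daac_response(response_cnm: dict, db: dict) -> bool:
    """Update the archive record matching the identifier in a DAAC response.

    Returns False for unknown identifiers and for duplicate responses
    (records already in a terminal state), True when the record was updated.
    """
    record = db.get(response_cnm.get("identifier"))
    if record is None:
        return False                       # unknown identifier
    if record["status"] in ("success", "error"):
        return False                       # duplicate: already finalized
    resp = response_cnm.get("response", {})
    record["status"] = "success" if resp.get("status") == "SUCCESS" else "error"
    record["daac_message"] = resp.get("errorMessage")
    # A real implementation would also push the status to the data catalog here.
    return True
```

Treating terminal states as immutable gives idempotent processing, which matters since SNS/SQS delivery is at-least-once.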

Open questions:

  1. Do we update the Cumulus database schema to host archive information (e.g. an archive status of in-progress, success, error, or not-for-archive)? Or should it be a link to a record in the archive service?
  2. Eventually we might want to "clean up" files in the project venue once archived and pull them back when needed. How would we enable that? Would we update the U-DS data catalog with the granule locations?
  3. What's the error scenario? If a file is not configured correctly to send to a DAAC (e.g. missing required metadata), or if a file fails ingest at the DAAC, it's easy enough to set a flag in the archive DB, but how do we notify the operator? The same question probably arises in the automatic cataloging scenario when an error occurs.
