List input files in history metadata attribute #34

Closed
pont-us opened this issue Apr 20, 2021 · 6 comments · Fixed by #42
Labels
enhancement New feature or request in progress Started working on this

Comments

@pont-us
Member

pont-us commented Apr 20, 2021

At present, Zarrs produced by nc2zarr don't contain any indication of the source files from which they were generated. nc2zarr should optionally include a list of source files in the value of the Zarr's history attribute on first generation, and update this value with the additional input files when appending to an existing Zarr.

@pont-us pont-us added the enhancement New feature or request label Apr 20, 2021
@pont-us pont-us self-assigned this Apr 20, 2021
@forman
Member

forman commented Apr 20, 2021

Several metadata attributes may be updated in a CF-compliant way; see the section "Description of File Contents" in the CF conventions.

@forman
Member

forman commented Apr 20, 2021

Should be resolved together with #20.

@pont-us
Member Author

pont-us commented Apr 26, 2021

Implementing this will also help with implementation of a related feature request from CloudFerro: the ability to ignore an input file when appending if it has already been ingested into the target Zarr. nc2zarr could check the input pathname or filename against the list in the history attribute before appending.

@forman
Member

forman commented May 25, 2021

@pont-us Please note:

  • Append to the history attribute a record of when and what has been done: "\n${date}: converted to Zarr using nc2zarr ${version}".
  • Use the sources attribute to list the sources.
  • Resolve #20 (Adjust dataset metadata).
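
A minimal sketch of the first two points, written against a plain attribute dictionary rather than nc2zarr's actual internals. The helper name `update_provenance`, the timestamp format, and the choice to store `sources` as a comma-separated string are assumptions made for illustration; only the wording of the history line comes from the list above.

```python
from datetime import datetime, timezone


def update_provenance(attrs, input_paths, version, now=None):
    """Return a copy of attrs with history and sources updated.

    Hypothetical helper for illustration, not part of nc2zarr's API.
    """
    now = now or datetime.now(timezone.utc)
    updated = dict(attrs)

    # Append one line to history, following the suggested pattern
    # "\n${date}: converted to Zarr using nc2zarr ${version}".
    stamp = now.strftime("%Y-%m-%dT%H:%M:%SZ")  # assumed date format
    entry = f"{stamp}: converted to Zarr using nc2zarr {version}"
    history = updated.get("history", "")
    updated["history"] = f"{history}\n{entry}" if history else entry

    # Extend sources, keeping order and skipping paths already listed.
    # The comma-separated layout is an assumption.
    listed = [s for s in updated.get("sources", "").split(", ") if s]
    for path in input_paths:
        if path not in listed:
            listed.append(path)
    updated["sources"] = ", ".join(listed)
    return updated
```

Calling the helper once on first generation and again on each append would grow `history` by one line per run and `sources` by the newly ingested files.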

@forman
Member

forman commented May 25, 2021

> Implementing this will also help with implementation of a related feature request from CloudFerro: the ability to ignore an input file when appending if it has already been ingested into the target Zarr. nc2zarr could check the input pathname or filename against the list in the history attribute before appending.

We should not use metadata to make decisions about the data in the dataset. Whether a timeslice has already been processed should be detected by looking at the data itself, that is, the time coordinates. Once a duplicate is detected, there are two options: ignore the new data or replace the existing data. Replacing an existing timeslice with a more up-to-date one is a valid use case that we have in other scenarios. (Example: the same Sentinel 3 Level-2 data is being processed in a fast lane and in another lane that takes much more time but delivers higher data quality. When the second product arrives, it replaces the first.)
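
As a rough illustration of that idea, the sketch below compares incoming time coordinates with those already in the target and decides, per timeslice, whether to append, ignore, or replace. The function name `plan_append` and the `mode` parameter are hypothetical; time values are plain hashable objects here (for example ISO strings), whereas a real implementation would read them from the Zarr's time coordinate.

```python
def plan_append(existing_times, new_times, mode="ignore"):
    """Decide what to do with each incoming timeslice.

    Returns (to_append, to_replace):
      to_append  - indices into new_times not yet present in the target
      to_replace - (existing_index, new_index) pairs to overwrite,
                   filled only when mode is "replace"
    Hypothetical helper for illustration, not part of nc2zarr's API.
    """
    if mode not in ("ignore", "replace"):
        raise ValueError(f"unknown mode: {mode!r}")

    # Map each time value already in the target to its position.
    position = {t: i for i, t in enumerate(existing_times)}

    to_append, to_replace = [], []
    for j, t in enumerate(new_times):
        if t not in position:
            to_append.append(j)
        elif mode == "replace":
            to_replace.append((position[t], j))
        # In "ignore" mode a duplicate timeslice is simply skipped.
    return to_append, to_replace
```

With `mode="replace"`, the fast-lane slice from the Sentinel 3 example would be overwritten in place once the higher-quality slice for the same time arrives.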

@forman forman assigned forman and unassigned pont-us Jun 4, 2021
@forman forman added the in progress Started working on this label Jun 4, 2021
forman added a commit that referenced this issue Jun 4, 2021
@pont-us
Member Author

pont-us commented Jun 4, 2021

> We should not use metadata to make decisions about the data in the dataset. Whether a timeslice has already been processed or not should be detected by looking into the data: time coordinates.

Agreed -- I've opened Issue #41 to discuss implementation of this functionality.


2 participants