desi_archive_tilenight make checksums #1762

Open
sbailey opened this issue May 5, 2022 · 5 comments

@sbailey
Contributor

sbailey commented May 5, 2022

When desi_archive_tilenight creates each tiles/archive/TILEID/ARCHIVEDATE directory, it should also create checksums for that directory.

@weaverba137 please specify how checksums are created for productions so that we use a consistent method (checksum algorithm, filename, ...)
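Until that convention is specified, the per-directory step might look roughly like the sketch below. Everything concrete in it is an assumption made for illustration: SHA-256 as the algorithm, the placeholder file name `checksums.sha256sum`, and covering only the files directly inside the directory (not `logs/`). The function name `write_checksums` is hypothetical and not part of desispec.

```python
import hashlib
from pathlib import Path


def write_checksums(directory, checksum_name="checksums.sha256sum"):
    """Write sha256sum-style lines for the regular files directly in `directory`.

    Assumptions (not settled in this thread): SHA-256, the placeholder file
    name, and skipping subdirectories such as logs/.
    """
    directory = Path(directory)
    lines = []
    for path in sorted(directory.iterdir()):
        # Skip subdirectories and the checksum file itself.
        if not path.is_file() or path.name == checksum_name:
            continue
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        # Same "<hex digest>  <file name>" layout that `sha256sum` prints.
        lines.append(f"{digest}  {path.name}")
    outfile = directory / checksum_name
    outfile.write_text("".join(line + "\n" for line in lines))
    return outfile
```

Writing the lines in the `sha256sum` layout would let the file be verified with `sha256sum -c` from inside the directory, if that turns out to be the chosen algorithm.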

Related is #1644 about cross production tile archiving. Nominally this form of archiving would create a link daily/tiles/archive/TILEID/ARCHIVEDATE -> ../../../../guadalupe/tiles/cumulative/TILEID/LASTNIGHT. Ideally the guadalupe production would already have a checksum file in tiles/cumulative/TILEID/LASTNIGHT in the same form that we would have put into daily/tiles/archive/TILEID/ARCHIVEDATE if it weren't a link. If productions like guadalupe organize where they put the checksum differently, let's define that and discuss options.
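As a check on the relative link above, here is a small sketch that derives the target from the two directory layouts. The function name and arguments are hypothetical (not desispec API), and the tile ID and dates in the usage line are made up.

```python
import os


def cross_prod_link_target(tileid, archivedate, lastnight,
                           link_prod="daily", target_prod="guadalupe"):
    """Return (link path, relative target) for a cross-production archive link.

    Paths are relative to the directory that holds the productions.
    Illustrative only; names and defaults are assumptions.
    """
    link = os.path.join(link_prod, "tiles", "archive",
                        str(tileid), str(archivedate))
    target = os.path.join(target_prod, "tiles", "cumulative",
                          str(tileid), str(lastnight))
    # A relative symlink is resolved from the directory containing the link.
    rel_target = os.path.relpath(target, start=os.path.dirname(link))
    return link, rel_target


link, rel_target = cross_prod_link_target(1000, 20220506, 20210517)
print(link, "->", rel_target)
```

The link lives in daily/tiles/archive/TILEID/, which is four levels below the directory holding the productions, hence the four `..` components.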

@weaverba137
Member

I note for the record that currently there are no symlinks in daily/tiles/archive. What is the level of readiness for addressing #1644 versus this issue? Specifically, guadalupe checksums are being created in the immediate future (~days) and therefore:

  • We do not want to add checksum files to nights that will be symlinked to guadalupe anyway.
  • We do want to follow the same convention for checksum file names as guadalupe.

This also suggests that the layout of the tiles/archive/TILEID/ARCHIVEDATE directory is the same as or very similar to the layout of a SPECPROD/tiles/cumulative/TILEID/LASTNIGHT directory. Is this a reasonably safe assumption?

By "reasonably safe": e.g. there could be differences in the number or types of files in certain cases, but there will not be differences in subdirectories. In this case, there will be a logs/ subdirectory but there shouldn't be any other subdirectories.
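One way to test that assumption on real directories is a check like the following sketch, which reports any subdirectory other than `logs/`. The helper name `unexpected_subdirs` is hypothetical.

```python
from pathlib import Path


def unexpected_subdirs(tile_dir, allowed=("logs",)):
    """Return names of subdirectories of `tile_dir` that are not in `allowed`."""
    return sorted(
        p.name for p in Path(tile_dir).iterdir()
        if p.is_dir() and p.name not in allowed
    )
```

An empty list for every ARCHIVEDATE and LASTNIGHT directory would mean the same checksum code can treat both layouts identically.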

@sbailey
Contributor Author

sbailey commented May 6, 2022

No one is actively working on #1644 (cross prod archiving), so it can wait for guadalupe checksumming.

> We do not want to add checksum files to nights that will be symlinked to guadalupe anyway.

Clarifying: cross production archiving will symlink daily/tiles/archive/TILEID/ARCHIVEDATE to a guadalupe/tiles/cumulative/TILEID/LASTNIGHT directory; we will not be creating new guadalupe/tiles/archive/ directories. In other words, archiving is a way of freezing a cumulative/TILEID/LASTNIGHT directory, either by moving it to an archive directory (daily) or by linking to a guaranteed frozen copy (e.g. guadalupe). So I think you can proceed with guadalupe checksums; if not, I am misunderstanding the concern.

> This also suggests that the layout of the tiles/archive/TILEID/ARCHIVEDATE directory is the same as or very similar to the layout of a SPECPROD/tiles/cumulative/TILEID/LASTNIGHT directory. Is this a reasonably safe assumption?

Yes, they are identical in structure. In the normal archiving case, the tiles/archive/TILEID/ARCHIVEDATE is a moved copy of files that were originally in tiles/cumulative/TILEID/LASTNIGHT, and there is a symlink left behind in tiles/cumulative/TILEID/LASTNIGHT to the new archived location. In the case of cross production archiving, it will link directly to a tiles/cumulative/TILEID/LASTNIGHT directory. So they are by construction the same structure.
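For readers unfamiliar with the normal archiving case, the move-then-link step can be sketched as below. This is an illustration under assumptions, not the desi_archive_tilenight implementation: the function name `archive_tile`, its arguments, and the choice of a relative symlink are all hypothetical.

```python
import os
import shutil


def archive_tile(tiles_dir, tileid, lastnight, archivedate):
    """Move cumulative/TILEID/LASTNIGHT to archive/TILEID/ARCHIVEDATE and
    leave a relative symlink behind at the old location.

    Illustrative sketch only; not the desi_archive_tilenight code.
    """
    src = os.path.join(tiles_dir, "cumulative", str(tileid), str(lastnight))
    dst = os.path.join(tiles_dir, "archive", str(tileid), str(archivedate))
    # Refuse to touch anything that is already archived or would be overwritten.
    if os.path.islink(src) or os.path.exists(dst):
        raise ValueError(f"{src} is already a link or {dst} already exists")
    os.makedirs(os.path.dirname(dst), exist_ok=True)
    shutil.move(src, dst)
    # Relative target, resolved from cumulative/TILEID/.
    os.symlink(os.path.relpath(dst, start=os.path.dirname(src)), src)
    return dst
```

Because the directory is moved rather than rebuilt, the archived copy has the same files and the same `logs/` subdirectory as the cumulative directory it came from.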

@weaverba137
Member

@sbailey, indeed, it's not a concern with regard to creating guadalupe checksums. To expand on the process a bit:

  1. Tile archiving should create checksums for newly-created ARCHIVEDATE directories.
  2. Some other process will need to create checksums for ARCHIVEDATE directories that already exist.
  3. That other process should skip ARCHIVEDATE directories that will be replaced by symlinks into guadalupe.

@sbailey
Contributor Author

sbailey commented May 6, 2022

Clarifying item 3:

> 3. That other process should skip ARCHIVEDATE directories that will be replaced by symlinks into guadalupe.

When we re-archive a tile linking to guadalupe, that would get a new ARCHIVEDATE so that we don't break the previous archived version that we promised not to change. i.e. we will not replace existing ARCHIVEDATEs with a link to guadalupe instead. They are archived, frozen, and never supposed to change (except getting their checksums added).

Note that ARCHIVEDATE is the date that we decided to promote a particular processing to archival status for MTL decisions; it is not the same as LASTNIGHT (the last night of data included in that particular cumulative coadd).

@weaverba137
Member

Ah, OK. In that case the script to create checksums for pre-existing ARCHIVEDATE directories should just do so for all of them. Much simpler.
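A minimal sketch of the discovery half of such a script, assuming the checksum file can be recognized by a suffix. The suffix `.sha256sum` and the function name are placeholders, since the thread does not fix the file name.

```python
from pathlib import Path


def archivedates_missing_checksums(archive_dir, checksum_suffix=".sha256sum"):
    """List tiles/archive/TILEID/ARCHIVEDATE directories without a checksum file.

    `checksum_suffix` is a placeholder for whatever convention is adopted.
    """
    missing = []
    for tile_dir in sorted(Path(archive_dir).iterdir()):
        if not tile_dir.is_dir():
            continue
        for date_dir in sorted(tile_dir.iterdir()):
            if not date_dir.is_dir():
                continue
            has_checksum = any(
                f.name.endswith(checksum_suffix) for f in date_dir.iterdir()
            )
            if not has_checksum:
                missing.append(date_dir)
    return missing
```

Every ARCHIVEDATE directory is considered, with no special cases; directories that already have a checksum file are left out of the list so the script can be re-run safely.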
