Skip to content

Zarr commit date is AFTER the last-modified of new/pending Zarr entry #66

@yarikoptic

Description

@yarikoptic

Long story with conclusion summarized in subject line:

In attempt to provide temporary remedy for

I reran our tool with --mode force which ignored differences in metadata which happened without modifications to time stamp. Then I reran with --mode verify which still failed with

the single error for a single dandiset 001246 zarr
    | backups2datalad.util.UnexpectedChangeError: Dandiset 001246: Zarr 1c0e023e-150d-43a7-948b-dc46dd514ca6: entry '.zattrs' added, but Dandiset draft timestamp was not updated on server
    +------------------------------------
2025-02-06T10:06:12-0500 [ERROR   ] backups2datalad: An error occurred:
Traceback (most recent call last):
  File "/home/dandi/miniconda3/envs/dandisets-2/lib/python3.13/site-packages/backups2datalad/__main__.py", line 119, in wrapped
    await f(datasetter, *args, **kwargs)
  File "/home/dandi/miniconda3/envs/dandisets-2/lib/python3.13/site-packages/backups2datalad/__main__.py", line 229, in update_from_backup
    await datasetter.update_from_backup(dandisets, exclude=exclude)
  File "/home/dandi/miniconda3/envs/dandisets-2/lib/python3.13/site-packages/backups2datalad/datasetter.py", line 135, in update_from_backup
    raise RuntimeError(
        f"Backups for {quantify(len(report.failed), 'Dandiset')} failed"
    )
RuntimeError: Backups for 1 Dandiset failed
Logs saved to /mnt/backup/dandi/dandisets/.git/dandi/backups2datalad/2025.02.06.14.57.12Z.log
action summary:
  publish (notneeded: 2)
2025-02-06T10:06:12-0500 [INFO    ] con-duct: Summary:
Exit Code: 1
Command: flock -E 0 -e -n /home/dandi/.run/backup2datalad-cron-nonzarr.lock bash -c /mnt/backup/dandi/dandisets/tools/backups2datalad-update-cron  --mode verify
Log files location: .duct/logs/2025.02.06T09.57.04-2226083_
Wall Clock Time: 547.832 sec
Memory Peak Usage (RSS): 1.4 GB
Memory Average Usage (RSS): 502.4 MB
Virtual Memory Peak Usage (VSZ): 7.7 TB
Virtual Memory Average Usage (VSZ): 1.0 TB
Memory Peak Percentage: 1.9000000000000001%
Memory Average Percentage: 0.6165427509293678%
CPU Peak Usage: 168.2%
Average CPU Usage: 53.58773234200743%

And that zarr is marked as "Complete"

❯ curl --silent -X 'GET' 'https://api.dandiarchive.org/api/zarr/1c0e023e-150d-43a7-948b-dc46dd514ca6/' -H 'accept: application/json' | jq .
{
  "name": "sub-I39/micr/sub-I39_ses-01_ce-intensity_sample-occipital_chunk-034_OCT.ome.zarr",
  "dandiset": "001246",
  "zarr_id": "1c0e023e-150d-43a7-948b-dc46dd514ca6",
  "status": "Complete",
  "checksum": "09639cfd30d04838239fa0519e067916-478--307209961",
  "file_count": 478,
  "size": 307209961
}

The timestamp for the draft version of the dandiset is 2024-10-29T19:39:37.231865Z

❯ curl --silent -X 'GET' 'https://api.dandiarchive.org/api/dandisets/001246/versions/' -H 'accept: application/json' | jq . | ~/proj/CON/utils/bin/show-paths -e modified -f full-lines
4    "results": [
13:       "modified": "2024-10-29T19:39:37.231865Z",
14        "dandiset": {
17:         "modified": "2024-10-29T19:17:15.901343Z",

and for that .zattrs

❯ wget -S 'https://dandiarchive.s3.amazonaws.com/zarr/1c0e023e-150d-43a7-948b-dc46dd514ca6/.zattrs?versionId=ls2qFhNsIojrQsi_2J0Udc72EEuKzsS9' 2>&1 | grep Last
  Last-Modified: Tue, 29 Oct 2024 19:27:41 GMT

and we have

dandi@drogon:/mnt/backup/dandi/dandisets/001246$ cat .dandi/assets-state.json 
{
    "timestamp": "2024-10-29T19:39:37.231865Z"
}
dandi@drogon:/mnt/backup/dandi/dandisets/001246$ git show | grep Date
Date:   2024 Oct 29 15:39:37 -0400

and in the zarr

dandi@drogon:/mnt/backup/dandi/dandizarrs/1c0e023e-150d-43a7-948b-dc46dd514ca6$ git show | grep Date
Date:   2024 Oct 29 15:28:10 -0400
dandi@drogon:/mnt/backup/dandi/dandizarrs/1c0e023e-150d-43a7-948b-dc46dd514ca6$ ls -ld .zattrs
ls: cannot access '.zattrs': No such file or directory

and that .zattrs has

dandi@drogon:/mnt/backup/dandi/dandizarrs/1c0e023e-150d-43a7-948b-dc46dd514ca6$ wget -O /dev/null -S 'https://dandiarchive.s3.amazonaws.com/zarr/1c0e023e-150d-43a7-948b-dc46dd514ca6/.zattrs?versionId=ls2qFhNsIojrQsi_2J0Udc72EEuKzsS9' 2>&1 | grep Last-Mo
  Last-Modified: Tue, 29 Oct 2024 19:27:41 GMT

So, we have a commit time of 28:10 minutes, but the last-modified of the .zattrs (which was not yet added to that commit) is 27:41 minutes. How could that happen? How exactly do we decide on commit date for zarrs - is it based on last-modified?

I could hypothesize/propose:

  • upload uses local to file last-modified, so user's laptop had date wrong and later uploaded file has last-modified back in time... that would mean we cannot rely on last-modified
  • may be while we were listing zarr, new files being uploaded, so depending on the listing we might catch some newer keys/versions but not unfortunate missed older ones?
  • should we then add some guarding that if any last-modified is "too close to now" (e.g. within time it takes to list), do not act on it, and sleep that amount of time needed to list, redo listing from beginning (while observing the same guarding)?

wdyt @jwodder?

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions