Skip to content

Doesn't update superdataset properly any more. backups2datalad process is "stuck" #39

@yarikoptic

Description

@yarikoptic

I noticed that https://github.com/dandi/dandisets says that last commit was 3 weeks ago. But we have lots of recently updated dandisets:
image

Individual dandiset github repos seems also have recent updates

image

on drogon I see only some not pushed (why?) updates to 000026 in dandisets superdataset
dandi@drogon:/mnt/backup/dandi/dandisets$ git log github/draft..draft
commit a60c7bf88d82a350f538bd83594c163a3b29e51e (HEAD -> draft)
Author: DANDI Team <team@dandiarchive.org>
Date:   Wed Mar 27 03:25:33 2024 -0400

    1 updated (000026)
    
    000026:
     - [backups2datalad] 22 files added, 10 files updated

commit df1dce3a6fe1aa954dd818ce54420ab31b07e71d
Author: DANDI Team <team@dandiarchive.org>
Date:   Tue Mar 26 04:10:35 2024 -0400

    1 updated (000026)
    
    000026:
     - [backups2datalad] 229 files added, 29 files updated
and git status shows that MANY dandisets have uncommitted changes (new commits OR modified content!!) on drogon
dandi@drogon:/mnt/backup/dandi/dandisets$ git status
Refresh index: 100% (612/612), done.
On branch draft
Your branch is ahead of 'github/draft' by 2 commits.
  (use "git push" to publish your local commits)

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
  (commit or discard the untracked or modified content in submodules)
        modified:   000018 (new commits)
        modified:   000024 (new commits)
        modified:   000029 (modified content)
        modified:   000030 (new commits)
        modified:   000031 (new commits)
        modified:   000032 (new commits)
        modified:   000033 (new commits)
        modified:   000038 (new commits)
        modified:   000040 (new commits)
        modified:   000042 (new commits)
...

so most likely recent changes on handling embargoed/public dandisets broke this...
Then I looked at ps on drogon -- and apparently there is still a process which runs in --verify mode since Mar 07!

As such it seems to be stuck:

dandi    2645263  0.0  0.0   2484   404 ?        Ss   Mar07   0:00     /bin/sh -c chronic flock -E 0 -e -n /home/dandi/.run/backup2datalad-cron-nonzarr.lock bash -c '/mnt/backup/dandi/dandisets/tools/backups2datalad-update-cron --mode verify'
dandi    2645264  0.0  0.0  19748  5512 ?        S    Mar07   7:09       /usr/bin/perl /usr/bin/chronic flock -E 0 -e -n /home/dandi/.run/backup2datalad-cron-nonzarr.lock bash -c /mnt/backup/dandi/dandisets/tools/backups2datalad-update-cron --mode verify
dandi    2645317  0.0  0.0   5548   544 ?        S    Mar07   0:00         /usr/bin/flock -E 0 -e -n /home/dandi/.run/backup2datalad-cron-nonzarr.lock bash -c /mnt/backup/dandi/dandisets/tools/backups2datalad-update-cron --mode verify
dandi    2645318  0.0  0.0   6952  2536 ?        S    Mar07   0:00           /bin/bash /mnt/backup/dandi/dandisets/tools/backups2datalad-update-cron --mode verify
dandi    2650110  0.0  0.0   6952  1240 ?        S    Mar07   0:00             /bin/bash /mnt/backup/dandi/dandisets/tools/backups2datalad-update-cron --mode verify
dandi    2650139  0.0  0.3 1790364 200180 ?      Sl   Mar07   1:21               /home/dandi/miniconda3/envs/dandisets-2/bin/python3.1 /home/dandi/miniconda3/envs/dandisets-2/bin/backups2datalad -l WARNING --backup-root /mnt/backup/dandi --config tools/backups2datalad.cfg.yaml update-from-backup --workers 5 -e 000(026|108|243)$ --mode verify
dandi    2650140  0.0  0.0   6384  2272 ?        S    Mar07   0:00               grep -v nothing to save, working tree clean

py-spy'ing that process gives no useful detail

image

then I recalled that we have logs! and this is what I see

(dandisets-2) dandi@drogon:/mnt/backup/dandi/dandisets$ tail -f .git/dandi/backups2datalad/2024.03.07.06.50.32Z.log
2024-03-29T03:12:15-0400 [WARNING ] backups2datalad: Retrying GET request to /dandisets/000729/versions/draft/ in 4292.823660 seconds as it raised PoolTimeout: 
2024-03-29T03:34:24-0400 [WARNING ] backups2datalad: Retrying GET request to /dandisets/000726/versions/draft/ in 8515.649940 seconds as it raised PoolTimeout: 
2024-03-29T04:12:28-0400 [WARNING ] backups2datalad: Retrying GET request to /dandisets/000727/versions/draft/ in 7882.410218 seconds as it raised PoolTimeout: 
2024-03-29T04:19:15-0400 [WARNING ] backups2datalad: Retrying GET request to /dandisets/000728/versions/draft/ in 8402.162677 seconds as it raised PoolTimeout: 
2024-03-29T04:24:48-0400 [WARNING ] backups2datalad: Retrying GET request to /dandisets/000729/versions/draft/ in 8413.961004 seconds as it raised PoolTimeout: 
2024-03-29T05:34:07-0400 [WARNING ] backups2datalad: Retrying GET request to /dandisets/000725/versions/draft/ in 15890.092814 seconds as it raised PoolTimeout: 
2024-03-29T05:57:19-0400 [WARNING ] backups2datalad: Retrying GET request to /dandisets/000726/versions/draft/ in 17122.589867 seconds as it raised PoolTimeout: 
2024-03-29T06:24:50-0400 [WARNING ] backups2datalad: Retrying GET request to /dandisets/000727/versions/draft/ in 17075.343311 seconds as it raised PoolTimeout: 
2024-03-29T06:40:18-0400 [WARNING ] backups2datalad: Retrying GET request to /dandisets/000728/versions/draft/ in 16793.199329 seconds as it raised PoolTimeout: 
2024-03-29T06:46:02-0400 [WARNING ] backups2datalad: Retrying GET request to /dandisets/000729/versions/draft/ in 17087.073695 seconds as it raised PoolTimeout: 

some seems to be open (000726) and some embargoed (000729) but even for embargoed endpoint seems to work just fine:

❯ curl -X 'GET' 'https://api.dandiarchive.org/api/dandisets/000729/versions/draft/' -H 'accept: application/json'
{"detail":"Authentication credentials were not provided."}%  

so seems like a bug somewhere.

Metadata

Metadata

Assignees

Labels

bug:stall reportReport of a backup2datalad run getting stuck or stalling

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions