Description
the two imaging dandisets are large and will continually run into caching efficiency problems. giacomo’s is only about 5TB, but lee’s is around 120TB and growing. any kind of BIDS-related rewrite could thus involve significant checksum computation overhead, possibly taking weeks. i would say it’s time to consider the efficiency of local checksum computation for both zarr versions and large files. the overall problem is to ensure that a local directory can be checksummed efficiently.
one easy way is to maintain a table in the cache that maps an mtime+size key to a dandi-etag. a rename or a move of a file doesn’t change this key, and a file can even be copied across filesystems as long as both of those elements are preserved. thus a table that is simply an LRU-type cache would allow files to move locally without invalidating their entries, instead of tying each entry to a path name.
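
a minimal sketch of the table idea, to make the proposal concrete. everything here is hypothetical: `StatKeyedCache`, `md5_digest`, and `maxsize` are illustrative names, and plain MD5 stands in for the real dandi-etag computation, which this sketch does not implement.

```python
import hashlib
import os
from collections import OrderedDict


def md5_digest(path):
    """Stand-in for the expensive checksum. NOT the real dandi-etag algorithm."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        # Read in 1 MiB chunks so large files are never loaded whole.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


class StatKeyedCache:
    """LRU table mapping (mtime_ns, size) -> digest, independent of the path."""

    def __init__(self, compute=md5_digest, maxsize=100_000):
        self.compute = compute      # hypothetical hook for the real checksum
        self.maxsize = maxsize      # assumed bound on the number of entries
        self.table = OrderedDict()  # insertion order doubles as LRU order
        self.misses = 0             # how often the checksum was recomputed

    def digest(self, path):
        st = os.stat(path)
        key = (st.st_mtime_ns, st.st_size)  # the path is deliberately not in the key
        if key in self.table:
            self.table.move_to_end(key)     # mark as most recently used
            return self.table[key]
        self.misses += 1
        value = self.compute(path)
        self.table[key] = value
        if len(self.table) > self.maxsize:
            self.table.popitem(last=False)  # evict the least recently used entry
        return value
```

with this layout, renaming a file and asking for its digest again is a table hit rather than a recomputation. the obvious caveat is that two different files with the same mtime and size would share an entry, so a real implementation would need to decide whether that risk is acceptable or add something else to the key.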