more efficient caching #848

@satra

The two imaging dandisets are large and will continually run into caching efficiency limits. Giacomo's is only about 5 TB, but Lee's is around 120 TB and growing. Any kind of BIDS-related rewrite could therefore involve significant checksum computation overhead that could take weeks. I'd say it's time to consider the efficiency of local checksum computation for both Zarr versions and large files. The overall problem is to ensure that a local directory can be checksummed efficiently.

One easy way is to maintain a table in the cache that pairs an mtime+size fingerprint with a dandi-etag. A rename or a move of a file doesn't change this fingerprint, and a file can even be copied across filesystems with both elements preserved. Thus a table that is simply an LRU-type cache would tolerate local movement of files instead of tying the checksum to a path name, as sketched below.
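For illustration only, here is a minimal sketch of such a table, assuming the caller supplies the expensive checksum function (the `compute` callable below is a hypothetical stand-in for dandi-cli's actual digest code, not its real API):

```python
import os
from collections import OrderedDict


class ChecksumCache:
    """LRU table mapping a (size, mtime) fingerprint to a dandi-etag,
    so renamed or moved files are recognized without re-checksumming."""

    def __init__(self, maxsize=100_000):
        self.maxsize = maxsize
        self._table = OrderedDict()  # (st_size, st_mtime_ns) -> etag

    def _fingerprint(self, path):
        st = os.stat(path)
        # mtime+size survives renames/moves (and cross-filesystem copies
        # that preserve timestamps), unlike a path-keyed cache entry.
        return (st.st_size, st.st_mtime_ns)

    def get_etag(self, path, compute):
        key = self._fingerprint(path)
        if key in self._table:
            self._table.move_to_end(key)  # refresh LRU position
            return self._table[key]
        etag = compute(path)  # expensive checksum, done only on a miss
        self._table[key] = etag
        if len(self._table) > self.maxsize:
            self._table.popitem(last=False)  # evict least recently used
        return etag
```

Keying on the fingerprint rather than the path is what lets a renamed or moved file hit the cache; the trade-off is that any two files with identical size and mtime would be treated as the same content, which a real implementation might want to guard against.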
