zarr upload: do not pre-digest anything #915

Closed
@yarikoptic

Description

A distilled variant of #903. If we decide to proceed with it, it could take precedence over improving any related code (#913, more efficient digesting of zarr folders; and #914, disabling fscaching of individual files in zarr):

  • disable digesting of zarr folders in upload: always assume that they differ
  • do not digest individual files unless really needed, i.e. when the remote file is present and has the same size, so comparing against its ETag is the only way to say for sure whether it is the same
  • for now, keep digesting in order to provide the digests needed to mint upload URLs for the batch. But I am afraid we had better disable fscaching of those digests (which in part covers #914, "do not fscache individual files digests for zarr-checksum"), since it seems to have too notable an impact
    • ideally there should be a separate process/thread which would take care of pre-digesting files that might be missing digests for the next batch. This way we could minimize the delay between batches
    • the md5 digest should be computed in-band while uploading and compared against the ETag for the file once the upload is done
  • overall zarr archive checksumming on the server will be disabled, so we should then stop checking upon completion whether our upload of the zarr is consistent with the remote.
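
To make the proposal concrete, here is a minimal Python sketch of the per-file decision and the in-band md5 described above. It is not the dandi-cli implementation: `RemoteEntry`, `needs_upload`, and `read_with_md5` are hypothetical names made up for illustration, and the sketch assumes the remote ETag is the hex MD5 of the file contents (as is the case for plain single-part S3 uploads).

```python
import hashlib
import io
from dataclasses import dataclass
from typing import BinaryIO, Iterator, Optional


@dataclass
class RemoteEntry:
    """Hypothetical view of what the server reports for one zarr entry."""
    size: int
    etag: str  # assumed: hex MD5 of the contents, possibly wrapped in quotes


def md5_hex(stream: BinaryIO, chunk_size: int = 1 << 20) -> str:
    """Digest a stream chunk by chunk so large files are never fully in memory."""
    hasher = hashlib.md5()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        hasher.update(chunk)
    return hasher.hexdigest()


def needs_upload(local: BinaryIO, local_size: int,
                 remote: Optional[RemoteEntry]) -> bool:
    """Decide whether to upload, digesting only when nothing cheaper settles it."""
    if remote is None:
        return True  # nothing on the remote: upload, no digest needed
    if remote.size != local_size:
        return True  # sizes differ: contents must differ, no digest needed
    # Same size: the ETag comparison is the only way to know for sure.
    return md5_hex(local) != remote.etag.strip('"')


def read_with_md5(stream: BinaryIO, hasher,
                  chunk_size: int = 1 << 20) -> Iterator[bytes]:
    """Yield chunks for the upload body while feeding the same bytes to `hasher`."""
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        hasher.update(chunk)
        yield chunk
```

After the upload finishes, `hasher.hexdigest()` could be compared against the ETag returned for the file, so verification costs no second read of the file. Note that as long as digests are required to mint the upload URLs, the digest still has to exist before the upload starts; the in-band variant only pays off once that requirement is lifted.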

Is that about right, @satra? And @dchiquito, are we disabling checksumming of zarrs on dandi-api during upload?
