Update datasets #181

Open
@jl-wynen

Scitacean currently cannot update datasets.

Discussed with @nitrosx and here are our current thoughts.

Metadata

Easy: make a PATCH request with the dataset and the pid.
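A minimal sketch of what such a request could look like, using only the Python standard library. The `datasets/{pid}` route, the base URL, the bearer-token header, and the helper name `build_metadata_patch` are assumptions for illustration, not the confirmed SciCat or Scitacean API. The request is only built here, not sent.

```python
import json
import urllib.parse
import urllib.request


def build_metadata_patch(
    base_url: str, pid: str, changes: dict, token: str
) -> urllib.request.Request:
    """Build (but do not send) a PATCH request that updates dataset metadata.

    Hypothetical helper; the URL layout is an assumption.
    """
    # PIDs typically contain '/', so percent-encode the whole PID for the path.
    encoded_pid = urllib.parse.quote(pid, safe="")
    url = f"{base_url.rstrip('/')}/datasets/{encoded_pid}"
    return urllib.request.Request(
        url,
        data=json.dumps(changes).encode("utf-8"),
        method="PATCH",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
    )


request = build_metadata_patch(
    "https://scicat.example.org/api/v3",  # placeholder server
    "20.500.12269/abc123",                # placeholder PID
    {"datasetName": "Renamed dataset"},
    token="<token>",
)
# Sending it would be: urllib.request.urlopen(request)
```

Only the changed fields are sent in this sketch; whether the server expects a partial or a full dataset model is left open here.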

Files

Files added to local dataset

Can be detected based on local/remote paths of File.
Upload the new files with new datablocks.
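The detection could be sketched roughly as follows. `FileEntry` is a hypothetical stand-in for Scitacean's File with only the two path attributes that matter here; it assumes that a file without a remote path has never been uploaded.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import List, Optional


@dataclass(frozen=True)
class FileEntry:
    """Hypothetical stand-in for Scitacean's File; not the real class."""

    local_path: Optional[Path]
    remote_path: Optional[str]


def find_added_files(files: List[FileEntry]) -> List[FileEntry]:
    """Return files that exist locally but have no remote counterpart."""
    return [
        f for f in files
        if f.local_path is not None and f.remote_path is None
    ]


files = [
    FileEntry(Path("data/run1.h5"), "run1.h5"),  # downloaded earlier
    FileEntry(Path("data/run2.h5"), None),       # added locally
]
added = find_added_files(files)  # these would go into new datablocks
```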

Local files modified

The ultimate source of truth for detecting modifications is the checksum, but computing it is slow. We can store the download time in the File object and use it to check whether the file has been modified since download. However, the modification time can change without the content changing, e.g., through touch or ctrl+s in an editor without modifications. So if the time does not match, compute the checksum to be sure.
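A sketch of this two-stage check, assuming the download time and the checksum were both recorded at download. The function names, the parameters `downloaded_at` and `recorded_checksum`, and the choice of SHA-256 are assumptions for illustration.

```python
import hashlib
import os
import tempfile
from pathlib import Path


def file_checksum(path: Path, algorithm: str = "sha256") -> str:
    """Hash the file in chunks so large files do not need to fit in memory."""
    digest = hashlib.new(algorithm)
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_modified(path: Path, downloaded_at: float, recorded_checksum: str) -> bool:
    """Cheap timestamp test first; fall back to the checksum if it fails."""
    # Fast path: file was not written to after the recorded download time.
    if path.stat().st_mtime <= downloaded_at:
        return False
    # The timestamp changed, but the content may not have (e.g. `touch`).
    return file_checksum(path) != recorded_checksum


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "data.txt"
    path.write_bytes(b"original")
    downloaded_at = path.stat().st_mtime
    checksum = file_checksum(path)

    # Simulate `touch`: newer timestamp, same content.
    os.utime(path, (downloaded_at + 10, downloaded_at + 10))
    touched = is_modified(path, downloaded_at, checksum)

    # A real edit: newer timestamp and different content.
    path.write_bytes(b"changed")
    os.utime(path, (downloaded_at + 20, downloaded_at + 20))
    edited = is_modified(path, downloaded_at, checksum)
```

The fast path trusts the timestamp, so an edit that also resets the modification time to an earlier value would go unnoticed. Always computing the checksum avoids that at the cost of speed.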

Never update remote files. If any local file has been modified, reject the update and direct the user to create a new dataset (and possibly link to the unmodified files on remote to avoid duplicating them).

Local files removed

This should not be possible through the public API of Dataset. But if it happens, or if the file was removed from disk, raise an error.
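The on-disk part of this check could look roughly like the following. `MissingLocalFileError` and `check_local_files_exist` are hypothetical names, not part of Scitacean.

```python
from pathlib import Path
from typing import Iterable


class MissingLocalFileError(RuntimeError):
    """Hypothetical error type for files that vanished from disk."""


def check_local_files_exist(local_paths: Iterable[str]) -> None:
    """Raise if any file that belongs to the dataset is missing on disk."""
    missing = [p for p in map(Path, local_paths) if not p.is_file()]
    if missing:
        names = ", ".join(str(p) for p in missing)
        raise MissingLocalFileError(
            f"Cannot update dataset, files missing on disk: {names}"
        )


try:
    check_local_files_exist(["/nonexistent/scitacean-demo/run1.h5"])
    error_message = None
except MissingLocalFileError as err:
    error_message = str(err)
```

Collecting all missing paths before raising lets the user see every problem at once instead of one per attempt.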

Note on making new datasets to update

The above relies on users first downloading a dataset (and files), modifying it, and uploading the modified version. This way, Scitacean can track ids, paths, modification times, etc.
But it is also possible to make a new dataset from scratch, assign an existing PID, and use it to upload. (Assigning a PID is not straightforward but possible.) In this case, we cannot track the above properties. Is this an issue or can we treat this like the above case?
