Discussed with @nitrosx and here are our current thoughts.
Metadata
Easy: make a PATCH request with the dataset and the pid.
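A minimal sketch of what such a request could look like, built with the standard library only and never sent. The base URL, the `/datasets/{pid}` route, the bearer-token header, and the helper name `build_patch_request` are all assumptions for illustration; the actual route and authentication depend on the SciCat deployment.

```python
import json
import urllib.parse
import urllib.request


def build_patch_request(base_url, pid, fields, token):
    """Build (but do not send) a PATCH request carrying changed metadata.

    Hypothetical helper: route layout and auth header are assumptions.
    """
    # A PID may contain slashes, so percent-encode every reserved character.
    encoded_pid = urllib.parse.quote(pid, safe="")
    return urllib.request.Request(
        url=f"{base_url}/datasets/{encoded_pid}",
        data=json.dumps(fields).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="PATCH",
    )


request = build_patch_request(
    "https://scicat.example.org/api/v3",  # placeholder URL
    "20.500.12345/abc",  # placeholder PID
    {"datasetName": "renamed dataset"},  # only the fields that changed
    token="<token>",
)
```

Sending only the changed fields keeps the request small, but whether the server accepts partial bodies for every field is something to verify against the backend.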
Files
Files added to local dataset
Can be detected based on local/remote paths of File.
Upload the new files with new datablocks.
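The detection step could be sketched roughly as follows. `FileEntry` is a hypothetical stand-in with only a local and a remote path, not Scitacean's actual `File` class, and the rule "local path set, remote path unset" is an assumption about how added files would show up.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import List, Optional


@dataclass
class FileEntry:
    """Hypothetical file record; not Scitacean's actual File class."""

    local_path: Optional[Path]
    remote_path: Optional[str]


def files_to_upload(files: List[FileEntry]) -> List[FileEntry]:
    """Return entries that exist locally but have no remote counterpart yet."""
    return [
        f for f in files if f.local_path is not None and f.remote_path is None
    ]


files = [
    FileEntry(Path("data/run1.h5"), "run1.h5"),  # already on remote
    FileEntry(Path("data/notes.txt"), None),  # added locally
]
new_files = files_to_upload(files)
```

The entries in `new_files` are the ones that would go into new datablocks; existing datablocks stay untouched.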
Local files modified
Ultimate source of truth for modification: the checksum. But computing it is slow. We can store the download time in the File object and use it to check whether the file has been modified since download. But it is possible to accidentally change the time without modifying the file by, e.g., touch or ctrl+s in an editor without modifications. So if the time does not match, compute the checksum to be sure.
Never update remote files. If any local file has been modified, reject the update and direct the user to create a new dataset (and possibly link to the unmodified files on remote to avoid duplicating them).
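The two-stage check described above (cheap timestamp comparison first, checksum only on mismatch) might look like this. The function names, the choice of SHA-256, and the idea of passing the stored download time and checksum as plain arguments are assumptions for the sketch.

```python
import hashlib
import os
import tempfile


def file_checksum(path, algorithm="sha256"):
    """Hash the file in chunks so large files need not fit in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_modified(path, downloaded_at, checksum_at_download):
    """Return True only if the file content differs from the downloaded one."""
    # Fast path: the file was not written to after the download.
    if os.stat(path).st_mtime <= downloaded_at:
        return False
    # The time changed (maybe only via `touch`), so fall back to the checksum.
    return file_checksum(path) != checksum_at_download


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.txt")
    with open(path, "w") as f:
        f.write("original")
    checksum = file_checksum(path)
    downloaded_at = os.stat(path).st_mtime  # stand-in for the stored download time

    # Bump the modification time without changing the content, like `touch`.
    os.utime(path, (downloaded_at + 10, downloaded_at + 10))
    touched = is_modified(path, downloaded_at, checksum)

    # Actually change the content.
    with open(path, "w") as f:
        f.write("changed")
    os.utime(path, (downloaded_at + 20, downloaded_at + 20))
    edited = is_modified(path, downloaded_at, checksum)
```

Here `touched` comes out false and `edited` true; a caller would reject the update in the second case and point the user to creating a new dataset.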
Local files removed
Should not be possible in Dataset with the public API. But if it happens, or if the file was removed from disk, raise an error.
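For the "removed from disk" case, a check before uploading could be as small as the sketch below. `MissingFileError` and `check_files_present` are hypothetical names, and the sketch only covers files missing on disk, not files dropped from the dataset object itself.

```python
import os
import tempfile
from pathlib import Path


class MissingFileError(Exception):
    """Hypothetical error type for files that vanished locally."""


def check_files_present(local_paths):
    """Raise if any expected local file no longer exists on disk."""
    missing = [str(p) for p in local_paths if not Path(p).is_file()]
    if missing:
        raise MissingFileError(
            "Cannot update dataset, files missing on disk: " + ", ".join(missing)
        )


with tempfile.TemporaryDirectory() as tmp:
    kept = os.path.join(tmp, "kept.txt")
    gone = os.path.join(tmp, "gone.txt")  # never created, so it is "missing"
    Path(kept).write_text("data")

    try:
        check_files_present([kept, gone])
        error_message = None
    except MissingFileError as err:
        error_message = str(err)
```

Collecting all missing paths before raising lets the user see every problem at once instead of fixing them one by one.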
Note on making new datasets to update
The above relies on users first downloading a dataset (and files), modifying it, and uploading the modified version. This way, Scitacean can track ids, paths, modification times, etc.
But it is also possible to make a new dataset from scratch, assign an existing PID, and use it to upload. (Assigning a PID is not straightforward but possible.) In this case, we cannot track the above properties. Is this an issue or can we treat this like the above case?
Scitacean currently cannot update datasets.