This repository has been archived by the owner on Jan 21, 2024. It is now read-only.
Triox file storage #22
AaronErhardt started this conversation in Ideas
This is a suggestion for implementing a file storage API that Triox can build upon.
The key tasks the API should handle are the following:
Any feedback is welcome!
Storage layout
User data
User data is stored at a specific location, similar to the current model (e.g. in data/users/:id). The API should be able to access user data identified by the user id and the path relative to the user's directory. All file operations should be asynchronous, and directories should be readable as a stream of data in an archive format like zip (to allow directory downloads).
A read-write lock should guard all read and write operations: several clients may read the same location at once, but if one client wants to write, all other clients have to wait before performing operations on that location. At first this mechanism could be very coarse, with one lock per user and one per share, and it could be made more fine-grained in the future so that locks only affect operations on the same location.
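A minimal sketch of the coarse locking scheme, assuming one lock per user and one per share. `LockManager` and `LockScope` are hypothetical names, and the sketch uses the blocking `std::sync::RwLock` so it runs without dependencies; an asynchronous API would more likely hand out an async lock such as `tokio::sync::RwLock` instead.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex, RwLock};

/// Hypothetical lock key at the coarse granularity described above:
/// one lock per user and one per share.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub enum LockScope {
    User(u64),
    Share(u64),
}

/// Hands out one shared RwLock per scope. Readers of the same scope can hold
/// the lock concurrently; a writer must wait until no one else holds it.
#[derive(Default)]
pub struct LockManager {
    locks: Mutex<HashMap<LockScope, Arc<RwLock<()>>>>,
}

impl LockManager {
    /// Returns the lock for `scope`, creating it on first use.
    pub fn lock_for(&self, scope: LockScope) -> Arc<RwLock<()>> {
        let mut map = self.locks.lock().unwrap();
        map.entry(scope)
            .or_insert_with(|| Arc::new(RwLock::new(())))
            .clone()
    }
}
```

Making the scheme finer-grained later would only change the key type (for example a path instead of a user id). Locks are never evicted from the map in this sketch, so a long-running server would need to remove unused entries.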
Quotas are stored with the user's information and handled automatically by the API. On each write and delete operation, the API updates the number of bytes used by the user and returns an error if a write would exceed the quota. Shares count as data of their owner and affect the owner's quota.
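A minimal sketch of the quota bookkeeping, assuming the API knows a file's old and new size before it writes. `Quota`, `charge_write`, `release`, and `QuotaExceeded` are hypothetical names; for a write into a share, the same check would run against the share owner's `Quota`.

```rust
/// Error returned when a write would exceed the user's quota.
#[derive(Debug, PartialEq, Eq)]
pub struct QuotaExceeded {
    pub requested: u64,
    pub available: u64,
}

/// Hypothetical per-user quota record, stored with the user's information.
/// Writes into a share would be charged to the share owner's `Quota`.
pub struct Quota {
    limit: u64,
    used: u64,
}

impl Quota {
    pub fn new(limit: u64) -> Self {
        Quota { limit, used: 0 }
    }

    pub fn used(&self) -> u64 {
        self.used
    }

    /// Checks and records a write that changes a file from `old_size` to
    /// `new_size` bytes. Only growth is charged; shrinking a file frees space.
    pub fn charge_write(&mut self, old_size: u64, new_size: u64) -> Result<(), QuotaExceeded> {
        if new_size <= old_size {
            self.used = self.used.saturating_sub(old_size - new_size);
            return Ok(());
        }
        let growth = new_size - old_size;
        let available = self.limit.saturating_sub(self.used);
        if growth > available {
            return Err(QuotaExceeded { requested: growth, available });
        }
        self.used += growth;
        Ok(())
    }

    /// Records the deletion of a file of `size` bytes.
    pub fn release(&mut self, size: u64) {
        self.used = self.used.saturating_sub(size);
    }
}
```

The check would have to happen while the user's write lock is held, otherwise two concurrent uploads could both pass the check and together exceed the quota.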
Shares
Shared folders and files are stored in a separate directory, e.g. in data/shares/:id. Each share has its own unique id and information that specifies which users and groups are allowed to access the shared folder and with which permissions. Inside a user's data, shares are represented only as files that point to the id of the share; these references are resolved automatically when relative paths are accessed through the API. The references can also be moved freely within the user's data. The API also stores the paths where users keep their references to a share, so that they can easily be deleted when the share is removed. Alternatively, a garbage collection mechanism could delete invalid share references from time to time.
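A minimal in-memory sketch of resolving share references, assuming the API keeps an index from (user id, reference path) to share id. `ShareIndex` and its methods are hypothetical; the on-disk layout follows the data/users/:id and data/shares/:id examples, and a real implementation would persist the index and check the share's permissions before returning a path.

```rust
use std::collections::HashMap;
use std::path::{Component, Path, PathBuf};

/// Hypothetical index of share references inside users' data.
#[derive(Default)]
pub struct ShareIndex {
    /// (user id, path of the reference file in the user's data) -> share id
    refs: HashMap<(u64, PathBuf), u64>,
}

impl ShareIndex {
    pub fn add_reference(&mut self, user: u64, ref_path: &Path, share: u64) {
        self.refs.insert((user, ref_path.to_path_buf()), share);
    }

    /// Moving a reference only updates the index; the share stays where it is.
    pub fn move_reference(&mut self, user: u64, from: &Path, to: &Path) -> bool {
        match self.refs.remove(&(user, from.to_path_buf())) {
            Some(share) => {
                self.refs.insert((user, to.to_path_buf()), share);
                true
            }
            None => false,
        }
    }

    /// Removes a share and returns every (user, path) reference file to delete.
    pub fn remove_share(&mut self, share: u64) -> Vec<(u64, PathBuf)> {
        let stale: Vec<(u64, PathBuf)> = self
            .refs
            .iter()
            .filter(|(_, s)| **s == share)
            .map(|(key, _)| key.clone())
            .collect();
        for key in &stale {
            self.refs.remove(key);
        }
        stale
    }

    /// Maps a user-relative path to a location on disk, following the first
    /// share reference found along the path.
    pub fn resolve(&self, data_root: &Path, user: u64, rel: &Path) -> Option<PathBuf> {
        let mut prefix = PathBuf::new();
        let mut components = rel.components();
        while let Some(c) = components.next() {
            match c {
                Component::Normal(name) => prefix.push(name),
                // Reject "..", ".", and absolute paths so users cannot escape their directory.
                _ => return None,
            }
            if let Some(&share) = self.refs.get(&(user, prefix.clone())) {
                let rest = components.as_path();
                return Some(data_root.join("shares").join(share.to_string()).join(rest));
            }
        }
        Some(data_root.join("users").join(user.to_string()).join(rel))
    }
}
```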
Sync
The sync mechanism is based on trees that store the state of the synchronization. Following the approach of the new Dropbox sync engine, this would result in three trees:
Each user has his own sync trees. Each shared folder also has its own sync trees, which are included in the sync trees of every user that has access to the shared folder.
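A minimal sketch of how the trees could be compared, assuming they are the three trees described for Dropbox's Nucleus engine: a remote tree (the server's state), a local tree (the client's disk), and a synced tree (the last state both sides agreed on). Comparing one path across the three trees is enough to decide what to do with it. `NodeState`, `SyncAction`, and `plan` are hypothetical names.

```rust
/// Hypothetical per-path state stored in a sync tree node.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct NodeState {
    pub content_hash: u64,
}

#[derive(Debug, PartialEq, Eq)]
pub enum SyncAction {
    Nothing,
    Upload,
    Download,
    DeleteRemote,
    DeleteLocal,
    Conflict,
}

/// Decides what to do for one path. `None` means the path is absent in that tree.
pub fn plan(
    remote: Option<&NodeState>,
    local: Option<&NodeState>,
    synced: Option<&NodeState>,
) -> SyncAction {
    let remote_changed = remote != synced;
    let local_changed = local != synced;
    match (remote_changed, local_changed) {
        (false, false) => SyncAction::Nothing,
        // Only the client changed: push its state (or its deletion) to the server.
        (false, true) if local.is_some() => SyncAction::Upload,
        (false, true) => SyncAction::DeleteRemote,
        // Only the server changed: pull its state (or its deletion) to the client.
        (true, false) if remote.is_some() => SyncAction::Download,
        (true, false) => SyncAction::DeleteLocal,
        // Both changed to the same state: nothing to transfer, only the synced tree needs updating.
        (true, true) if remote == local => SyncAction::Nothing,
        (true, true) => SyncAction::Conflict,
    }
}
```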
The sync tree must store uniquely identifiable information about a file's state. This can be done by storing a hash of the file path together with the timestamp of the latest change (or retrieving the timestamp from the OS). Storing a hash of the file's content could be beneficial as well, as this allows moved or copied files to be recognized so that their content does not have to be synced again.
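A minimal sketch of using content hashes to recognize moved and copied files between two snapshots (path to content hash). `content_hash`, `classify_added`, and `Change` are hypothetical names. The sketch uses `DefaultHasher` only to stay dependency-free; its output is not stable across Rust releases and it is not collision-resistant, so a persisted sync tree would rather store a cryptographic hash such as SHA-256.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Stand-in content hash (see the caveat above about DefaultHasher).
pub fn content_hash(bytes: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    bytes.hash(&mut h);
    h.finish()
}

#[derive(Debug, PartialEq, Eq)]
pub enum Change {
    Moved { from: String },
    Copied { from: String },
    New,
}

/// Classifies every path that is new in `after`: if a vanished path had the same
/// content it was moved, if a still-existing path has it it was copied,
/// otherwise its content really has to be transferred.
pub fn classify_added(
    before: &HashMap<String, u64>,
    after: &HashMap<String, u64>,
) -> HashMap<String, Change> {
    let mut removed: HashMap<u64, &String> = HashMap::new();
    let mut kept: HashMap<u64, &String> = HashMap::new();
    for (path, hash) in before {
        if after.contains_key(path) {
            kept.insert(*hash, path);
        } else {
            removed.insert(*hash, path);
        }
    }
    let mut result = HashMap::new();
    for (path, hash) in after {
        if before.contains_key(path) {
            continue; // not a new path
        }
        let change = if let Some(from) = removed.remove(hash) {
            Change::Moved { from: from.clone() }
        } else if let Some(from) = kept.get(hash) {
            Change::Copied { from: (*from).clone() }
        } else {
            Change::New
        };
        result.insert(path.clone(), change);
    }
    result
}
```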
Delta-sync could be achieved in a lightweight manner by storing a binary diff of only the latest changes. Rare conflicts, where two clients have made local changes at the same location, should be resolved deterministically by accepting the first client's changes and discarding the other changes (of course, a sync client could notify the user before discarding changes).
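One way to make "the first client wins" deterministic, sketched under the assumption that the server keeps a version counter per path and every upload states the version it was based on (optimistic concurrency control). `VersionTable`, `commit`, and `CommitError` are hypothetical names; computing and applying the binary diff itself is left out.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq, Eq)]
pub enum CommitError {
    /// Another client already committed on top of the same base version.
    Conflict { current_version: u64 },
}

/// Hypothetical server-side table: path -> current version.
#[derive(Default)]
pub struct VersionTable {
    versions: HashMap<String, u64>,
}

impl VersionTable {
    /// Accepts a change made on top of `base_version` only if nobody has
    /// committed since; the outcome therefore depends only on commit order.
    pub fn commit(&mut self, path: &str, base_version: u64) -> Result<u64, CommitError> {
        let current = self.versions.entry(path.to_string()).or_insert(0);
        if *current != base_version {
            return Err(CommitError::Conflict { current_version: *current });
        }
        *current += 1;
        Ok(*current)
    }
}
```

A rejected client learns the current version from the error, so it can fetch the winning state and tell the user before its own changes are discarded.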
Open Questions
I don't expect the current elaboration to be complete, and I hope the feedback will help in finding and resolving more questions. Still, some questions about implementation details already remain open:
File encryption
(For those interested in how Nextcloud handles encryption.)
General design:
User data
Each user has his own public and private key, generated when the user logs in for the first time after encryption was enabled. The private key is protected by the user's password.
Each file has its own (slightly smaller) key that is encrypted with the public key of the user. When the user uploads a file, a new key is generated and the file is encrypted with it. Then the key is encrypted with the user's public key and stored together with the file. If the user wants to read this file, the file's key is decrypted with the user's private key and can then be used to decrypt the contents of the file.
Auth process
On login, the private key is temporarily decrypted with the password the user submitted. Then a copy of the private key is created that is protected by a new, randomly generated password. This new password is sent as an encrypted JWT claim to the client, which can then use its JWT to decrypt the private key and, with it, the files. A background process automatically deletes the copy of the private key when the JWT expires after a few hours.
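A minimal sketch of the bookkeeping for the temporary key copies, assuming each copy is stored under a session id that also appears in the JWT and that its lifetime matches the JWT's `exp` claim. `SessionKeyStore` and its methods are hypothetical, the three-hour lifetime in the usage below is an assumed value, and the re-encryption of the key copy is not shown (the store only holds opaque bytes). The current time is passed in explicitly so the expiry logic is easy to test.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical store for the private-key copies created at login.
pub struct SessionKeyStore {
    ttl: Duration,
    /// session id -> (key copy protected by the random password, expiry time)
    entries: HashMap<String, (Vec<u8>, Instant)>,
}

impl SessionKeyStore {
    pub fn new(ttl: Duration) -> Self {
        SessionKeyStore { ttl, entries: HashMap::new() }
    }

    /// Called at login with the re-protected copy of the private key.
    pub fn insert(&mut self, session_id: &str, protected_copy: Vec<u8>, now: Instant) {
        self.entries
            .insert(session_id.to_string(), (protected_copy, now + self.ttl));
    }

    /// Returns the copy only while the session has not expired.
    pub fn get(&self, session_id: &str, now: Instant) -> Option<&[u8]> {
        match self.entries.get(session_id) {
            Some((copy, expires)) if now < *expires => Some(copy.as_slice()),
            _ => None,
        }
    }

    /// Run periodically by the background task; returns how many copies were deleted.
    pub fn purge_expired(&mut self, now: Instant) -> usize {
        let before = self.entries.len();
        self.entries.retain(|_, (_, expires)| now < *expires);
        before - self.entries.len()
    }
}
```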
Shared folders
Shared files also have their own keys, but these keys are encrypted with the public keys of all users that are allowed to access the files. Each user can then decrypt a file's key using his private key and use it to decrypt the file.
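The key hierarchy can be sketched as follows, covering both user files and shares (a user's own file is simply a file with one reader). Everything here is hypothetical and deliberately insecure: the file key is wrapped with textbook RSA using tiny, well-known example parameters (n = 3233 and n = 2773), and file contents are "encrypted" by XOR with the file key. Both only stand in for real primitives from a vetted library, such as RSA-OAEP or X25519 sealed boxes for key wrapping and AES-GCM for content. What the sketch does show is the data flow: the content is encrypted once, while the small file key is wrapped separately for each reader.

```rust
use std::collections::HashMap;

/// Toy textbook-RSA keys. INSECURE: tiny parameters, no padding.
#[derive(Clone, Copy)]
pub struct PublicKey { n: u64, e: u64 }
#[derive(Clone, Copy)]
pub struct PrivateKey { n: u64, d: u64 }

/// Modular exponentiation by repeated squaring.
fn mod_pow(mut base: u64, mut exp: u64, m: u64) -> u64 {
    let mut result = 1;
    base %= m;
    while exp > 0 {
        if exp & 1 == 1 {
            result = result * base % m;
        }
        base = base * base % m;
        exp >>= 1;
    }
    result
}

impl PublicKey {
    /// Wraps the file key byte by byte (placeholder for a real key-wrapping scheme).
    fn wrap(&self, file_key: &[u8]) -> Vec<u64> {
        file_key.iter().map(|&b| mod_pow(b as u64, self.e, self.n)).collect()
    }
}

impl PrivateKey {
    fn unwrap(&self, wrapped: &[u64]) -> Vec<u8> {
        wrapped.iter().map(|&c| mod_pow(c, self.d, self.n) as u8).collect()
    }
}

/// XOR placeholder for a real authenticated cipher. INSECURE.
fn xor_with_key(data: &[u8], key: &[u8]) -> Vec<u8> {
    data.iter().zip(key.iter().cycle()).map(|(d, k)| d ^ k).collect()
}

pub struct EncryptedFile {
    ciphertext: Vec<u8>,
    /// The file key, wrapped once per user allowed to read the file.
    wrapped_keys: HashMap<u64, Vec<u64>>,
}

impl EncryptedFile {
    /// `file_key` would be freshly generated at random for each upload.
    pub fn seal(plaintext: &[u8], file_key: &[u8], readers: &[(u64, PublicKey)]) -> Self {
        assert!(!file_key.is_empty(), "file key must not be empty");
        let ciphertext = xor_with_key(plaintext, file_key);
        let wrapped_keys = readers.iter().map(|(id, pk)| (*id, pk.wrap(file_key))).collect();
        EncryptedFile { ciphertext, wrapped_keys }
    }

    /// Sharing re-wraps only the small file key; the content is not re-encrypted.
    pub fn grant(&mut self, granter: u64, granter_key: &PrivateKey, grantee: u64, grantee_key: &PublicKey) -> Option<()> {
        let file_key = granter_key.unwrap(self.wrapped_keys.get(&granter)?);
        self.wrapped_keys.insert(grantee, grantee_key.wrap(&file_key));
        Some(())
    }

    /// Decrypts the file key with the reader's private key, then the content.
    pub fn open(&self, user: u64, key: &PrivateKey) -> Option<Vec<u8>> {
        let file_key = key.unwrap(self.wrapped_keys.get(&user)?);
        Some(xor_with_key(&self.ciphertext, &file_key))
    }
}
```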
Open Questions
Again, I don't expect the current elaboration to be complete, and I hope for feedback on this section.
Some open questions are: