-
Notifications
You must be signed in to change notification settings - Fork 157
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reconcile remote and existing inodes at update
time
#386
Conversation
To date we haven't thought too carefully about what happens if objects are put/deleted from the S3 bucket while conflicting state is present locally. There are a lot of edge cases here -- the Cartesian product of existing state (local/remote file/directory) and new remote state (file/directory), as well as two paths for inodes to be updated (readdir vs lookup). This change defines a semantics for these permutations. The overall idea is that (a) remote state shadows local state, and (b) directories shadow files. But those axioms alone aren't enough to break all ties; for example, what if the existing state is a local directory but the new state is a remote file -- which should win? I chose to break the tie by saying that remote directories > any local state > remote files. So, for example, if a user creates a local directory, and then a conflicting object appears in the remote bucket, the directory will still be visible instead of the new file. I spent some time trying to patch the existing inode update path to do what I needed but it ended up being easier to just refactor it. I think we could still find a better factoring for this path, but it now explicitly accounts for all the permutations above and does the right thing (at least according to our reference model) for them all. Happily, proptest has done a good job at rooting out the many edge cases, as you can see by all the new regression tests in this change. Signed-off-by: James Bornholt <bornholt@amazon.com>
@@ -62,7 +62,7 @@ impl Node { | |||
#[derive(Debug)] | |||
pub struct Reference { | |||
/// Contents of our S3 bucket | |||
remote_keys: Vec<(String, MockObject)>, | |||
remote_keys: BTreeMap<String, MockObject>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change is just to de-duplicate keys. A HashMap
would work too, but BTreeMap
's ordering is deterministic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great! My only note is that the new behavior should be documented in the semantics doc. Pretty much what you wrote in the description of change would do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work! it's nice to see we handle these edge cases properly and also be able to model them in the prop tests.
+1 on @passaro's comment that this behavior should be documented though I'm not sure where. The semantics document seems to be higher level than this, I think at least we should have it as Rust document at update_from_remote
function.
Signed-off-by: James Bornholt <bornholt@amazon.com>
Going to do a broader documentation update for this separately. |
Description of change
To date we haven't thought too carefully about what happens if objects are put/deleted from the S3 bucket while conflicting state is present locally. There are a lot of edge cases here -- the Cartesian product of existing state (local/remote file/directory) and new remote state (file/directory), as well as two paths for inodes to be updated (readdir vs lookup).
This change defines a semantics for these permutations. The overall idea is that (a) remote state shadows local state, and (b) directories shadow files. But those axioms alone aren't enough to break all ties; for example, what if the existing state is a local directory but the new state is a remote file -- which should win? I chose to break the tie by saying that remote directories > any local state > remote files. So, for example, if a user creates a local directory, and then a conflicting object appears in the remote bucket, the directory will still be visible instead of the new file.
I spent some time trying to patch the existing inode update path to do what I needed but it ended up being clearer to just refactor it. I think we could still find a better factoring for this path, but it now explicitly accounts for all the permutations above and does the right thing (at least according to our reference model) for them all. Happily, proptest has done a good job at rooting out the many edge cases, as you can see by all the new regression tests in this change.
I've tested this change by running the proptests with 100,000 test cases and didn't see any failures.
Semantics details
Here's a table of the various permutations of new and existing state for a given inode and how we reconcile them:
I think the two unclear cases here are (Local file, Directory) and (Local directory, File). For the first one, if we preferred the local file over the directory, the file would disappear from view as soon as it became remote, which seems more surprising than just making it invisible immediately. For the second one, if we preferred the remote file over the local directory,
rmdir
no longer works on the local directory, so it's not really "local" any more.The other interesting case is (Local file, File). I chose to prefer the local file because otherwise a user could read from "the same file" (actually the same path but a different file) while writing to it, which would give confusing behavior (can't read your own writes).
Fixes #128.
Does this change impact existing behavior?
Yes, this change specifies behavior for cases that previously were emergent. We've never defined the semantics for concurrent accesses before, so this change does so for the first time, and modifies some existing behaviors (mostly around remote objects changing between files and directories) to match that semantics.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and I agree to the terms of the Developer Certificate of Origin (DCO).