Description
The current implementation of AzureBlobStorageDataStore does not appear to be safe for multiple instances reading and writing the underlying blob where it stores its data. It seems to have problems on multiple levels...
Singleton, but not thread safe
Within the app it is treated as a singleton, which means multiple threads could be accessing the class at the same time. The class in general is not hardened for thread safety, so that could be a problem for starters.
Multi-instance definitely not safe
The real problem comes if the app is scaled out and multiple instances are running: there would definitely be problems, simply because there are no concurrency checks between the point when the blob is read by the Cache method and the point when it is potentially written later by the Commit method.
Consider the following scenario:
- Instance 1 does a read of the blob
- Instance 2 does a read of the blob
- Instance 1 calls methods that mutate the data in memory
- Instance 2 calls methods that mutate the data in memory
- Instance 1 commits
- Instance 2 commits
The current implementation always allows Instance 2 to stomp on Instance 1's changes.
This is because the ETag is not recorded by the implementation at read time. The Commit method does fetch the ETag right before it writes, so it knows it is overwriting whatever version of the data is currently stored, but that is not necessarily the version that was read by the Cache method and subsequently mutated.
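The lost update above can be reproduced with a small in-memory sketch (this is an assumed stand-in, not the real store or the Azure SDK). The key point is that fetching the ETag immediately before writing makes the If-Match check pass trivially, so both commits succeed:

```python
import itertools

class FakeBlob:
    """In-memory stand-in (assumed) for the single JSON blob."""
    def __init__(self, data):
        self._etags = itertools.count(1)
        self.data = dict(data)
        self.etag = next(self._etags)

    def read(self):
        return dict(self.data), self.etag

    def write(self, data, if_match):
        # The check passes whenever the caller supplies the *current*
        # etag -- which a write-time fetch always does.
        if if_match != self.etag:
            raise RuntimeError("412 Precondition Failed")
        self.data, self.etag = dict(data), next(self._etags)

blob = FakeBlob({"a": 1})

data1, _ = blob.read()   # Instance 1 reads
data2, _ = blob.read()   # Instance 2 reads
data1["a"] = 2           # Instance 1 mutates in memory
data2["b"] = 3           # Instance 2 mutates in memory

# Each instance fetches the etag right before writing, as Commit does
# today, so both writes pass the check:
blob.write(data1, if_match=blob.etag)   # Instance 1 commits
blob.write(data2, if_match=blob.etag)   # Instance 2 commits

print(blob.data)   # → {'a': 1, 'b': 3}: Instance 1's change is silently lost
```

Instance 2 read the blob before Instance 1's commit, so its write carries the stale value of "a", and the write-time ETag check does nothing to catch that.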
Possible solutions
The potential fix is to remember the ETag at read time in the Cache method and then use that value at write time in Commit. However, this introduces a problem that does not exist today, because the current implementation simply allows clobbering: what happens if Commit fails because another instance has updated the underlying data? Like any concurrency error, handling it has to ripple all the way upstream to the logic that was attempting to change the data in the first place, so that it can refresh the data it was working with and attempt to apply the same logical changes again. Even that may not be possible, in which case the failure has to propagate all the way back to the origin of the operation.
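The simplest form of that upstream ripple is a refresh-and-retry loop around the read/mutate/commit cycle. A sketch, again against an assumed in-memory stand-in rather than the real API (the names `commit_with_retry` and `ConflictError` are hypothetical):

```python
import itertools

class ConflictError(Exception):
    """Raised when the stored etag no longer matches the one read earlier."""

class FakeBlob:
    """In-memory stand-in (assumed) for the blob, with per-write etags."""
    def __init__(self, data):
        self._etags = itertools.count(1)
        self.data, self.etag = dict(data), next(self._etags)

    def read(self):
        return dict(self.data), self.etag

    def write(self, data, if_match):
        if if_match != self.etag:        # optimistic concurrency check
            raise ConflictError("412 Precondition Failed")
        self.data, self.etag = dict(data), next(self._etags)

def commit_with_retry(blob, mutate, max_attempts=3):
    """Read (recording the etag), apply the logical change, and commit
    with that same etag; on conflict, refresh and reapply."""
    for _ in range(max_attempts):
        data, etag = blob.read()              # remember etag at read time
        mutate(data)                          # reapply the logical change
        try:
            blob.write(data, if_match=etag)   # commit with the read-time etag
            return
        except ConflictError:
            continue                          # someone else wrote; retry
    raise ConflictError("could not commit after retries")

blob = FakeBlob({"count": 0})

_, stale_etag = blob.read()
blob.write({"count": 99}, if_match=blob.etag)  # a competing instance commits

try:
    blob.write({"count": 1}, if_match=stale_etag)  # stale write is rejected
except ConflictError:
    pass

# The retry loop reapplies the mutation against fresh data and succeeds:
commit_with_retry(blob, lambda d: d.update(count=d["count"] + 1))
print(blob.data)   # → {'count': 100}
```

Note that this only works when the logical change can be safely reapplied to refreshed data; as described above, when it cannot, the conflict has to surface to the caller.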
Now, it's possible that the implementation could act at a more logical level and detect non-conflicting changes to the overall JSON, but that might actually make the case that the data shouldn't be stored in a single JSON blob anyway. Splitting it up would mean you wouldn't have to load/cache the whole thing constantly as one document, and you wouldn't have to write that merge logic, because you could rely on a per-blob ETag instead. That would, of course, require a bigger change.
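To make the per-blob idea concrete, here is a sketch (assumed container and naming, not the real layout) where each logical entry lives in its own blob with its own ETag, so updates to different entries never conflict:

```python
class FakeContainer:
    """In-memory stand-in (assumed) for a blob container, keyed by blob name."""
    def __init__(self):
        self._blobs = {}   # name -> (data, etag)
        self._next = 1

    def read(self, name):
        return self._blobs[name]

    def write(self, name, data, if_match=None):
        current = self._blobs.get(name)
        # Existing blobs require a matching etag; new blobs are created freely.
        if current is not None and if_match != current[1]:
            raise RuntimeError("412 Precondition Failed")
        self._blobs[name] = (dict(data), self._next)
        self._next += 1

container = FakeContainer()
container.write("entry-a", {"v": 1})
container.write("entry-b", {"v": 2})

# Two instances updating *different* entries no longer conflict, because
# each entry carries its own etag:
_, etag_a = container.read("entry-a")
_, etag_b = container.read("entry-b")
container.write("entry-a", {"v": 10}, if_match=etag_a)
container.write("entry-b", {"v": 20}, if_match=etag_b)
```

Conflicts then only occur when two instances touch the same entry, which is exactly the case where an ETag check should fire.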