fix(posixfs): make blobstore uploads atomic #387
Merged
Problem description
The current posixfs blobstore implementation relies on buffered writes that stream the upload directly into the destination file. That destination file can be read while it is still being written, for example by a PROPFIND running in another goroutine.
The problem arises as soon as concurrent operations rely on the etag (etag = mtime + id).
For example, if a PROPFIND resolves the file while it is being written (in a different goroutine), the etag will differ from the etag in the initial upload response header.
As soon as the file write is done, the etag matches again (the one from the tus POST response header).
The error rate depends on the OC environment: it happens more often when the file is large or post-processing is enabled, but it can also occur when the file is written synchronously. It comes down to timing, request count, and file size.
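To illustrate the mechanism, here is a minimal sketch of an etag derived from "mtime + id". The function name and the etag format are hypothetical, not the actual oCIS code; the point is only that every flush into the destination file bumps its mtime and therefore changes the etag that a concurrent PROPFIND computes.

```go
package main

import (
	"fmt"
	"log"
	"os"
)

// etag combines a node id with the file's current mtime (hypothetical format).
// While an upload is still flushing buffered writes into the destination file,
// every flush bumps the mtime, so a PROPFIND served during the write returns a
// different etag than the one from the upload response header.
func etag(id, path string) (string, error) {
	fi, err := os.Stat(path)
	if err != nil {
		return "", err
	}
	return fmt.Sprintf("\"%s-%d\"", id, fi.ModTime().UnixNano()), nil
}

func main() {
	// Stat the same file twice; if a concurrent writer flushed in between,
	// the two etags differ even though it is still the same upload.
	e1, err := etag("node-id", "/tmp/blob")
	if err != nil {
		log.Fatal(err)
	}
	e2, _ := etag("node-id", "/tmp/blob")
	fmt.Println(e1, e2)
}
```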
Fallout
A different etag in the PROPFIND response tells the client that data has changed on the remote.
This accidentally incorrect etag instructs the client to download the file again, which can have serious consequences, for example:

- The file can only be downloaded once all binary data has been written (otherwise the download request fails), so timing is crucial here too.
- If the PROPFIND returns a different etag after the upload, but all bytes have been written in the meantime and post-processing is complete, the conditions described above can occur.
Issues
Reproduce and validation
Reproducing issues that rely on timing is not easy, especially when multiple goroutines and users are involved.
For that reason I wrote some k6 tests to reproduce the behavior reliably:
K6-CDPERF (I will create a PR for that test, too)
Solution
The posixfs blobstore must ensure that all bytes of an upload become available atomically, and only once everything is complete and ready.
The upload is therefore now written to a .tmp file. As soon as everything has been written, the mtime and extended attributes are transferred from the placeholder to this .tmp file.
Once that is done, the placeholder is replaced via os.Rename. Since we don't support Windows, this is the best solution in my opinion.
With this change, the tests are green (tested with ~800 uploads and 10 concurrent users).
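A minimal sketch of the write-to-tmp-then-rename pattern described above, assuming hypothetical names (WriteBlob, copyXattrs) and golang.org/x/sys/unix for the xattr copy; the actual implementation in this PR may differ in details such as error handling and buffer sizing.

```go
// Sketch only: WriteBlob and copyXattrs are hypothetical names, not the
// actual posixfs blobstore API.
package blobstore

import (
	"bytes"
	"io"
	"os"

	"golang.org/x/sys/unix"
)

// WriteBlob streams the upload into a .tmp file next to the placeholder and
// only makes the bytes visible under the final path once everything is ready.
func WriteBlob(placeholderPath string, src io.Reader) error {
	tmpPath := placeholderPath + ".tmp"

	tmp, err := os.OpenFile(tmpPath, os.O_CREATE|os.O_WRONLY|os.O_TRUNC, 0o600)
	if err != nil {
		return err
	}

	// Buffered writes land in the .tmp file, so a concurrent PROPFIND that
	// stats the placeholder never observes a changing mtime (and thus etag).
	if _, err := io.Copy(tmp, src); err != nil {
		tmp.Close()
		os.Remove(tmpPath)
		return err
	}
	if err := tmp.Close(); err != nil {
		os.Remove(tmpPath)
		return err
	}

	// Transfer the extended attributes and the mtime from the placeholder to
	// the .tmp file so the etag (mtime + id) stays stable across the swap.
	if err := copyXattrs(placeholderPath, tmpPath); err != nil {
		os.Remove(tmpPath)
		return err
	}
	fi, err := os.Stat(placeholderPath)
	if err != nil {
		os.Remove(tmpPath)
		return err
	}
	if err := os.Chtimes(tmpPath, fi.ModTime(), fi.ModTime()); err != nil {
		os.Remove(tmpPath)
		return err
	}

	// On POSIX filesystems rename(2) replaces the placeholder atomically:
	// readers see either the old placeholder or the fully written blob.
	return os.Rename(tmpPath, placeholderPath)
}

// copyXattrs copies all extended attributes from src to dst. Fixed-size
// buffers keep the sketch short; real code would retry on ERANGE.
func copyXattrs(src, dst string) error {
	names := make([]byte, 64*1024)
	n, err := unix.Listxattr(src, names)
	if err != nil {
		return err
	}
	for _, name := range bytes.Split(names[:n], []byte{0}) {
		if len(name) == 0 {
			continue
		}
		val := make([]byte, 64*1024)
		vn, err := unix.Getxattr(src, string(name), val)
		if err != nil {
			return err
		}
		if err := unix.Setxattr(dst, string(name), val[:vn], 0); err != nil {
			return err
		}
	}
	return nil
}
```

Because rename(2) on the same filesystem is atomic on POSIX systems, there is no window in which a reader can see a partially written blob, which is why the Windows limitation mentioned above is acceptable here.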
Before Fix (1 minute, 10 users, 41 failures, 838 completions, 8.4 GB)
```
k6 run packages/k6-tests/artifacts/oc-tus-upload-propfind-etag-default-k6.js --duration=1m --vus=10

         /\      Grafana   /‾‾/
    /\  /  \     |\  __   /  /
   /  \/    \    | |/ /  /   ‾‾\
  /          \   | (  |  (‾)  |
 / __________ \  |_|\_\  \_____/

     execution: local
        script: packages/k6-tests/artifacts/oc-tus-upload-propfind-etag-default-k6.js
        output: -

     scenarios: (100.00%) 1 scenario, 10 max VUs, 1m30s max duration (incl. graceful stop):
              * default: 10 looping VUs for 1m0s (gracefulStop: 30s)


  █ TOTAL RESULTS

    checks_total.......................: 73622  1212.068429/s
    checks_succeeded...................: 99.94% 73581 out of 73622
    checks_failed......................: 0.05%  41 out of 73622

    ✓ authn -> logonResponse - status
    ✓ authn -> authorizeResponse - status
    ✓ authn -> accessTokenResponse - status
    ✓ client -> role.getMyDrives - status
    ✓ client -> resource.uploadResource - status
    ✓ client -> resource.getResourceProperties - status
    ✓ upload.count match
    ✓ upload.fileid exists
    ✗ upload.id.etag match
      ↳  99% — ✓ 35493 / ✗ 41

    HTTP
    http_req_duration.......................................................: avg=84.09ms min=10.52ms med=74.75ms max=996.62ms p(90)=145.83ms p(95)=174.24ms
      { expected_response:true }............................................: avg=84.09ms min=10.52ms med=74.75ms max=996.62ms p(90)=145.83ms p(95)=174.24ms
    http_req_failed.........................................................: 0.00%  0 out of 1716
    http_reqs...............................................................: 1716   28.251194/s

    EXECUTION
    iteration_duration......................................................: avg=721.74ms min=164.87ms med=669.58ms max=2.77s p(90)=1.11s p(95)=1.29s
    iterations..............................................................: 838    13.796329/s
    vus.....................................................................: 10     min=10 max=10
    vus_max.................................................................: 10     min=10 max=10

    NETWORK
    data_received...........................................................: 58 MB  958 kB/s
    data_sent...............................................................: 8.4 GB 138 MB/s

running (1m00.7s), 00/10 VUs, 838 complete and 0 interrupted iterations
default ✓ [======================================] 10 VUs  1m0s
```

After Fix (1 minute, 10 users, 0 failures, 871 completions, 8.7 GB)

```
k6 run packages/k6-tests/artifacts/oc-tus-upload-propfind-etag-default-k6.js --duration=1m --vus=10

         /\      Grafana   /‾‾/
    /\  /  \     |\  __   /  /
   /  \/    \    | |/ /  /   ‾‾\
  /          \   | (  |  (‾)  |
 / __________ \  |_|\_\  \_____/

     execution: local
        script: packages/k6-tests/artifacts/oc-tus-upload-propfind-etag-default-k6.js
        output: -

     scenarios: (100.00%) 1 scenario, 10 max VUs, 1m30s max duration (incl. graceful stop):
              * default: 10 looping VUs for 1m0s (gracefulStop: 30s)


  █ TOTAL RESULTS

    checks_total.......................: 79393   1303.024585/s
    checks_succeeded...................: 100.00% 79393 out of 79393
    checks_failed......................: 0.00%   0 out of 79393

    ✓ authn -> logonResponse - status
    ✓ authn -> authorizeResponse - status
    ✓ authn -> accessTokenResponse - status
    ✓ client -> role.getMyDrives - status
    ✓ client -> resource.uploadResource - status
    ✓ client -> resource.getResourceProperties - status
    ✓ upload.count match
    ✓ upload.fileid exists
    ✓ upload.id.etag match

    HTTP
    http_req_duration.......................................................: avg=82.81ms min=8.18ms med=75.94ms max=1.27s p(90)=141ms p(95)=162.3ms
      { expected_response:true }............................................: avg=82.81ms min=8.18ms med=75.94ms max=1.27s p(90)=141ms p(95)=162.3ms
    http_req_failed.........................................................: 0.00%  0 out of 1782
    http_reqs...............................................................: 1782   29.246783/s

    EXECUTION
    iteration_duration......................................................: avg=697.99ms min=160.18ms med=641.7ms max=3.16s p(90)=1.13s p(95)=1.22s
    iterations..............................................................: 871    14.295145/s
    vus.....................................................................: 6      min=6 max=10
    vus_max.................................................................: 10     min=10 max=10

    NETWORK
    data_received...........................................................: 63 MB  1.0 MB/s
    data_sent...............................................................: 8.7 GB 143 MB/s

running (1m00.9s), 00/10 VUs, 871 complete and 0 interrupted iterations
default ✓ [======================================] 10 VUs  1m0s
```