Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

occasional file i/o "access is denied" errors during pcap ingest in windows CI with brim #613

Closed
mikesbrown opened this issue Apr 21, 2020 · 8 comments · Fixed by #741 or #925
Closed
Assignees
Labels
bug Something isn't working

Comments

@mikesbrown
Copy link
Contributor

mikesbrown commented Apr 21, 2020

Several tests for Brim / zq integration involve uploading a pcap through Brim and then doing something with it. In one run, there were i/o errors that prevented ingest from succeeding.

One was

CreateFile
C:\\Users\\RUNNER~1\\AppData\\Local\\Temp\\
scoped_dir596_1117112332\\upload596_235124938\\
sample.pcap.brim\\.tmp.ingest\\capture_loss.log:
Access is denied.

The other was

rename C:\\Users\\RUNNER~1\\AppData\\Local\\Temp\\
scoped_dir940_1959449935\\upload940_1015111360\\
sample.pcap.brim\\all.bzng.tmp 

C:\\Users\\RUNNER~1\\AppData\\Local\\Temp\\
scoped_dir940_1959449935\\upload940_1015111360\\
sample.pcap.brim\\all.bzng: Access is denied

The bug is filed here as the error is happening in zqd and being propagated to Brim.

The file being ingested is sample.pcap from brim.git.

The logs of the run are available here.

brim was at brimdata/zui@1890b07 . zq was at tag v0.11.1. Note that windows integration testing is quite new; we don't have many past data points. But so far this is the only known instance.

@philrz philrz added the test Creating/improving test automation label Apr 22, 2020
@mikesbrown mikesbrown added bug Something isn't working and removed test Creating/improving test automation labels Apr 30, 2020
@alfred-landrum alfred-landrum self-assigned this May 1, 2020
@mikesbrown
Copy link
Contributor Author

Happened here https://github.com/brimsec/brim/runs/637035303?check_suite_focus=true . Confirmed with screen shot. The zq hash is e7b866d which is an experimental branch. No changes were made to zq itself on this branch. The parent hash from master is e40934a

@alfred-landrum
Copy link
Contributor

thanks for pointer @mikesbrown - I did pull the archive for that test run. The logs have very little data to help with the issue, I may need to add some to help triage.

@mikesbrown
Copy link
Contributor Author

https://github.com/brimsec/brim/actions/runs/95611919

Looks like this can now manifest with a user-facing message like this:

Detail: D:\var\tmp\brimsec\itest\histogram.test.js\1\data\spaces\sample.pcap.brim\.tmp.ingest\conn.log: line 1: bad format

In histogram.test.js/1/logs/zqd-core.log I still see the same story as before:

{
  "level": "warn",
  "ts": 1588625148.0924168,
  "msg": "error",
  "request_id": "4",
  "status": 500,
  "error": "CreateFile D:\\var\\tmp\\brimsec\\itest\\histogram.test.js\\1\\data\\spaces\\sample.pcap.brim\\.tmp.ingest\\capture_loss.log: Access is denied."
}

This was zq hash 335d284.

One thing I noticed is that histogram.test.js/1/data doesn't even exist. All the other data dirs, for passing tests, do.

@alfred-landrum
Copy link
Contributor

alfred-landrum commented May 7, 2020

I believe this is because the rename operation occurs while a search is active on the space. At the filesystem level, it means the rename (MoveEx in the win32 api) occurs while there is an open handle that was not called with FILE_SHARE_DELETE. It is not possible to specify that flag with the stdlib os.Open call. (There has been quite a lot of discussion about how to handle this in the Go community; see golang/go#32088 and golang/go#34681).

The current "best" way to get Posix like semantics for opens not interfering with renames is to essentially copy all of the code from stdlib involved in file opens, with the added ability to specify FILE_SHARE_DELETE. You can see an example of that in moby/moby (basis of Docker):
moby/moby@a5f237c

Alternatively, we could protect against this above the filesystem, with reader/writer protection against overwriting the all.zng file. @mattnibs is currently working on revamping the storage interface to spaces, which will centralize access to the file, where we could implement the protection scheme.

However, since this will likely not be the last time we expect Windows file access to have these semantics, I lean towards taking the copy-all-os-Open approach, and making sure that all file opens in zq go through that code going forward.

alfred-landrum added a commit that referenced this issue May 8, 2020
Address the occasional rename related failures on Windows by using the FILE_SHARE_DELETE flag when opening files on Windows. As noted in #613, this unfortunately requires copying substantial code from the os package, as there's no way to use that flag with the existing os.Open api.
@alfred-landrum
Copy link
Contributor

We've seen another instance of this after #741 was committed.

@alfred-landrum
Copy link
Contributor

Just an update: I've been trying to reproduce this with a brim/zq branch with more logging information, and I can't get to repro on that test branch. Since the new mingw Zeek artifact for windows, it also seems like the frequency of occurrence has gone down, but not decreased. Here's a recent example of it:
https://github.com/brimsec/brim/runs/721517441?check_suite_focus=true

@mikesbrown
Copy link
Contributor Author

alfred-landrum added a commit that referenced this issue Jun 22, 2020
A few enhancements in our fs package:

    add a utility function to use when atomically replacing the contents for a file. This is based on a function NewAtomicFileWriter in the https://github.com/moby/moby repo; I'll add an acknowledgements entry for the (apache licensed) code if this goes in.
    add a fs version of 'ioutil.ReadFile' that uses fs.Open to get the rename semantics we want on Windows.

This also updates the filestore code to use these & other calls from package fs. There is a possibility that this resolves #613 , though that's been impossible to repro on demand.

Part of brimdata/zui#883 .
@alfred-landrum
Copy link
Contributor

Just FYI: this bug repro'd very sporadically, so even though I think #925 addresses it, it's hard to be sure.

brim-bot pushed a commit to brimdata/zui that referenced this issue Jun 22, 2020
…y alfred-landrum

This is an auto-generated commit with a zq dependency update. The zq PR
brimdata/super#925, authored by @alfred-landrum,
has been merged.

add and use a file replacement utility function

A few enhancements in our fs package:
* add a utility function to use when atomically replacing the contents for a file. This is based on a function `NewAtomicFileWriter` in the https://github.com/moby/moby repo; I'll add an acknowledgements entry for the (apache licensed) code if this goes in.
* add a fs version of 'ioutil.ReadFile' that uses fs.Open to get the rename semantics we want on Windows.

This also updates the filestore code to use these & other calls from package fs. There is a possibility that this resolves brimdata/super#613 , though that's been impossible to repro on demand.

Part of #883 .
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
3 participants