Add optional getdents64(2) syscall bypass in read-only open
#14202
+164
−2
I am trying to use RocksDB in ReadOnly mode with a POSIX file system that does not correctly support `open(2)` with `O_DIRECTORY` or `getdents64()` syscalls. As a result, I see ReadOnly database open failures relating to the file system reporting a non-zero exit code for `getdents64(2)`:

Example strace output:
Looking at the ReadOnly open code, it seems the `getdents64(2)` and `open(2)` are byproducts of `GetChildren()`, which calls `opendir(3)` and `readdir(3)` to range over the database directory. The practical applications of these callsites are to:

`OPTIONS-$N` file for `GetLiveFiles()` inclusion

Neither of these appears to be relevant for our use case of simple RocksDB backup -> ReadOnly restore, which makes me think they can be safely bypassed in the ReadOnly open path.
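
For context, the `opendir(3)`/`readdir(3)` pattern described above can be reproduced outside RocksDB. The following is a minimal standalone sketch, not RocksDB's actual `GetChildren()` implementation; the name `ListChildren` and its errno-style return value are assumptions made for illustration. On Linux with glibc, running a caller of this function under strace should show an `openat(2)` with `O_DIRECTORY` followed by `getdents64(2)` calls, which is the pair a partially POSIX-compliant file system would need to support.

```cpp
#include <dirent.h>  // opendir, readdir, closedir (POSIX)

#include <cassert>
#include <cerrno>
#include <string>
#include <vector>

// Hypothetical helper: lists the entries of `dir`, skipping "." and "..".
// Returns 0 on success, or the errno value reported by opendir/readdir.
int ListChildren(const std::string& dir, std::vector<std::string>* out) {
  assert(out != nullptr);
  out->clear();

  // On Linux/glibc this opens the directory with O_DIRECTORY.
  DIR* d = opendir(dir.c_str());
  if (d == nullptr) {
    return errno;
  }

  int err = 0;
  for (;;) {
    // readdir returns nullptr both at end-of-directory and on error,
    // so errno has to be cleared first to tell the two cases apart.
    errno = 0;
    struct dirent* entry = readdir(d);  // refills its buffer via getdents64
    if (entry == nullptr) {
      err = errno;
      break;
    }
    const std::string name = entry->d_name;
    if (name != "." && name != "..") {
      out->push_back(name);
    }
  }

  closedir(d);
  return err;
}
```

A file system that rejects either syscall makes this function return a non-zero errno, which is the kind of failure that would surface during the ReadOnly open.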
This change adds a new database option, `skip_directory_scan_on_readonly_db_open`, that allows RocksDB users to omit directory-listing-related system calls, with a default of `false`. I think this is generally useful for folks using RocksDB over customized file systems (be it FUSE, NFS, etc.) that may only implement partial POSIX support. With this change, we are able to successfully use RocksDB in ReadOnly mode with our customized file system.

Note: `best_efforts_recovery` does bypass the WAL dir scan, but doesn't skip the scan for the OPTIONS file. I also think using `best_efforts_recovery` as a proxy for bypassing `getdents64(2)` deviates from the option's original intent.

Reproduce database open failure
Example program opening database in ReadOnly mode, fails during `getdents64(2)`.

As far as I can tell, this seems safe? But I'm looking for maintainers with better context to help me understand why this might be a bad idea. Thanks
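
To make the question concrete, the proposed behaviour can be pictured as a guard in front of the directory scan. The sketch below is standalone and is not the code in this PR: `ReadOnlyOpenOptions` and `ScanDbDirForReadOnlyOpen` are hypothetical names, and only the field name `skip_directory_scan_on_readonly_db_open` and its default of `false` are taken from the description above.

```cpp
#include <dirent.h>  // opendir, readdir, closedir (POSIX)

#include <cassert>
#include <cerrno>
#include <string>
#include <vector>

// Hypothetical stand-in for the relevant slice of the database options.
struct ReadOnlyOpenOptions {
  bool skip_directory_scan_on_readonly_db_open = false;
};

// Hypothetical helper. Returns 0 on success or an errno value.
// When the flag is set, the directory is never opened, so neither
// open(2) with O_DIRECTORY nor getdents64(2) is issued.
int ScanDbDirForReadOnlyOpen(const ReadOnlyOpenOptions& opts,
                             const std::string& dbname,
                             std::vector<std::string>* children) {
  assert(children != nullptr);
  children->clear();

  if (opts.skip_directory_scan_on_readonly_db_open) {
    return 0;  // bypass: report success with an empty listing
  }

  DIR* d = opendir(dbname.c_str());
  if (d == nullptr) {
    return errno;
  }
  int err = 0;
  for (;;) {
    errno = 0;  // distinguish end-of-directory from a readdir error
    struct dirent* entry = readdir(d);
    if (entry == nullptr) {
      err = errno;
      break;
    }
    const std::string name = entry->d_name;
    if (name != "." && name != "..") {
      children->push_back(name);
    }
  }
  closedir(d);
  return err;
}
```

With the flag left at its default, a directory the file system cannot list yields an error; with the flag set, the same call succeeds with an empty listing, so any caller that relies on the listing has to tolerate it being empty.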