-
Notifications
You must be signed in to change notification settings - Fork 25.3k
Switch MetaDataStateFormat to Lucene directory abstraction #33989
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Pinging @elastic/es-distributed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've done an initial pass. I'm missing a test where we inject failures at random during writes and check whether we either successfully load the previous or the newly written state (i.e. atomicity).
Files.deleteIfExists(stateFile); | ||
logger.trace("cleanupOldFiles: cleaned up {}", stateFile); | ||
private void cleanupOldFiles(final String currentStateFile, Path[] locations) throws IOException { | ||
for (Path location: locations){ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
space missing before :
and before {
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
++
for (Path location: locations){ | ||
Path stateLocation = location.resolve(STATE_DIR_NAME); | ||
try (Directory stateDir = newDirectory(stateLocation)){ | ||
for (String file : stateDir.listAll()){ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
space missing before {
(same for previous line)
Also check the rest of PR for these occurrences
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tip, just in case: you can let intellij auto format a selection (and not the entire file) but selecting it and choosing reformat code from the Code menu (or using whatever keyboard shortcut you have).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry about that, I know about format option and use it, but sometimes forget to apply it. Unfortunately, there is no good way for IDEA to do it safe and automatically for you
Files.deleteIfExists(tmpPath); | ||
logger.trace("cleaned up {}", tmpPath); | ||
} | ||
for (Path extraLocation : locations) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
overriding the first location again here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch!
final Path firstStateLocation = locations[0].resolve(STATE_DIR_NAME); | ||
try (Directory stateDir = newDirectory(firstStateLocation)) { | ||
writeStateToFirstLocation(state, stateDir, fileName, tmpFileName); | ||
logger.trace("written state to {}", firstStateLocation.resolve(fileName)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move log line into writeStateToFirstLocation
method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's done in this way because there I did not want to pass firstStateLocation path to writeStateToFirstLocation method just for logging purposes. Unfortunately, there is no method like resolve in Lucene directory. But fixed anyway
for (Path extraLocation : locations) { | ||
final Path extraStateLocation = extraLocation.resolve(STATE_DIR_NAME); | ||
copyStateToExtraLocation(stateDir, extraStateLocation, fileName, tmpFileName); | ||
logger.trace("copied state to {}", extraStateLocation.resolve(fileName)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move log line into copyStateToExtraLocation
method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here it was done to make symmetrical with writeStateToFirstLocation, fixed
DummyState initialState = writeAndReadStateSuccessfully(format, injectFailures, path); | ||
possibleStates.add(initialState); | ||
|
||
for (int i=0; i<randomIntBetween(1,5); i++) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
spaces missing...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
++
public void eval(MockDirectoryWrapper dir) throws IOException { | ||
for (StackTraceElement e : Thread.currentThread().getStackTrace()) { | ||
if (failMethod.equals(e.getMethodName())) { | ||
throw new MockDirectoryWrapper.FakeIOException(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this means we always fail on the first call in this method. I wonder if will not get enough coverage by always failing on the first call
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it will fail on the first method call which name equals to one of randomly selected from failure methods.
For example, if you look at "testFailWriteAndReadPreviousState" you will find that failure methods are the following: "createOutput", "sync", "rename". If we run the sufficient number of test iterations, we should get good coverage. Plus we perform more than one state write in each test method, and each of this method could fail at randomly chosen method.
* @param failureMethods - one of these method calls will fail on Directory object if injectFailure is set to true. | ||
* Method that will fail is randomly selected on each newDirectory method call. | ||
*/ | ||
Format(String prefix, AtomicBoolean injectFailure, String... failureMethods) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instead of injecting the AtomicBoolean, it might be nicer to just create it internally, and access it on the Format
object
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
++
server/src/test/java/org/elasticsearch/gateway/MetaDataStateFormatTests.java
Show resolved
Hide resolved
cd55bca
to
afa9a3b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant adding a test that injects a failure completely at random (without matching on specific method names). There are two more logging issues.
}; | ||
// now clean up the old files | ||
for (Path dataLocation : locations) { | ||
logger.trace("cleanupOldFiles: cleaning up {}", dataLocation); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks to be still missing.
for (String file : stateDir.listAll()) { | ||
if (file.startsWith(prefix) && file.equals(currentStateFile) == false) { | ||
deleteTempFile(stateLocation, stateDir, file); | ||
logger.trace("cleanupOldFiles: cleaned up {}", stateLocation.resolve(file)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is already logged part of deleteTempFile
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed, also renamed deleteTempFile method, because when used in cleanup it's not temp file, but previous state
@ywelsch I've fixed logging issues and added the random test. Random test discovered the bug, that we don't take temp files into account when looking up max state id. I've fixed that. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for adding the extra test. I've added a few more comments.
return state; | ||
} catch (Exception e) { | ||
exceptions.add(new IOException("failed to read " + pathAndStateId.toString(), e)); | ||
exceptions.add(new IOException("failed to read " + stateFile.getFileName(), e)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should have the absolute path in it, I think, not just the file name.
.filter(pathAndStateId -> pathAndStateId.id == finalMaxStateId) | ||
.collect(Collectors.toCollection(ArrayList::new)); | ||
long maxStateId = findMaxStateId(prefix, dataLocations); | ||
List<Path> stateFiles = findStateFilesByGeneration(maxStateId, dataLocations); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If maxStateId > 0 and stateFiles.isEmpty() I think we should throw an exception here as it clearly indicates something went horribly wrong (e.g. concurrent writes or deletes). That same condition was previously covered by distinguishing between pathAndStateIds
and files
.
for (Path dataLocation : locations) { | ||
final Path stateFilePath = dataLocation.resolve(STATE_DIR_NAME).resolve(fileName); | ||
if (Files.exists(stateFilePath)) { | ||
files.add(stateFilePath); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's add trace logging to this (like it was before):
logger.trace("found state file: {}", stateFilePath);
} | ||
|
||
public void testFailRandomlyAndReadAnyState() throws IOException { | ||
Path paths[] = new Path[randomIntBetween(2, 5)]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's also consider the typical use case of a single data path :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. You'll have to merge in the latest zen2 branch into your branch to get CI green here.
run gradle build tests please |
1 similar comment
run gradle build tests please |
run gradle build tests again |
6ceaa43
to
554537e
Compare
Switching to Lucene directory abstraction for MetaDataStateFormat to be able to inject failure during writing state to disk.
Adding new tests to MetaDataStateFormat to inject these failures.