Skip to content

Switch MetaDataStateFormat to Lucene directory abstraction #33989

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
Oct 17, 2018

Conversation

andrershov
Copy link
Contributor

Switching to Lucene directory abstraction for MetaDataStateFormat to be able to inject failure during writing state to disk.
Adding new tests to MetaDataStateFormat to inject these failures.

@andrershov andrershov added >enhancement :Distributed Coordination/Cluster Coordination Cluster formation and cluster state publication, including cluster membership and fault detection. labels Sep 24, 2018
@andrershov andrershov requested a review from ywelsch September 24, 2018 10:38
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed

Copy link
Contributor

@ywelsch ywelsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've done an initial pass. I'm missing a test where we inject failures at random during writes and check whether we either successfully load the previous or the newly written state (i.e. atomicity).

Files.deleteIfExists(stateFile);
logger.trace("cleanupOldFiles: cleaned up {}", stateFile);
private void cleanupOldFiles(final String currentStateFile, Path[] locations) throws IOException {
for (Path location: locations){
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

space missing before : and before {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++

for (Path location: locations){
Path stateLocation = location.resolve(STATE_DIR_NAME);
try (Directory stateDir = newDirectory(stateLocation)){
for (String file : stateDir.listAll()){
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

space missing before { (same for previous line)
Also check the rest of PR for these occurrences

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tip, just in case: you can let intellij auto format a selection (and not the entire file) but selecting it and choosing reformat code from the Code menu (or using whatever keyboard shortcut you have).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry about that, I know about format option and use it, but sometimes forget to apply it. Unfortunately, there is no good way for IDEA to do it safe and automatically for you

Files.deleteIfExists(tmpPath);
logger.trace("cleaned up {}", tmpPath);
}
for (Path extraLocation : locations) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

overriding the first location again here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch!

final Path firstStateLocation = locations[0].resolve(STATE_DIR_NAME);
try (Directory stateDir = newDirectory(firstStateLocation)) {
writeStateToFirstLocation(state, stateDir, fileName, tmpFileName);
logger.trace("written state to {}", firstStateLocation.resolve(fileName));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move log line into writeStateToFirstLocation method?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's done in this way because there I did not want to pass firstStateLocation path to writeStateToFirstLocation method just for logging purposes. Unfortunately, there is no method like resolve in Lucene directory. But fixed anyway

for (Path extraLocation : locations) {
final Path extraStateLocation = extraLocation.resolve(STATE_DIR_NAME);
copyStateToExtraLocation(stateDir, extraStateLocation, fileName, tmpFileName);
logger.trace("copied state to {}", extraStateLocation.resolve(fileName));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move log line into copyStateToExtraLocation method?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here it was done to make symmetrical with writeStateToFirstLocation, fixed

DummyState initialState = writeAndReadStateSuccessfully(format, injectFailures, path);
possibleStates.add(initialState);

for (int i=0; i<randomIntBetween(1,5); i++) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

spaces missing...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++

public void eval(MockDirectoryWrapper dir) throws IOException {
for (StackTraceElement e : Thread.currentThread().getStackTrace()) {
if (failMethod.equals(e.getMethodName())) {
throw new MockDirectoryWrapper.FakeIOException();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this means we always fail on the first call in this method. I wonder if will not get enough coverage by always failing on the first call

Copy link
Contributor Author

@andrershov andrershov Sep 26, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it will fail on the first method call which name equals to one of randomly selected from failure methods.
For example, if you look at "testFailWriteAndReadPreviousState" you will find that failure methods are the following: "createOutput", "sync", "rename". If we run the sufficient number of test iterations, we should get good coverage. Plus we perform more than one state write in each test method, and each of this method could fail at randomly chosen method.

* @param failureMethods - one of these method calls will fail on Directory object if injectFailure is set to true.
* Method that will fail is randomly selected on each newDirectory method call.
*/
Format(String prefix, AtomicBoolean injectFailure, String... failureMethods) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of injecting the AtomicBoolean, it might be nicer to just create it internally, and access it on the Format object

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++

@andrershov
Copy link
Contributor Author

@ywelsch thanks for the review! I've made necessary changes in afa9a3b. Please let me know if something else is needed.
By injecting failures during writes did you mean, that we also should test for failures when writing bytes of the temporary file to disk? If yes, I've addressed it.

@ywelsch ywelsch changed the title Switch MetaDataStateFormat Lucene directory abstraction Switch MetaDataStateFormat to Lucene directory abstraction Sep 27, 2018
Copy link
Contributor

@ywelsch ywelsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant adding a test that injects a failure completely at random (without matching on specific method names). There are two more logging issues.

};
// now clean up the old files
for (Path dataLocation : locations) {
logger.trace("cleanupOldFiles: cleaning up {}", dataLocation);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks to be still missing.

for (String file : stateDir.listAll()) {
if (file.startsWith(prefix) && file.equals(currentStateFile) == false) {
deleteTempFile(stateLocation, stateDir, file);
logger.trace("cleanupOldFiles: cleaned up {}", stateLocation.resolve(file));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is already logged part of deleteTempFile?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed, also renamed deleteTempFile method, because when used in cleanup it's not temp file, but previous state

@andrershov
Copy link
Contributor Author

@ywelsch I've fixed logging issues and added the random test. Random test discovered the bug, that we don't take temp files into account when looking up max state id. I've fixed that.
Also, I've refactored loadLatestState method to remove code duplication.
May I ask you to kindly perform one more code review pass?

Copy link
Contributor

@ywelsch ywelsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding the extra test. I've added a few more comments.

return state;
} catch (Exception e) {
exceptions.add(new IOException("failed to read " + pathAndStateId.toString(), e));
exceptions.add(new IOException("failed to read " + stateFile.getFileName(), e));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should have the absolute path in it, I think, not just the file name.

.filter(pathAndStateId -> pathAndStateId.id == finalMaxStateId)
.collect(Collectors.toCollection(ArrayList::new));
long maxStateId = findMaxStateId(prefix, dataLocations);
List<Path> stateFiles = findStateFilesByGeneration(maxStateId, dataLocations);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If maxStateId > 0 and stateFiles.isEmpty() I think we should throw an exception here as it clearly indicates something went horribly wrong (e.g. concurrent writes or deletes). That same condition was previously covered by distinguishing between pathAndStateIds and files.

for (Path dataLocation : locations) {
final Path stateFilePath = dataLocation.resolve(STATE_DIR_NAME).resolve(fileName);
if (Files.exists(stateFilePath)) {
files.add(stateFilePath);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add trace logging to this (like it was before):

logger.trace("found state file: {}", stateFilePath);

}

public void testFailRandomlyAndReadAnyState() throws IOException {
Path paths[] = new Path[randomIntBetween(2, 5)];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's also consider the typical use case of a single data path :)

@andrershov
Copy link
Contributor Author

@ywelsch thank you for another round, I've made changes in this commit 0f33a39

Copy link
Contributor

@ywelsch ywelsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. You'll have to merge in the latest zen2 branch into your branch to get CI green here.

@andrershov
Copy link
Contributor Author

run gradle build tests please

1 similar comment
@andrershov
Copy link
Contributor Author

run gradle build tests please

@andrershov
Copy link
Contributor Author

run gradle build tests again

@andrershov andrershov mentioned this pull request Oct 17, 2018
6 tasks
@andrershov andrershov merged commit 51f38dd into elastic:zen2 Oct 17, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed Coordination/Cluster Coordination Cluster formation and cluster state publication, including cluster membership and fault detection. >enhancement
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants