Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix many flaky tests due to non-determinism of hash map #1218

Closed
wants to merge 6 commits into from
Closed

Fix many flaky tests due to non-determinism of hash map #1218

wants to merge 6 commits into from

Conversation

Auzel
Copy link
Contributor

@Auzel Auzel commented Nov 10, 2021

Following up on our earlier conversation, I have found the following flaky tests in this project:

edu.stanford.nlp.semgraph.semgrex.SemgrexTest.testNegatedAttribute edu.stanford.nlp.semgraph.semgrex.SemgrexTest.testComplicatedGraph edu.stanford.nlp.semgraph.semgrex.SemgrexTest.testNotEquals edu.stanford.nlp.semgraph.semgrex.ProcessSemgrexRequestTest.testTwoGraphs edu.stanford.nlp.semgraph.semgrex.ProcessSemgrexRequestTest.testSimpleRequest edu.stanford.nlp.semgraph.semgrex.ProcessSemgrexRequestTest.testSingleMultiRequest edu.stanford.nlp.semgraph.semgrex.ProcessSemgrexRequestTest.testTwoSemgrex edu.stanford.nlp.trees.UniversalEnglishGrammaticalStructureTest.doTest edu.stanford.nlp.semgraph.semgrex.SemgrexTest.testEqualsRelation edu.stanford.nlp.semgraph.semgrex.ProcessSemgrexRequestTest.testDoubleMultiRequest edu.stanford.nlp.semgraph.semgrex.SemgrexTest.testAttributeConjunction edu.stanford.nlp.naturalli.RelationTripleSegmenterTest.testAmericanActorChrisPratt edu.stanford.nlp.semgraph.semgrex.ProcessSemgrexRequestTest.testUnclosedMultiRequest

All but edu.stanford.nlp.semgraph.semgrex.SemgrexTest.testNegatedAttribute and edu.stanford.nlp.semgraph.semgrex.SemgrexTest.testAttributeConjunction assume that the iterator of the HashMap returns the values in the order they were stored. However, this is not true in java since values are returned in a non-deterministic order.

My fix is to let the underlying data structure be a LinkedHashMap ,as opposed to a HashMap. This will ensure that the values are returned in the expected order. The performance differences of the two data structures are negligible but the change to LinkedHashMap will eliminate 10 flaky tests.

@AngledLuffa
Copy link
Contributor

I understand that conceptually there's no guarantee a java map iterates items in the same order they were inserted, but I have never seen this cause errors in the tests in practice. That's on multiple architectures and JVMs. I'd be very curious to know what setup you are using which is causing this to happen.

At any rate, without extensively testing the difference in memory & speed performance, I'm not up for switching HashMap -> LinkedHashMap. I'd be happy to integrate changes to the tests themselves which eliminate the flakiness, though.

@Auzel
Copy link
Contributor Author

Auzel commented Nov 27, 2021

Yes you are correct that you nor I may be able to reproduce this error on demand. However, since it can occur at any point in time (running on a particular JVM), I try to fix these flaky tests before it comes a bigger issue. I use a tool call NONDEX to detect such tests; but I have not yet tried running the code on different JVMs to reproduce the error.

With regards to the 10 tests which assumes the iterator returns the elements in an order, I don't think it's possible to change the tests alone to make them non-flaky. More specifically, although I can, in principle, simply check that the list contains a particular string (rather than assuming the string exists at a particular index), it will not be a sufficient test of the underlying methods. The underlying methods (or code) relies on a specific order (which isn't guaranteed with a HashMap); hence my suggestion to use a LinkedHashMap.

The other two test methods (testNegatedAttribute, and testAttributeConjunction) I should be able to fix with changes solely in the test code.

@Auzel
Copy link
Contributor Author

Auzel commented Nov 27, 2021

Upon further investigation, both testNegatedAttribute and testAttributeConjunction relies on comparePatternToString(..) in SemgrexTest test class. This method compiles the pattern and then checks if the original pattern matches the output pattern. The issue is that NodeAttributes is backed by a HashMap so the output pattern of the string (after executing SemgrexPattern.compile(pattern)) can have a different pattern, as shown using the Nondex tool. More specifically this output pattern is similar to the old pattern but with some substrings in a different order. Since these tests are supposed to check that the string pattern are exactly identical (not just contain same substrings), I cannot fix these flaky tests with just changes to the test suite. It'll require a similar change to my comment earlier but in this case instantiating a LinkedHashMap for the attributes field of a NodeAttributes object.

I understand that you're not prepared to make that adjustment but I've made these suggestions since this issue can show its 'ugly head' at any point in time or upon a change in JVM — which has happened many times on other projects.

@AngledLuffa
Copy link
Contributor

I'm totally fine with making a smaller slice of the codebase use a LinkedHashMap.  I just don't want Generics.newHashMap() to change without understanding the memory and speed implications of that.  Semgrex expressions themselves are far more expensive than the map itself, so there's no reason not to make that change in that module.  (Compare this with the conparser's use of maps for features, for example, where the map is on the critical path and is the main use of memory.)  

Also, I think I understand now why this issue has never come up before.  It appears that having two attributes at once on a node is not part of the unit test.  Oops!

Thanks for your persistence - the extra bit of information here really helped narrow down the issue.

#1224

@Auzel
Copy link
Contributor Author

Auzel commented Nov 29, 2021

Oh, I see and I fully understand your concern. Yes, I should have made the change to the submodule instead, as to prevent any unintended consequences. Thanks for pointing that out. Glad everything is solved now.

@AngledLuffa
Copy link
Contributor

AngledLuffa commented Nov 29, 2021 via email

@Auzel
Copy link
Contributor Author

Auzel commented Dec 2, 2021

Okay, so I took a closer look. Your fix solved the last two problems:
edu.stanford.nlp.semgraph.semgrex.SemgrexTest.testNegatedAttribute edu.stanford.nlp.semgraph.semgrex.SemgrexTest.testAttributeConjunction

My new fix fixes all but edu.stanford.nlp.trees.UniversalEnglishGrammaticalStructureTest.doTest. There are two points to note:

  1. SemgrexPattern.java has multiple matcher() methods. However, I only made a change to one of them because that was the only one exposed by the unit tests already written. It may be necessary to apply that change to the other methods to future proof it against future flaky tests (implementation-dependent).

  2. As presently constructed, this PR should not be merged. Initially, the edu.stanford.nlp.trees.UniversalEnglishGrammaticalStructureTest.doTest flaky test was only exposed when I ran the NONDEX tool. After applying the new fixes, this flaky test becomes non-flaky but always fails. In both cases (NONDEX and mvn test), the output of the failed test was doTest[238](edu.stanford.nlp.trees.UniversalEnglishGrammaticalStructureTest): Unexpected CC processed dependencies for tree (ROOT (S (NP (PRP$ His) (NN term)) (VP (VP (VBZ has) (VP (VBN produced) (NP (NP (DT no) (JJ spectacular) (NNS failures)) (PP (PP (IN in) (NP (NNS politics))) (, ,) (PP (IN in) (NP (DT the) (NN economy))) (CC or) (PP (IN on) (NP (DT the) (JJ military) (NN front))))))) (, ,) (CC and) (VP (VBZ has) (VP (VBN chalked) (PRT (RP up)) (NP (DT some) (NNS successes))))) (. .))) expected:<...(..) doTest[242](edu.stanford.nlp.trees.UniversalEnglishGrammaticalStructureTest): Unexpected CC processed dependencies for tree (ROOT (S (NP (NNP Jill)) (VP (VBD walked) (PP (PP (IN out) (NP (DT the) (NN door))) (, ,) (PP (IN over) (NP (DT the) (NN road))) (, ,) (PP (IN across) (NP (DT the) (JJ deserted) (NN block))) (, ,) (PP (IN around) (NP (DT the) (NN corner))) (, ,) (CC and) (PP (IN through) (NP (DT the) (NN park))))) (. .))) expected:<...)(..) . I include this hoping you can provide some insight on this error and what may be causing it (because it isn't entirely clear to me). Once that's done, I can make the necessary changes. And so, all the implementation-dependent flaky tests will be fixed, and the PR will be ready for merging.

README.md Outdated
@@ -40,7 +40,7 @@ At present, [the current released version of the code](https://stanfordnlp.githu
1. Make sure you have Maven installed, details here: [https://maven.apache.org/](https://maven.apache.org/)
2. If you run this command in the CoreNLP directory: `mvn package` , it should run the tests and build this jar file: `CoreNLP/target/stanford-corenlp-4.3.2.jar`
3. When using the latest version of the code make sure to download the latest versions of the [corenlp-models](http://nlp.stanford.edu/software/stanford-corenlp-models-current.jar), [english-extra-models](http://nlp.stanford.edu/software/stanford-english-extra-corenlp-models-current.jar), and [english-kbp-models](http://nlp.stanford.edu/software/stanford-english-kbp-corenlp-models-current.jar) and include them in your CLASSPATH. If you are processing languages other than English, make sure to download the latest version of the models jar for the language you are interested in.
4. If you want to use Stanford CoreNLP as part of a Maven project you need to install the models jars into your Maven repository. Below is a sample command for installing the Spanish models jar. For other languages just change the language name in the command. To install `stanford-corenlp-models-current.jar` you will need to set `-Dclassifier=models`. Here is the sample command for Spanish: `mvn install:install-file -Dfile=/location/of/stanford-spanish-corenlp-models-current.jar -DgroupId=edu.stanford.nlp -DartifactId=stanford-corenlp -Dversion=4.2.2 -Dclassifier=models-spanish -Dpackaging=jar`
4. If you want to use Stanford CoreNLP as part of a Maven project you need to install the models jars into your Maven repository. Below is a sample command for installing the Spanish models jar. For other languages just change the language name in the command. To install `stanford-corenlp-models-current.jar` you will need to set `-Dclassifier=models`. Here is the sample command for Spanish: `mvn install:install-file -Dfile=/location/of/stanford-spanish-corenlp-models-current.jar -DgroupId=edu.stanford.nlp -DartifactId=stanford-corenlp -Dversion=4.3.2 -Dclassifier=models-spanish -Dpackaging=jar`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@J38 has already done this in both main & dev

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I confirm this was a bug in GitHub. I switched the upstream branch to "main" and then back to "dev", and now the correct git diff is shown.

@Auzel Auzel changed the base branch from dev to main December 2, 2021 22:29
@Auzel Auzel changed the base branch from main to dev December 2, 2021 22:29
@AngledLuffa
Copy link
Contributor

#1228

@AngledLuffa
Copy link
Contributor

AngledLuffa commented Dec 3, 2021 via email

@AngledLuffa
Copy link
Contributor

As for the test that changed, it was simply a reflection of the ordering of some outputs being changed. Not a big deal

@Auzel
Copy link
Contributor Author

Auzel commented Dec 7, 2021

Okay, noted. I took a look at the changes you've made and realized it hasn't been merged as yet. I'll rerun the tests after the merge to ensure all the (implementation-dependent) flaky tests have been eliminated.

@AngledLuffa
Copy link
Contributor

AngledLuffa commented Dec 7, 2021 via email

@Auzel
Copy link
Contributor Author

Auzel commented Dec 8, 2021

Okay, nice! I have since check the dev branch and all flaky tests have been eliminated. So, when the dev branch is merged into main, everything will be good. So I'll close the PR now.

@Auzel Auzel closed this Dec 8, 2021
@AngledLuffa
Copy link
Contributor

AngledLuffa commented Dec 8, 2021 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants