Fix many flaky tests due to non-determinism of hash map #1218

Auzel · 2021-11-10T08:09:24Z

Following up on our earlier conversation, I have found the following flaky tests in this project:

edu.stanford.nlp.semgraph.semgrex.SemgrexTest.testNegatedAttribute edu.stanford.nlp.semgraph.semgrex.SemgrexTest.testComplicatedGraph edu.stanford.nlp.semgraph.semgrex.SemgrexTest.testNotEquals edu.stanford.nlp.semgraph.semgrex.ProcessSemgrexRequestTest.testTwoGraphs edu.stanford.nlp.semgraph.semgrex.ProcessSemgrexRequestTest.testSimpleRequest edu.stanford.nlp.semgraph.semgrex.ProcessSemgrexRequestTest.testSingleMultiRequest edu.stanford.nlp.semgraph.semgrex.ProcessSemgrexRequestTest.testTwoSemgrex edu.stanford.nlp.trees.UniversalEnglishGrammaticalStructureTest.doTest edu.stanford.nlp.semgraph.semgrex.SemgrexTest.testEqualsRelation edu.stanford.nlp.semgraph.semgrex.ProcessSemgrexRequestTest.testDoubleMultiRequest edu.stanford.nlp.semgraph.semgrex.SemgrexTest.testAttributeConjunction edu.stanford.nlp.naturalli.RelationTripleSegmenterTest.testAmericanActorChrisPratt edu.stanford.nlp.semgraph.semgrex.ProcessSemgrexRequestTest.testUnclosedMultiRequest

All but edu.stanford.nlp.semgraph.semgrex.SemgrexTest.testNegatedAttribute and edu.stanford.nlp.semgraph.semgrex.SemgrexTest.testAttributeConjunction assume that the iterator of the HashMap returns the values in the order they were stored. However, this is not true in java since values are returned in a non-deterministic order.

My fix is to let the underlying data structure be a LinkedHashMap ,as opposed to a HashMap. This will ensure that the values are returned in the expected order. The performance differences of the two data structures are negligible but the change to LinkedHashMap will eliminate 10 flaky tests.

AngledLuffa · 2021-11-10T08:55:32Z

I understand that conceptually there's no guarantee a java map iterates items in the same order they were inserted, but I have never seen this cause errors in the tests in practice. That's on multiple architectures and JVMs. I'd be very curious to know what setup you are using which is causing this to happen.

At any rate, without extensively testing the difference in memory & speed performance, I'm not up for switching HashMap -> LinkedHashMap. I'd be happy to integrate changes to the tests themselves which eliminate the flakiness, though.

Auzel · 2021-11-27T02:09:28Z

Yes you are correct that you nor I may be able to reproduce this error on demand. However, since it can occur at any point in time (running on a particular JVM), I try to fix these flaky tests before it comes a bigger issue. I use a tool call NONDEX to detect such tests; but I have not yet tried running the code on different JVMs to reproduce the error.

With regards to the 10 tests which assumes the iterator returns the elements in an order, I don't think it's possible to change the tests alone to make them non-flaky. More specifically, although I can, in principle, simply check that the list contains a particular string (rather than assuming the string exists at a particular index), it will not be a sufficient test of the underlying methods. The underlying methods (or code) relies on a specific order (which isn't guaranteed with a HashMap); hence my suggestion to use a LinkedHashMap.

The other two test methods (testNegatedAttribute, and testAttributeConjunction) I should be able to fix with changes solely in the test code.

Auzel · 2021-11-27T18:42:57Z

Upon further investigation, both testNegatedAttribute and testAttributeConjunction relies on comparePatternToString(..) in SemgrexTest test class. This method compiles the pattern and then checks if the original pattern matches the output pattern. The issue is that NodeAttributes is backed by a HashMap so the output pattern of the string (after executing SemgrexPattern.compile(pattern)) can have a different pattern, as shown using the Nondex tool. More specifically this output pattern is similar to the old pattern but with some substrings in a different order. Since these tests are supposed to check that the string pattern are exactly identical (not just contain same substrings), I cannot fix these flaky tests with just changes to the test suite. It'll require a similar change to my comment earlier but in this case instantiating a LinkedHashMap for the attributes field of a NodeAttributes object.

I understand that you're not prepared to make that adjustment but I've made these suggestions since this issue can show its 'ugly head' at any point in time or upon a change in JVM — which has happened many times on other projects.

AngledLuffa · 2021-11-29T21:17:59Z

I'm totally fine with making a smaller slice of the codebase use a LinkedHashMap. I just don't want Generics.newHashMap() to change without understanding the memory and speed implications of that. Semgrex expressions themselves are far more expensive than the map itself, so there's no reason not to make that change in that module. (Compare this with the conparser's use of maps for features, for example, where the map is on the critical path and is the main use of memory.)

Also, I think I understand now why this issue has never come up before. It appears that having two attributes at once on a node is not part of the unit test. Oops!

Thanks for your persistence - the extra bit of information here really helped narrow down the issue.

#1224

Auzel · 2021-11-29T23:09:07Z

Oh, I see and I fully understand your concern. Yes, I should have made the change to the submodule instead, as to prevent any unintended consequences. Thanks for pointing that out. Glad everything is solved now.

AngledLuffa · 2021-11-29T23:49:38Z

Great. I merged my version of the PR. It also had an expanded unit test, which did in fact show the error you're talking about here. (Now fixed, obviously!) I merged it to the dev branch. Would you like to check if there are other remaining non-deterministic tests?

This reverts commit adbcad8.

Auzel · 2021-12-02T17:16:35Z

Okay, so I took a closer look. Your fix solved the last two problems:
edu.stanford.nlp.semgraph.semgrex.SemgrexTest.testNegatedAttribute edu.stanford.nlp.semgraph.semgrex.SemgrexTest.testAttributeConjunction

My new fix fixes all but edu.stanford.nlp.trees.UniversalEnglishGrammaticalStructureTest.doTest. There are two points to note:

SemgrexPattern.java has multiple matcher() methods. However, I only made a change to one of them because that was the only one exposed by the unit tests already written. It may be necessary to apply that change to the other methods to future proof it against future flaky tests (implementation-dependent).
As presently constructed, this PR should not be merged. Initially, the edu.stanford.nlp.trees.UniversalEnglishGrammaticalStructureTest.doTest flaky test was only exposed when I ran the NONDEX tool. After applying the new fixes, this flaky test becomes non-flaky but always fails. In both cases (NONDEX and mvn test), the output of the failed test was doTest[238](edu.stanford.nlp.trees.UniversalEnglishGrammaticalStructureTest): Unexpected CC processed dependencies for tree (ROOT (S (NP (PRP$ His) (NN term)) (VP (VP (VBZ has) (VP (VBN produced) (NP (NP (DT no) (JJ spectacular) (NNS failures)) (PP (PP (IN in) (NP (NNS politics))) (, ,) (PP (IN in) (NP (DT the) (NN economy))) (CC or) (PP (IN on) (NP (DT the) (JJ military) (NN front))))))) (, ,) (CC and) (VP (VBZ has) (VP (VBN chalked) (PRT (RP up)) (NP (DT some) (NNS successes))))) (. .))) expected:<...(..) doTest[242](edu.stanford.nlp.trees.UniversalEnglishGrammaticalStructureTest): Unexpected CC processed dependencies for tree (ROOT (S (NP (NNP Jill)) (VP (VBD walked) (PP (PP (IN out) (NP (DT the) (NN door))) (, ,) (PP (IN over) (NP (DT the) (NN road))) (, ,) (PP (IN across) (NP (DT the) (JJ deserted) (NN block))) (, ,) (PP (IN around) (NP (DT the) (NN corner))) (, ,) (CC and) (PP (IN through) (NP (DT the) (NN park))))) (. .))) expected:<...)(..) . I include this hoping you can provide some insight on this error and what may be causing it (because it isn't entirely clear to me). Once that's done, I can make the necessary changes. And so, all the implementation-dependent flaky tests will be fixed, and the PR will be ready for merging.

AngledLuffa · 2021-12-02T22:04:35Z

README.md

@@ -40,7 +40,7 @@ At present, [the current released version of the code](https://stanfordnlp.githu
 1. Make sure you have Maven installed, details here: [https://maven.apache.org/](https://maven.apache.org/)
 2. If you run this command in the CoreNLP directory: `mvn package` , it should run the tests and build this jar file: `CoreNLP/target/stanford-corenlp-4.3.2.jar`
 3. When using the latest version of the code make sure to download the latest versions of the [corenlp-models](http://nlp.stanford.edu/software/stanford-corenlp-models-current.jar), [english-extra-models](http://nlp.stanford.edu/software/stanford-english-extra-corenlp-models-current.jar), and [english-kbp-models](http://nlp.stanford.edu/software/stanford-english-kbp-corenlp-models-current.jar) and include them in your CLASSPATH.  If you are processing languages other than English, make sure to download the latest version of the models jar for the language you are interested in.  
-4. If you want to use Stanford CoreNLP as part of a Maven project you need to install the models jars into your Maven repository.  Below is a sample command for installing the Spanish models jar.  For other languages just change the language name in the command.  To install `stanford-corenlp-models-current.jar` you will need to set `-Dclassifier=models`.  Here is the sample command for Spanish: `mvn install:install-file -Dfile=/location/of/stanford-spanish-corenlp-models-current.jar -DgroupId=edu.stanford.nlp -DartifactId=stanford-corenlp -Dversion=4.2.2 -Dclassifier=models-spanish -Dpackaging=jar`
+4. If you want to use Stanford CoreNLP as part of a Maven project you need to install the models jars into your Maven repository.  Below is a sample command for installing the Spanish models jar.  For other languages just change the language name in the command.  To install `stanford-corenlp-models-current.jar` you will need to set `-Dclassifier=models`.  Here is the sample command for Spanish: `mvn install:install-file -Dfile=/location/of/stanford-spanish-corenlp-models-current.jar -DgroupId=edu.stanford.nlp -DartifactId=stanford-corenlp -Dversion=4.3.2 -Dclassifier=models-spanish -Dpackaging=jar`


@J38 has already done this in both main & dev

I confirm this was a bug in GitHub. I switched the upstream branch to "main" and then back to "dev", and now the correct git diff is shown.

AngledLuffa · 2021-12-03T02:01:35Z

#1228

AngledLuffa · 2021-12-03T02:01:40Z

In thinking about the semgrex and SemanticGraph maps, I think the SemanticGraph map is far more relevant to the actual use of the tools. The SemanticGraph maps are used for iterating to follow the edges from a node, for example, and the order there can change the order results are returned in. I think both the "inner" and "outer" map would benefit from turning into LinkedHashMap, as then nodes and edges would both be iterated in an expected order. namesToNodes and namesToRelations is iterated in outputResults in the SemgrexTest, but only for visualizing the results. Is that what the test checking tool is finding? Regardless, I don't see much harm in changing the types used, and it would make the results more visually consistent at least.

…

On Thu, Dec 2, 2021 at 2:32 PM Auzel ***@***.***> wrote: ***@***.**** commented on this pull request. ------------------------------ In README.md <#1218 (comment)>: > @@ -40,7 +40,7 @@ At present, [the current released version of the code](https://stanfordnlp.githu 1. Make sure you have Maven installed, details here: [https://maven.apache.org/](https://maven.apache.org/) 2. If you run this command in the CoreNLP directory: `mvn package` , it should run the tests and build this jar file: `CoreNLP/target/stanford-corenlp-4.3.2.jar` 3. When using the latest version of the code make sure to download the latest versions of the [corenlp-models](http://nlp.stanford.edu/software/stanford-corenlp-models-current.jar), [english-extra-models](http://nlp.stanford.edu/software/stanford-english-extra-corenlp-models-current.jar), and [english-kbp-models](http://nlp.stanford.edu/software/stanford-english-kbp-corenlp-models-current.jar) and include them in your CLASSPATH. If you are processing languages other than English, make sure to download the latest version of the models jar for the language you are interested in. -4. If you want to use Stanford CoreNLP as part of a Maven project you need to install the models jars into your Maven repository. Below is a sample command for installing the Spanish models jar. For other languages just change the language name in the command. To install `stanford-corenlp-models-current.jar` you will need to set `-Dclassifier=models`. Here is the sample command for Spanish: `mvn install:install-file -Dfile=/location/of/stanford-spanish-corenlp-models-current.jar -DgroupId=edu.stanford.nlp -DartifactId=stanford-corenlp -Dversion=4.2.2 -Dclassifier=models-spanish -Dpackaging=jar` +4. If you want to use Stanford CoreNLP as part of a Maven project you need to install the models jars into your Maven repository. Below is a sample command for installing the Spanish models jar. For other languages just change the language name in the command. To install `stanford-corenlp-models-current.jar` you will need to set `-Dclassifier=models`. Here is the sample command for Spanish: `mvn install:install-file -Dfile=/location/of/stanford-spanish-corenlp-models-current.jar -DgroupId=edu.stanford.nlp -DartifactId=stanford-corenlp -Dversion=4.3.2 -Dclassifier=models-spanish -Dpackaging=jar` I confirm this was a bug in GitHub. I switched the upstream branch to "main" and then back to "dev", and now the correct git diff is shown. — You are receiving this because you commented. Reply to this email directly, view it on GitHub <#1218 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AA2AYWLVWYHOVJ7W5UEBYEDUO7XZJANCNFSM5HXHJFTQ> . Triage notifications on the go with GitHub Mobile for iOS <https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675> or Android <https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub>.

AngledLuffa · 2021-12-03T02:18:20Z

As for the test that changed, it was simply a reflection of the ordering of some outputs being changed. Not a big deal

Auzel · 2021-12-07T04:08:38Z

Okay, noted. I took a look at the changes you've made and realized it hasn't been merged as yet. I'll rerun the tests after the merge to ensure all the (implementation-dependent) flaky tests have been eliminated.

AngledLuffa · 2021-12-07T17:40:36Z

Sounds good. It is merged now. I left you as the author for the change to LinkedHashMap. Thanks for the help!

Auzel · 2021-12-08T17:34:06Z

Okay, nice! I have since check the dev branch and all flaky tests have been eliminated. So, when the dev branch is merged into main, everything will be good. So I'll close the PR now.

AngledLuffa · 2021-12-08T17:55:01Z

Thanks again!

…

On Wed, Dec 8, 2021, 9:34 AM Auzel ***@***.***> wrote: Okay, nice! I have since check the dev branch and all flaky tests have been eliminated. So, when the dev branch is merged into main, everything will be good. So I'll close the PR now. — You are receiving this because you commented. Reply to this email directly, view it on GitHub <#1218 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AA2AYWMAYHI2NBA7NGIB5KTUP6JJTANCNFSM5HXHJFTQ> . Triage notifications on the go with GitHub Mobile for iOS <https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675> or Android <https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub>.

Auzel and others added 3 commits October 26, 2021 00:23

Fix flaky test in testConnectedComponents

c96a736

Merge branch 'stanfordnlp:main' into hash

69bb2da

fix flaky tests due to hashmap

adbcad8

AngledLuffa mentioned this pull request Nov 29, 2021

Replace HashMap with a LinkedHashMap for semgrex. Makes the order of… #1224

Merged

Auzel and others added 3 commits December 2, 2021 09:55

Revert "fix flaky tests due to hashmap"

f728b17

This reverts commit adbcad8.

Merge branch 'stanfordnlp:main' into hash

36f3752

fix some flaky tests

9a29af7

AngledLuffa reviewed Dec 2, 2021

View reviewed changes

Auzel changed the base branch from dev to main December 2, 2021 22:29

Auzel changed the base branch from main to dev December 2, 2021 22:29

Auzel closed this Dec 8, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix many flaky tests due to non-determinism of hash map #1218

Fix many flaky tests due to non-determinism of hash map #1218

Auzel commented Nov 10, 2021

AngledLuffa commented Nov 10, 2021

Auzel commented Nov 27, 2021

Auzel commented Nov 27, 2021 •

edited

Loading

AngledLuffa commented Nov 29, 2021

Auzel commented Nov 29, 2021

AngledLuffa commented Nov 29, 2021 via email

Auzel commented Dec 2, 2021

AngledLuffa Dec 2, 2021

Auzel Dec 2, 2021

AngledLuffa commented Dec 3, 2021

AngledLuffa commented Dec 3, 2021 via email

AngledLuffa commented Dec 3, 2021

Auzel commented Dec 7, 2021

AngledLuffa commented Dec 7, 2021 via email

Auzel commented Dec 8, 2021

AngledLuffa commented Dec 8, 2021 via email

Fix many flaky tests due to non-determinism of hash map #1218

Fix many flaky tests due to non-determinism of hash map #1218

Conversation

Auzel commented Nov 10, 2021

AngledLuffa commented Nov 10, 2021

Auzel commented Nov 27, 2021

Auzel commented Nov 27, 2021 • edited Loading

AngledLuffa commented Nov 29, 2021

Auzel commented Nov 29, 2021

AngledLuffa commented Nov 29, 2021 via email

Auzel commented Dec 2, 2021

AngledLuffa Dec 2, 2021

Choose a reason for hiding this comment

Auzel Dec 2, 2021

Choose a reason for hiding this comment

AngledLuffa commented Dec 3, 2021

AngledLuffa commented Dec 3, 2021 via email

AngledLuffa commented Dec 3, 2021

Auzel commented Dec 7, 2021

AngledLuffa commented Dec 7, 2021 via email

Auzel commented Dec 8, 2021

AngledLuffa commented Dec 8, 2021 via email

Auzel commented Nov 27, 2021 •

edited

Loading