-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enable Conditional Numerical Reproducibility for tests #4569
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Codecov Report
@@ Coverage Diff @@
## master #4569 +/- ##
=========================================
Coverage ? 75.84%
=========================================
Files ? 947
Lines ? 172190
Branches ? 18579
=========================================
Hits ? 130593
Misses ? 36422
Partials ? 5175
|
...iclassLogisticRegression/MulticlassLogisticRegression-TrainTest-iris-tree-featurized-out.txt
Outdated
Show resolved
Hide resolved
320e765
to
8b5d1e2
Compare
I'm going to submit some parts of this as separate pull requests, and then rebase to focus it and fix conflicts. |
@@ -283,6 +283,11 @@ public void BinaryClassifierLogisticRegressionTest() | |||
[Trait("Category", "SkipInCI")] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❓ I guess I am missing something... but shouldn't this line be removed in order to check that the updated baselines are actually correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These baselines haven't changed in the master branch since I first implemented this pull request. I'm assuming the new values (which were correct then) are still correct now. They would be validated as part of #4722.
Based on the results of #4722 as a validation, the current plan is:
|
For the record, the validation @sharwell has just mentioned, consisted in running 100 times the It passed on every Windows x64 build, except for one, where it failed only 1 out of the 100 iterations. It failed on every other build configuration, but it is thought to be because CNR works differently on each OS and configuration (MacOS, CentOS, Ubuntu and Windows x86), and then it would be necessary to create a new baseline for each one of these. Notice that that test was disabled recently because it was very flaky on most of our CI builds, and that it was previously disabled on MacOS and x86 since #624 and #1008 respectively. So when doing the baselines for the other build configurations, it would be ok to keep this in mind, and see if that now that CNR is enabled, these pending test scenarios are now resolved. For reference, the same experiment (running it 100 times on all the build configurations) had 3 Windows x64 builds that failed, each one with 100 failures (one per iteration), when ran without the enabled CNR. |
After briefly discussing it with @harishsk , I will run all the tests (including all the disabled ones) with the changes in this PR, and running the test 100 times along with them, just to double check. 😄 For convenience, I am doing it on #4722 It again passed in all Windows x64 builds, except for one, where it failed only 1 out of the 100 iterations. As expected, it didn't pass in the other builds. |
@harishsk has expressed that he doesn't know if it's ok to have different baselines for each build configuration, and would like @eerhardt , @justinormont , and @tannergooding opinion on this regard, before accepting enabling the CNR. |
So I have just ran this again on #4722 , and actually now even the Windows x64 are failing, it fails in all 100 iterations of the @sharwell can you please remove the |
@antoniovs1029 the baselines were altered in #4710 |
You're right. On that PR it's added that the BinaryClassifierSymSGD should also print its bias and weights into the output. So it will only be necessary to add a couple of lines to the baseline. After running it on the CI, I believe the updated baseline (for Windows x64) should be On SymSGD-CV-breast-cancer-out.txt
On SymSGD-TrainTest-breast-cancer-out.txt
By the way, I've gotten baselines for the other platforms and made a couple of experiments with them. So far it seems that enabling CNR does yield more consistent results from the test, although different baselines would indeed be necessary for each platform. I still need to do further experiments on this, though. |
@antoniovs1029 baselines have been updated |
No description provided.