-
Notifications
You must be signed in to change notification settings - Fork 25.3k
Fix testGetSnapshotsMultipleRepos #43993
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix testGetSnapshotsMultipleRepos #43993
Conversation
The order doesn't matter here and this just failed on me randomly with the order being off -> fixed by using unordered assertion
Pinging @elastic/es-distributed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Jenkins run elasticsearch-ci/2 (unrelated watcher test failure) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not convinced this is a 100% reliable fix. It's possible that two snapshots could occur in the same millisecond (and/or the clock isn't granular enough). I think we should do something like this in the test at the point that we require time to advance:
for (final ThreadPool threadPool : internalCluster().getInstances(ThreadPool.class)) {
final long startMillis = threadPool.absoluteTimeInMillis();
assertBusy(() -> assertThat(threadPool.absoluteTimeInMillis(), greaterThan(startMillis)));
}
@DaveCTurner I'm not sure this is a real world possibility with us fsyncing a ton files for each snapshot, but sure I can add that real quick. |
@DaveCTurner added that clock wait :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah it's hard to judge how likely this is, but if nothing else there's a benefit to documenting the fact that we're relying on time moving forwards. Also we've been bitten in the past with assumptions that certain operations always take a measurable time, particularly across platforms (looking at you, Windows 👀).
Also "fsyncing" isn't really fsyncing in tests IIRC.
Jenkins test elasticsearch-ci/1 (reported unrelated failure in #44012 ) |
The order tested in the equals check in the un-muted test needs exact timestamps. Otherwise two subsequent snapshots might have the same timestamps and hence will get ordered differently than the list of snapshot names that was built as the snapshots were created => Fixed by having exact timestamps in this test.
This was triggered by #43148 indirectly since we now use cached time when creating
SnapshotInfo
so if the test runs very quickly the ordering ofSnapshotInfo
by time is gone because two subsequent instances have the same timestamp due to the now higher time uncertainty form the caching.