REP-6088 Tolerate high numbers of mismatches #117
Looks good in general. I've left some comments on small things.
	return errors.Wrapf(err, "starting session")
}

sctx := mongo.NewSessionContext(ctx, sess)
Are we reading in a session just to get the cluster time?
Yes. That, per the driver team, is the approved way to do this.
Could we add a comment to explain the purpose of a session here?
done
LGTM. Thanks!
LGTM!
Previously all document mismatches were recorded directly in the verification task. This meant, though, that if a task encompassed a large number of mismatched or missing documents, the verifier could fail to persist all of the mismatches, which caused a crash.
(The usual cause of excess mismatched/missing documents is starting migration-verifier before initial sync finishes, but it can also reasonably happen without REP-6129’s fix for queries against pre-v5 servers. See HELP-75910.)
This changeset makes the verifier save mismatches to a dedicated collection instead, one document per mismatch.
This change upends some familiar workflows for investigating mismatches: it’s no longer sufficient just to query the `verification_tasks` collection for mismatch information, since the actual mismatches are recorded in a separate collection. To address this, the documentation now gives an aggregation pipeline that yields a similarly useful result.
This entails a metadata version change. Because that’s happening, this also changes the task type `verify` to `verifyDocuments`. (That required some sorting workarounds in tests, which were tightly coupled to the task type strings.)
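To illustrate the storage change described above, here is a minimal in-memory sketch in Go of regrouping one-record-per-mismatch data into a per-task view, which is roughly what an aggregation over the two collections would produce. It does not touch MongoDB, and every name in it (`Mismatch`, `TaskReport`, `groupByTask`, and all field names) is hypothetical rather than taken from the verifier's actual schema or code.

```go
package main

import (
	"fmt"
	"sort"
)

// Mismatch is a hypothetical stand-in for one document in the dedicated
// mismatch collection. The field names are illustrative only.
type Mismatch struct {
	TaskID string // ID of the verification task that found the mismatch
	DocID  string // ID of the mismatched or missing document
	Detail string // short description of the problem
}

// TaskReport pairs a task ID with the mismatches recorded for it.
type TaskReport struct {
	TaskID     string
	Mismatches []Mismatch
}

// groupByTask rebuilds a per-task view from one-record-per-mismatch storage.
func groupByTask(mismatches []Mismatch) []TaskReport {
	byTask := make(map[string][]Mismatch)
	for _, m := range mismatches {
		byTask[m.TaskID] = append(byTask[m.TaskID], m)
	}

	reports := make([]TaskReport, 0, len(byTask))
	for id, ms := range byTask {
		reports = append(reports, TaskReport{TaskID: id, Mismatches: ms})
	}
	// Sort by task ID so the result is deterministic; Go map iteration
	// order is not.
	sort.Slice(reports, func(i, j int) bool {
		return reports[i].TaskID < reports[j].TaskID
	})
	return reports
}

func main() {
	stored := []Mismatch{
		{TaskID: "task-1", DocID: "a", Detail: "missing"},
		{TaskID: "task-2", DocID: "b", Detail: "content differs"},
		{TaskID: "task-1", DocID: "c", Detail: "content differs"},
	}
	for _, r := range groupByTask(stored) {
		fmt.Printf("%s: %d mismatch(es)\n", r.TaskID, len(r.Mismatches))
	}
}
```

Because each mismatch is its own record, the number of mismatches for a task no longer affects the size of the task record itself; the per-task view is computed on demand instead.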