Make sure the Compact schema is replicated on all known cluster members [API-1564] #615
Conversation
This was the last task necessary to ensure the correctness of the client-side implementation under split-brain.

The client might be connected to both halves of a split-brain cluster without knowing it. Eventually, a member list update will arrive and the client will close its connections to one half. Until then, however, the client might replicate the schema in one half and put the data to the other, breaking our promise that no member receives data before the schema.

To solve this, we decided to check, on the client side, the UUIDs of the members that the schema is replicated on. If at least one member in the client-side member list is not part of the schema replication response, we wait for a while and retry, up to a fixed number of times.

This PR also updates some field names used in the protocol to comply with their definitions in the protocol repo.
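The retry scheme described above can be sketched as follows. This is an illustrative sketch, not the actual client code: the function and parameter names (`replicate_schema`, `send_schema`, `get_member_uuids`) and the delay constant are assumptions; only the retry count matches the constant introduced in this PR.

```python
import time
import uuid
from typing import Callable, Set

_SEND_SCHEMA_RETRY_COUNT = 100  # matches the constant added in this PR
_RETRY_DELAY_SECONDS = 1.0      # assumed delay between attempts

def replicate_schema(
    send_schema: Callable[[], Set[uuid.UUID]],
    get_member_uuids: Callable[[], Set[uuid.UUID]],
) -> None:
    """Retry schema replication until every known member acknowledges it."""
    for _ in range(_SEND_SCHEMA_RETRY_COUNT):
        # UUIDs of the members the cluster reports the schema replicated on
        replicated_members = send_schema()
        # The client's current view of the member list
        known_members = get_member_uuids()
        if known_members.issubset(replicated_members):
            # Every known member has the schema; safe to send data now
            return
        time.sleep(_RETRY_DELAY_SECONDS)
    raise RuntimeError("The schema could not be replicated to all cluster members")
```

The key design point is comparing member UUIDs rather than counts: under split-brain, a response from one half can look "complete" by size while still missing members the client knows about from the other half.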
Codecov Report
@@           Coverage Diff           @@
##           master     #615   +/-   ##
=======================================
  Coverage   96.47%   96.48%
=======================================
  Files         357      357
  Lines       20515    20573    +58
=======================================
+ Hits        19791    19849    +58
  Misses        724      724
=======================================

... and 4 files with indirect coverage changes
Regarding the drop in code coverage, see nedbat/coveragepy#831. Coverage.py somehow counts the imports added for type checking as part of the coverage report. I will exclude this block as they suggested.
class CompactSchemaService:
    _SEND_SCHEMA_RETRY_COUNT = 100
Minor: this is called MAX_PUT_RETRY_COUNT on the Java side.
What is the reason for the value of 100?
100 is an arbitrary value that we believed would be a nice default. It is an internal implementation detail (the property you see there is not public), but we wanted to give a way to configure it if someone really wants to, through private APIs like this. So, I believe we don't have to use the same name. The Java name uses put because the method that sends schemas is called put there, but not here.
Looks good!
protocol PR: hazelcast/hazelcast-client-protocol#458
closes #580