Skip to content

Conversation

mdumandag
Copy link
Contributor

This was the last task necessary to ensure the correctness of the client-side implementation under the split brain.

The client might be connected to both halves of the split brain without knowing it. Eventually, a member list update will come and it will close connections to one half. But, until then, the client might replicate the schema in one half, and put the data to other, breaking our promise that no data will be received by members before the schema.

To solve this, we have decided to use UUIDs of the members that the schema is replicated on the client side. If there is at least one member that is available in the member list on the client-side, but not part of the schema replication response, we retry a couple of times, after waiting for a while.

This PR also updates some names of the fields used in protocol, to comply with their definitions in the protocol repo.

protocol PR: hazelcast/hazelcast-client-protocol#458
closes #580

This was the last task necessary to ensure the correctness of the
client-side implementation under the split brain.

The client might be connected to both halves of the split brain
without knowing it. Eventually, a member list update will come and
it will close connections to one half. But, until then, the client
might replicate the schema in one half, and put the data to other,
breaking our promise that no data will be received by members before
the schema.

To solve this, we have decided to use UUIDs of the members that the
schema is replicated on the client side. If there is at least one
member that is available in the member list on the client-side, but
not part of the schema replication response, we retry a couple of
times, after waiting for a while.

This PR also updates some names of the fields used in protocol,
to comply with their definitions in the protocol repo.
@mdumandag mdumandag added this to the 5.2.0 milestone Mar 13, 2023
@mdumandag mdumandag requested a review from yuce March 13, 2023 15:03
@mdumandag mdumandag self-assigned this Mar 13, 2023
@codecov-commenter
Copy link

codecov-commenter commented Mar 13, 2023

Codecov Report

Merging #615 (f7ec9a5) into master (b2196d4) will increase coverage by 0.00%.
The diff coverage is 91.46%.

@@           Coverage Diff           @@
##           master     #615   +/-   ##
=======================================
  Coverage   96.47%   96.48%           
=======================================
  Files         357      357           
  Lines       20515    20573   +58     
=======================================
+ Hits        19791    19849   +58     
  Misses        724      724           
Impacted Files Coverage Δ
hazelcast/compact.py 90.00% <82.85%> (-7.68%) ⬇️
hazelcast/serialization/compact.py 99.02% <93.33%> (+<0.01%) ⬆️
hazelcast/client.py 99.57% <100.00%> (ø)
hazelcast/protocol/builtin.py 94.55% <100.00%> (+0.19%) ⬆️
...zelcast/protocol/codec/client_send_schema_codec.py 100.00% <100.00%> (ø)
...st/protocol/codec/custom/field_descriptor_codec.py 96.15% <100.00%> (ø)
hazelcast/protocol/codec/custom/schema_codec.py 100.00% <100.00%> (ø)

... and 4 files with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@mdumandag
Copy link
Contributor Author

regarding the drop in code coverage, see nedbat/coveragepy#831. they somehow calculate the imports added for type checking as part of the coverage report. I will exclude this block as they suggested

Copy link
Contributor

@yuce yuce left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here are some ideas...



class CompactSchemaService:
_SEND_SCHEMA_RETRY_COUNT = 100
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: This is called MAX_PUT_RETRY_COUNT on the Java side.

What's the reason of the 100 value?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

100 is an arbitrary value that we believed would be a nice default. It is an internal implementation detail (the property you see there is not public) but we wanted to give a way to configure it if someone really wants (through private APIs like this). So, I believe we don't have to use the same name. We use put there because the method to send schemas is called put there, but not here.

Copy link
Contributor

@yuce yuce left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

@mdumandag mdumandag merged commit 7923483 into hazelcast:master Mar 23, 2023
@mdumandag mdumandag deleted the compact-schema-put-split-brain branch March 23, 2023 15:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[TRACKING ISSUE] [API-1531] Check the member UUIDs that the schema is replicated in the client schema service
3 participants