-
Notifications
You must be signed in to change notification settings - Fork 390
Mas i994 handoffsync #995
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
Mas i994 handoffsync #995
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Remove legacy code from handoff_sender. The handoff_receiver must keep unused references, as during an upgrade an updated node could be a receiver to a non-updated node sending. For non-batch not such an issue, as all supported versions for upgrade already supported batch. However, receiver must still indicate it supports batch due to above problem. All handoff receiver/sender code tidied down to 80-column width
Do away with timer based sync, and ack-sync based on threshold only. Also log ongoing transfer progress every ack-log threshold. Log at point of error reason for error - avoid generic {shutdown, timeout} error with no clue as to actual point of code origin.
Fix issue that configure message does not respond sync, and so another sync is now required.
Confusing that the first log on the sending of a batch will indicate a batch_count of 100 not 0
Clarify log text, as batch_size no longer fixed
Need to distinguish between failed fold and slow fold - and so the keepalive of the receiver has value. Now implemented by checking a keepalive_next time every visit item, rather than continuously entering and exiting selective receive
yorkshire-steve
approved these changes
Jan 16, 2023
Make a metadata exchange part of the join process. This prevents the situation where a bucket type is active in a cluster, then a node joins (as part of cluster expansion, say), but the bucket properties are not known to joining node during handoff of objects of that type. Now, the join cannot be staged without a metadata exchange, so that all joining nodes know of cluster metadata (e.g. bucket types) before the join is committed and handoffs start.
It is only an attempt - failure (i.e. timeout) would be no different to the current state with a potential race, so we don't block joins. Joins will normally be safer because of this.
yorkshire-steve
approved these changes
Jan 18, 2023
martinsumner
added a commit
that referenced
this pull request
Feb 1, 2023
* Initial tidy Remove legacy code from handoff_sender. The handoff_receiver must keep unused references, as during an upgrade an updated node could be a receiver to a non-updated node sending. For non-batch not such an issue, as all supported versions for upgrade already supported batch. However, receiver must still indicate it supports batch due to above problem. All handoff receiver/sender code tidied down to 80-column width * Change AckSync to every batch by default Do away with timer based sync, and ack-sync based on threshold only. Also log ongoing transfer progress every ack-log threshold. Log at point of error reason for error - avoid generic {shutdown, timeout} error with no clue as to actual point of code origin. * Standardise send_sync into function Fix issue that configure message does not respond sync, and so another sync is now required. * Receiver needs vnode module not master * Make first log indicate batch_count of 0 Confusing that the first log on the sending of a batch will indicate a batch_count of 100 not 0 * Batch threshold can be either count or size Clarify log text, as batch_size no longer fixed * Reinstate keepalive of receiver Need to distinguish between failed fold and slow fold - and so the keepalive of the receiver has value. Now implemented by checking a keepalive_next time every visit item, rather than continuously entering and exiting selective receive * Further comments * Update after review * Metadata exchange on join Make a metadata exchange part of the join process. This prevents the situation where a bucket type is active in a cluster, then a node joins (as part of cluster expansion, say), but the bucket properties are not known to joining node during handoff of objects of that type. Now, the join cannot be staged without a metadata exchange, so that all joining nodes know of cluster metadata (e.g. bucket types) before the join is committed and handoffs start. * Attempt exchange on Join It is only an attempt - failure (i.e. timeout) would be no different to the current state with a potential race, so we don't block joins. Joins will normally be safer because of this.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Refactor handoff to avoid TCP RECV error - see #994.
Clarify meaning of terms:
handoff_timeout
if the receiver does not respond to a sender message within this period, then the sender should assume the receiver is down and exit the fold. The sender can exchange sync messages everyhandoff_acksync_threshold
batches to ensure that batches do not accumulate at the receiver causing the sender to fail to get a timely response to a future request.handoff_receive_timeout
if the receiver does not receive a message from the sender within this timeout, it will assume that the sender is down and exit the process. The sender will interrupt the fold if it has not sent a batch in the lasthandoff_receive_timeout div 3
to send a sync message as a keepalive to prevent this timeout if it is alive.The default value of
handoff_acksync_threshold
is changed to 1 from 25. At 25 batches, and 250 bytes every object size, a backlog of 100K handoff items could develop at the receiver - requiring a sub-1ms response time for each handoff item to avoid triggering the timeout. Alternative default of 1 is less likely to see the timeout, at the small cost of an additional sync round-trip per batch.A
handoff_batch_threshold_count
has been added to generate a threshold for a batch when this count of objects is reached. Thehandoff_batch_threshold
is also respected (which triggers a batch on size in bytes). This is to prevent very large batches by count when object sizes are small.The ?PT_MSG_OLDSYNC is now properly deprecated. The receiver will still respond, should a cluster contain a handoff sender still on a previous version, but the sender will now only send the ?PT_MSG_SYNC message.
Reformatting of code to fit column width.
Standardise handling of sync messages through
riakc_core_handoff_sender:send_sync/3
function, and now error logs raised at the actual message where an error is received. The item queue is redacted in the error generated by a throw, so that important details are not obfuscated by a large list of binaries.A new log has been added to log progress in sending batches every
handoff_acklog_threshold
batches (default 100). This allows for progress to be tracked through log indexers, rather than manually by operators calling theriak admin transfers
command.All code for (legacy) non-batched handoff removed.