Log replication messages that did not fit #4844

vytautas-karpavicius
Contributor

What changed?
During fanout to multiple peers, we collect replication message responses until the RPC message size limit is reached. The remaining responses are discarded and are expected to be picked up by the next call.

However, there is no visibility into whether (or how often) this happens.

  • Add logging that reports which shards were discarded, along with size information (the maximum size, the total size accumulated so far, and the size of the response that exceeded the limit).
  • Instead of terminating early and returning a partial response, continue the loop and try the remaining peer responses. They may be smaller and could still fit into the response.
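
For illustration, the skip-and-continue behavior described above could look roughly like the sketch below. This is not the code from this PR: `shardResponse`, `collectWithinLimit`, and the log message format are hypothetical names made up for the example, and the real client iterates over per-peer responses rather than a flat slice.

```go
package main

import "log"

// shardResponse is a hypothetical stand-in for the replication messages
// returned for one shard, together with their serialized size in bytes.
type shardResponse struct {
	ShardID int32
	Size    int
}

// collectWithinLimit keeps every response that still fits under maxSize.
// A response that does not fit is logged and skipped, and the loop moves on
// to the next one instead of returning early, because a later response may
// be small enough to fit.
func collectWithinLimit(responses []shardResponse, maxSize int) (accepted []shardResponse, discarded []int32) {
	total := 0
	for _, r := range responses {
		if total+r.Size > maxSize {
			// Report the discarded shard and the three sizes of interest.
			log.Printf("replication messages did not fit: shard=%d maxSize=%d totalSize=%d currentSize=%d",
				r.ShardID, maxSize, total, r.Size)
			discarded = append(discarded, r.ShardID)
			continue
		}
		total += r.Size
		accepted = append(accepted, r)
	}
	return accepted, discarded
}
```

With a limit of 100 bytes and responses of 40, 70, and 30 bytes, the second response is logged and discarded while the third is still accepted. With early termination the third would have been dropped as well.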

Why?
For better visibility.

How did you test it?
Deployed to our staging environment, lowered the maximum RPC size limit via the dynamic config key system.grpcMaxSizeInByte, and verified that the logs are emitted.

Potential risks

Release notes

Documentation Changes

@coveralls

Pull Request Test Coverage Report for Build 0180f629-86eb-40a5-ac1b-5c421986e75c

  • 0 of 25 (0.0%) changed or added relevant lines in 2 files are covered.
  • 8 unchanged lines in 4 files lost coverage.
  • Overall coverage increased (+0.03%) to 56.862%

Changes Missing Coverage

| File | Covered Lines | Changed/Added Lines | % |
|---|---|---|---|
| common/log/tag/tags.go | 0 | 9 | 0.0% |
| client/history/client.go | 0 | 16 | 0.0% |

Files with Coverage Reduction

| File | New Missed Lines | % |
|---|---|---|
| service/history/queue/timer_queue_processor_base.go | 1 | 77.92% |
| common/task/fifoTaskScheduler.go | 2 | 84.54% |
| common/task/weightedRoundRobinTaskScheduler.go | 2 | 89.64% |
| common/cache/lru.go | 3 | 90.73% |

Totals

| Metric | Value |
|---|---|
| Change from base Build 0180f4d6-2414-4ac3-bc47-e3b3193e3c50 | 0.03% |
| Covered Lines | 83774 |
| Relevant Lines | 147328 |

💛 - Coveralls

Contributor

@davidporter-id-au davidporter-id-au left a comment

This would be awesome, it's been a problem for debugging prod before.

@vytautas-karpavicius vytautas-karpavicius merged commit 471e6d1 into uber:master May 25, 2022
@vytautas-karpavicius vytautas-karpavicius deleted the log-unfit-replication-messages branch May 25, 2022 09:57
4 participants