Matching simulation with more read partitions than write partitions #6229

Merged

taylanisikdemir
Contributor

What changed?
Adding a new matching simulation scenario to measure the impact of having more read partitions than write partitions on a tasklist.

  • default case with 2 read and 2 write partitions
    vs
  • new case with 4 read, 2 write partitions

Why?
This comparison is interesting because tasklist partition counts are increased in phases: read partitions are increased first, and then write partitions are raised to the same count. During that window, poll requests routed to partitions without any write activity find no tasks there and must be forwarded to the parent partition in order to find one.
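
To make the size of that window's effect concrete, here is a minimal sketch of how many polls would land on a partition that receives no writes. It assumes polls are spread uniformly across all read partitions and writes go only to the first `w` of them; both are simplifying assumptions for illustration, and `idle_poll_fraction` is a hypothetical helper, not part of the simulator.

```python
from fractions import Fraction


def idle_poll_fraction(read_partitions: int, write_partitions: int) -> Fraction:
    """Fraction of polls that land on a partition with no write activity.

    Assumes (for illustration only) that polls are routed uniformly at random
    across all read partitions and that only the first `write_partitions`
    of them ever receive tasks.
    """
    if write_partitions < 1 or write_partitions > read_partitions:
        raise ValueError("expected 1 <= write_partitions <= read_partitions")
    return Fraction(read_partitions - write_partitions, read_partitions)


print(idle_poll_fraction(2, 2))  # 0
print(idle_poll_fraction(4, 2))  # 1/2
```

Under these assumptions, half of the polls in the r=4, w=2 scenario start on a partition that can only serve them through forwarding.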

How did you test it?
Ran ./scripts/run_matching_simulator.sh more_read_partitions and ./scripts/run_matching_simulator.sh default.

Highlights from the results:

| Measurement | partitions r=2, w=2 | partitions r=4, w=2 |
|---|---|---|
| Tasks generated | 1500 | 1500 |
| Tasks polled | 1500 | 1500 |
| Avg Poll latency (ms) | 240 | 246 |
| P99 Task latency (ms) | 23 | 2325 |
| Max Task latency (ms) | 133 | 2977 |
| Simulation Duration (s) | 36.5 | 38.5 |
| Tasks Written to DB | 22 | 481 |
| Task forward attempts | 158 | 1219 |
| Sync matches | 1480 | 1115 |
| Async matches | 20 | 385 |

Conclusion: Having more read partitions than write partitions clearly impacts the p99 task latencies and sync match rate in a negative way.
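
The sync match rate is not reported directly, but it can be derived from the numbers above. The snippet below is a small sketch that does the arithmetic; the dictionary layout and the `sync_match_rate` helper are illustrative and not part of the simulator's output format.

```python
# Figures copied from the results above; the structure is illustrative.
results = {
    "r=2, w=2": {"polled": 1500, "sync": 1480, "async": 20, "p99_ms": 23},
    "r=4, w=2": {"polled": 1500, "sync": 1115, "async": 385, "p99_ms": 2325},
}


def sync_match_rate(row: dict) -> float:
    """Share of polled tasks that were matched synchronously."""
    return row["sync"] / row["polled"]


for name, row in results.items():
    # Sanity check: every polled task was matched either sync or async.
    assert row["sync"] + row["async"] == row["polled"]
    print(f"{name}: sync match rate {sync_match_rate(row):.1%}")
# r=2, w=2: sync match rate 98.7%
# r=4, w=2: sync match rate 74.3%

p99_ratio = results["r=4, w=2"]["p99_ms"] / results["r=2, w=2"]["p99_ms"]
print(f"p99 task latency ratio: {p99_ratio:.0f}x")  # 101x
```

In this run the sync match rate drops from about 98.7% to about 74.3%, and the p99 task latency grows by roughly two orders of magnitude.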

codecov bot commented Aug 15, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 72.93%. Comparing base (b1c923e) to head (bbbc211).
Report is 2 commits behind head on master.

Additional details and impacted files

see 3 files with indirect coverage changes

Contributor

@abhishekj720 abhishekj720 left a comment

Why don't we also add a check to ensure the number of write partitions is always less than or equal to the number of read partitions? I am not sure if we have one now.

@taylanisikdemir
Contributor Author

Why don't we also add a check to ensure the number of write partitions is always less than or equal to the number of read partitions? I am not sure if we have one now.

That would require cross-property validation for dynamic configs. It's not the main problem/missing piece, though. The problem is that the safe way to scale up a tasklist requires the read and write partition counts to diverge for a while, and during that period task match latencies are elevated.
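
For illustration only, a cross-property check and the phased scale-up it would have to allow could be sketched as follows. Both function names are hypothetical, and this is not how dynamic config validation is implemented in the project; it only shows that the intermediate step (more read than write partitions) passes a "write <= read" check by design.

```python
from typing import List, Tuple


def validate_partition_counts(read_partitions: int, write_partitions: int) -> List[str]:
    """Return a list of validation errors; an empty list means the pair is valid."""
    errors = []
    if read_partitions < 1:
        errors.append("read partitions must be at least 1")
    if write_partitions < 1:
        errors.append("write partitions must be at least 1")
    if write_partitions > read_partitions:
        errors.append("write partitions must not exceed read partitions")
    return errors


def scale_up_steps(current: int, target: int) -> List[Tuple[int, int]]:
    """Return the (read, write) pairs of a phased scale-up: read first, then write."""
    if target <= current:
        raise ValueError("target must be greater than current")
    return [(current, current), (target, current), (target, target)]


for read, write in scale_up_steps(2, 4):
    print((read, write), validate_partition_counts(read, write))
# (2, 2) []
# (4, 2) []
# (4, 4) []
```

The middle step, (4, 2), is valid under the check and is the configuration the new simulation scenario measures.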

@taylanisikdemir taylanisikdemir merged commit c903543 into uber:master Aug 15, 2024
19 checks passed
@taylanisikdemir taylanisikdemir deleted the taylan/more_read_partitions_sim branch August 15, 2024 20:32