Write Operations Performance improvements #18578

Merged
merged 1 commit into from
Jan 15, 2021


xinlian12
Member

@xinlian12 xinlian12 commented Jan 12, 2021

Problem:
Currently, write latency and write throughput in direct mode are not as good as in gateway mode. When we increase concurrency, throughput does not increase as much as it should. The issue is more obvious for small collections (with only 1 partition).

Fix:
This change improves latency and throughput by selecting the next channel in a round-robin fashion, which helps balance the load across channels when multiplexing.
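As a generic illustration of round-robin selection (not the SDK's implementation; the class name `RoundRobinSelector` and its API are hypothetical), a counter-based selector might look like the sketch below. It hands out items in a fixed rotation so that no single item absorbs consecutive requests.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper for illustration only; not part of the Cosmos SDK.
final class RoundRobinSelector<T> {
    private final List<T> items;
    private final AtomicInteger next = new AtomicInteger();

    RoundRobinSelector(List<T> items) {
        if (items.isEmpty()) {
            throw new IllegalArgumentException("items must not be empty");
        }
        this.items = new ArrayList<>(items); // defensive copy
    }

    // Returns items in rotation: 0, 1, ..., n-1, 0, 1, ...
    T select() {
        // floorMod keeps the index non-negative even after the counter overflows.
        int index = Math.floorMod(next.getAndIncrement(), items.size());
        return items.get(index);
    }
}
```

With three items, six consecutive `select()` calls return each item exactly twice, in order.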

Test1: 100,000 RU, 16 CPU cores, Linux (Ubuntu), West US2, Eventual consistency, WriteLatency
image

image

image

image

Test2: 100,000 RU, 4 CPU cores, Linux (Ubuntu), West US2, Eventual consistency, WriteLatency
image

image

image

image

Test3: 6,000 RU, 16 CPU cores, Linux (Ubuntu), West US2, Eventual consistency, WriteThroughput (concurrency only goes up to 16 because we start to get throttled at concurrency 4 (direct, after the fix) and 8 (gateway))
image

Test4: 100,000 RU, 16 CPU cores, Linux (Ubuntu), West US2, Eventual consistency, ReadLatency. This test is to make sure there is no regression for reads.
image

image

image

image

Test5: YCSB customized write workload. 4 CPU cores, 100,000 RU, Linux (Ubuntu), West US2. 1-2 threads -> 100,000 total operations; 4-64 threads -> 1,000,000 total operations.
image

image

image

@kushagraThapar
Member

@xinlian12 - can you please provide some explanation on the code change from pollLast to pollFirst ?

Contributor

@moderakh moderakh left a comment


awesome work @xinlian12
Please attach the perf graphs as well.

LGTM.

@xinlian12
Member Author

xinlian12 commented Jan 13, 2021

> @xinlian12 - can you please provide some explanation on the code change from pollLast to pollFirst ?

Before the change, releasing a channel added it to the tail of the queue, and acquiring a channel also took it from the tail. As a result, some channels could end up handling more load than others (or the channels might not be fully utilized).

After the change, acquisition behaves more like round-robin, so hopefully the load is more balanced.
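To make the difference concrete, here is a minimal sketch that models the available channels as a plain `ArrayDeque` of integer IDs. It is not the SDK's channel pool: the class name `ChannelQueueDemo` and the `simulate` method are hypothetical, and it assumes strictly sequential acquire/release cycles, which is the extreme case where each request finishes before the next one starts.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

// Hypothetical simulation for illustration only; not the SDK's channel pool.
final class ChannelQueueDemo {
    // Runs sequential acquire/release cycles and returns per-channel use counts.
    static int[] simulate(int channelCount, int requests, boolean pollFromHead) {
        Deque<Integer> available = new ArrayDeque<>();
        for (int id = 0; id < channelCount; id++) {
            available.offerLast(id);
        }
        int[] uses = new int[channelCount];
        for (int i = 0; i < requests; i++) {
            // Acquire from the head (after the change) or the tail (before).
            int channel = pollFromHead ? available.pollFirst() : available.pollLast();
            uses[channel]++;
            // Release always appends to the tail in both variants.
            available.offerLast(channel);
        }
        return uses;
    }

    public static void main(String[] args) {
        System.out.println("pollLast:  " + Arrays.toString(simulate(4, 8, false)));
        System.out.println("pollFirst: " + Arrays.toString(simulate(4, 8, true)));
    }
}
```

In this model, `pollLast` keeps handing back the channel that was just released, so all 8 requests land on one channel (`[0, 0, 0, 8]`), while `pollFirst` rotates through all four (`[2, 2, 2, 2]`). Real workloads overlap requests, so the imbalance with `pollLast` would be less extreme than in this sequential case.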

Member

@kushagraThapar kushagraThapar left a comment


LGTM

@kushagraThapar
Member

/azp run java - cosmos - tests

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@moderakh moderakh merged commit 9f51290 into Azure:master Jan 15, 2021
@xinlian12 xinlian12 deleted the writePerfFix branch February 3, 2021 22:13
@kushagraThapar kushagraThapar changed the title writePerfFix Write Operations Performance improvements Mar 17, 2021
3 participants