-
Notifications
You must be signed in to change notification settings - Fork 9.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
mvcc: push down RangeOptions.limit argv into index tree to reduce memory overhead #11990
Conversation
6f53077
to
e939961
Compare
Codecov Report
@@ Coverage Diff @@
## master #11990 +/- ##
==========================================
- Coverage 65.96% 65.53% -0.43%
==========================================
Files 403 403
Lines 37306 37310 +4
==========================================
- Hits 24608 24452 -156
- Misses 11182 11329 +147
- Partials 1516 1529 +13
Continue to review full report at Codecov.
|
e939961
to
1677fd8
Compare
before optimization:
after optimization:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm, nice!
limit ==0 means fetch everything, but this pr merged seems will change the behavior |
can you give me a failed case? if limit == 0, it seems will also fetch everything. thanks. |
sry i miss some logic here.. |
@tangcong This PR broke Kubernetes api chunking feature. Kubernetes uses Range request Count returned by I don't this behavior is correct as it removes value provided by having a Count field as it will always equal number of keys returned. Count is only beneficial if it returns number of objects with Limit. |
Thank you for nailing this @serathius. Definitely its a wire-format breaking change and in scope of 3.x branch we cannot perform it. I think we shouldn't push down the limit if its count-only query (and I hope that count-only queries are not accumulating the memory). I see value of counting with limit - e.g.(for some use-cases like checking whether there is more than N objects, Potentially etcd/clientv3 could print warning if both: count & limit are set if we imagine we could change the behavior in v4. |
There are two main scenarios:
unfortunately, it broke Kubernetes api chunking feature. i agree with @ptabor, etcd 3.5 coming soon,it is not critical, i hope we can optimize in a future version. thanks @serathius @ptabor @gyuho |
Without this change, if we list all the pods with a limit specified, it will be a nightmare when we are going through millions of pod objects. From what I observed in issue #13068, the write performance can get reduced to only 25% of the normal case. |
I saw the original issue: #13056 (comment) It seems to be very tricky... |
cc @Jeffwan |
Yep, K8s directly depends on this behavior to provide chunking in API. |
It seems that this PR only affects the RemainingItemCount feature,it has always been an Alpha feature. |
@tangcong what was the final outcome about the effect on the RemainingItemCount feature in Kubernetes? I'm asking because this page shows that it is currently in Beta (not Alpha as you said above) and that it is enabled by default. https://kubernetes.io/docs/reference/command-line-tools-reference/feature-gates/ |
if there are too many keys(key,rangeEnd), even if we set RangeOptions.limit to 1,range request is very expensive and it will result in oom. this pr will push down RangeOptions.limit argv into index tree to reduce memory overhead. @gyuho