-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
upstream: implement UpstreamLocalityStats.loadMetricStats #18534
Conversation
UpstreamLocalityStats.loadMetricStats in load reports Signed-off-by: Eugene Chan <eugenechan@google.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I'm missing some context as I was not aware of the load-stats-reporter infrastructure.
Where do these stats get consumed?
Signed-off-by: Eugene Chan <eugenechan@google.com>
These stats are sent as part of LoadStatsRequest to the management server. (See https://www.envoyproxy.io/docs/envoy/latest/api-v3/config/endpoint/v3/load_report.proto#envoy-v3-api-msg-config-endpoint-v3-upstreamlocalitystats) |
/retest |
Retrying Azure Pipelines: |
I think maybe @adisuissa can take this one, since LRS is a pretty important part of control plane. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this.
Left a few comments with some questions.
Signed-off-by: Eugene Chan <eugenechan@google.com>
Signed-off-by: Eugene Chan <eugenechan@google.com>
Thanks, PTAL /retest |
Retrying Azure Pipelines: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code changes LGTM modulo the design comment.
@pianiststickman can you clarify comments please so we can submit it? |
/wait |
Signed-off-by: Eugene Chan <eugenechan@google.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this PR has no effect on Envoy other than to make the HostDescription object 8 bytes bigger.
Can you put a pointer to a doc explaining how this can be used, and its scaling limitations, into C++ comments and the PR description?
Thank you!
/wait |
/wait |
This pull request has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions! |
Signed-off-by: Eugene Chan <eugenechan@google.com>
PTAL, and apologies for the delay on this. I've added comments explaining the performance considerations of the current implementation. I think since this is purely an implementation detail (putting key names in thread-local storage, eliminating contention on the locks used to hold the stat maps) I'd prefer to take a pragmatic approach and dispense with that when it becomes necessary. Let me know what you think. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you merge main, maybe the coverage test will pass. Otherwise you might have to dive into the CI report and see what's wrong. Sometimes coverage fails because we have added lines of code that are not covered with the new tests, sometimes it's just a flake or a consequence of non-hermetic tests, since the coverage tests run all in one process.
* Implementation of LoadMetricStats. | ||
* | ||
* TODO(pianiststickman): this implementation puts a copy of the stat name into every host that | ||
* receives a copy of that metric. This can be improved by putting a single copy of the stat name |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also note here that it takes a lock on the hot-path.
I really think this should not be deployed as is, but I understand why you want to add the interface and iterate on the perf.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
/wait |
Signed-off-by: Eugene Chan <eugenechan@google.com>
/retest |
Retrying Azure Pipelines: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/assign-from envoyproxy/senior-maintainers
ON_CALL(cm_, clusters()).WillByDefault(Return(cluster_info)); | ||
deliverLoadStatsResponse({"foo"}); | ||
// First stats report on timer tick. | ||
time_system_.setMonotonicTime(std::chrono::microseconds(4)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
consider using advanceTimeWait with a delta, rather than just setting the time to an absolute time.
envoyproxy/senior-maintainers assignee is @htuch |
@@ -218,6 +218,27 @@ HostVector filterHosts(const absl::node_hash_set<HostSharedPtr>& hosts, | |||
|
|||
} // namespace | |||
|
|||
// TODO(pianiststickman): this implementation takes a lock on the hot path and puts a copy of the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just want to call out this warning so the senior-maintainer sees it.
We have more detailed notes on how to avoid taking a lock on every request for this use-case, but @pianiststickman wants to merge this before building out the thread-local impl to establish the API even with a potential perf bottleneck and then iterate.
Note that this code can't be reached in production without adding new C++ APIs, so it will not impact existing Envoy deployments.
We need another senior maintainer, since both Josh and PR author are Googlers :) |
/assign-from envoyproxy/senior-maintainers |
envoyproxy/senior-maintainers assignee is @htuch |
/assign-from envoyproxy/senior-maintainers |
envoyproxy/senior-maintainers assignee is @zuercher |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks fine to me given the proviso that the performance issue addressed before this becomes available for general use.
…#18534) Implements a simple API allowing extensions to populate UpstreamLocalityStats.loadMetricStats in load reports, stored as a flat hash map of keys and values that may vary at runtime. Individual stats are accumulated by calling add(), which combines stats with the same name; the aggregated stats are retrieved by calling latch(), which also clears the current load metrics. The load metrics are latched and read by the load stats reporter. Note: this implementation puts a copy of the stat name into every host that receives a copy of that metric. This can be improved by putting a single copy of the stat name into a thread-local key->index map and using the index as the key to the stat map instead. Risk Level: low Testing: CI Docs Changes: n/a Release Notes: n/a Platform Specific Features: n/a Signed-off-by: Eugene Chan <eugenechan@google.com> Signed-off-by: Josh Perry <josh.perry@mx.com>
Implements a simple API allowing extensions to populate UpstreamLocalityStats.loadMetricStats in load reports, stored as a flat hash map of keys and values that may vary at runtime. Individual stats are accumulated by calling add(), which combines stats with the same name; the aggregated stats are retrieved by calling latch(), which also clears the current load metrics. The load metrics are latched and read by the load stats reporter.
Signed-off-by: Eugene Chan eugenechan@google.com
Commit Message: upstream: implement UpstreamLocalityStats.loadMetricStats
Additional Description: Implements a simple API allowing extensions to populate UpstreamLocalityStats.loadMetricStats in load reports, stored as a flat hash map of keys and values that may vary at runtime. Individual stats are accumulated by calling add(), which combines stats with the same name; the aggregated stats are retrieved by calling latch(), which also clears the current load metrics. The load metrics are latched and read by the load stats reporter.
Note: this implementation puts a copy of the stat name into every host that receives a copy of that metric. This can be improved by putting a single copy of the stat name into a thread-local key->index map and using the index as the key to the stat map instead.
Risk Level: low
Testing: CI
Docs Changes: n/a
Release Notes: n/a
Platform Specific Features: n/a