-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add stats to WASM VMs. #9259
Add stats to WASM VMs. #9259
Conversation
Signed-off-by: John Plevyak <jplevyak@gmail.com>
hi -- can you give some more context to this PR? I'd like to understand better the relationship between threads & VM instances, and what exactly is the semantics when you clone a VM. Thanks! /wait |
As per our discussions, the stats are going to be initialized infrequently should outlive the restarted VMs so that creation costs can be amortized and so that they do not get reset (to zero for counters). This PR decouples the lifetime of stats from listeners and VM and also adds individual stats for active VMs and created VMs providing insight into VM life cycles. A coming PR will use these mechanisms to add stats for VM startup issues as the current xDS configuration mechanism which is used for all filter/listener configurations (not just WASM) makes a configuration active even if there are issues (e.g. exceptions) and these stats will improve visibility of configuration issues (i.e. they can be be alerted on). |
@mandarjog and @kyessenov have context on this issue |
The particular issue we were debugging was accumulation of Wasm VMs attached to draining listeners. Since draining listeners are removed on a timer basis, we may have quite a number of VMs alive at any point (including their onTick callbacks). A counter on the number of active VMs would help us here. |
very cool. The comments from @jplevyak and @kyessenov would both be appropriate for the PR description. |
Updated PR description. PTAL |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few nits, mostly around style nits and doc.
There's a bunch going on here that I think @lizan will need to review, or someone else that's read the book of knowledge of wasm :)
Signed-off-by: John Plevyak <jplevyak@gmail.com>
Signed-off-by: John Plevyak <jplevyak@gmail.com>
Signed-off-by: John Plevyak <jplevyak@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM with some small comments, thanks.
/wait
@@ -287,7 +291,7 @@ struct SaveRestoreContext { | |||
}; | |||
|
|||
// Create a new low-level WASM VM using runtime of the given type (e.g. "envoy.wasm.runtime.wavm"). | |||
WasmVmPtr createWasmVm(absl::string_view runtime); | |||
WasmVmPtr createWasmVm(absl::string_view runtime, Stats::ScopeSharedPtr scope); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: pass either by const-ref or by explicit &&
all the way through for clarity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI: I checked the compiler output and there is no performance benefit to using std::move + && if the variable goes out of scope immediately as the compiler does copy elision. Since the pointer is shared there is not ownership being passed so I don't understand what you mean by "clarity" here as we typically pass with && to indicate that there is a passing of ownership which is not the case here. However, taking your suggestions:
Using && means that the caller will have to make an explicit copy if the variable is still in use after the call or std::move if it is not which clutters up the code and adds an extra burden as the caller who now has to manage the life cycle of what is logically a shared object. If the caller decides to take the easy way out and always make a copy, the code will be less efficient as the copy is explicit and can't be removed via copy elision by the compiler.
Switching to const-ref. That should produce equally efficient code and adds no burden to the caller, although because there is now non-local sharing (i.e. the variable in the caller is now aliased into the callees) it is nominally less safe (an indirect function call from the callee which causes the caller to clear or or replace that variable could effect the variable in the callee since it is a reference). Personally I think it would be better to pass by value here since it is less code, just as efficient and safer (no non-local sharing).
All that said, I switched it to const-ref since that seems to be the more common idiom in Envoy when dealing with shared objects and no passing of ownership.
PTAL
Signed-off-by: John Plevyak <jplevyak@gmail.com>
Signed-off-by: John Plevyak <jplevyak@gmail.com>
Signed-off-by: John Plevyak <jplevyak@gmail.com>
Signed-off-by: John Plevyak <jplevyak@gmail.com>
Signed-off-by: John Plevyak <jplevyak@gmail.com>
Signed-off-by: John Plevyak <jplevyak@gmail.com>
Signed-off-by: John Plevyak <jplevyak@gmail.com>
Signed-off-by: John Plevyak <jplevyak@gmail.com>
This pull request has been automatically marked as stale because it has not had activity in the last 7 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions! |
Ping. This PR is largely independent of the Architecture doc and I have answered all the comments. PTAL |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM with small nits, thanks.
/wait
@@ -247,9 +249,28 @@ bool V8::load(const std::string& code, bool /* allow_precompiled */) { | |||
::memcpy(source_.get(), code.data(), code.size()); | |||
|
|||
module_ = wasm::Module::make(store_.get(), source_); | |||
if (module_) { | |||
shared_module_ = module_->share(); | |||
RELEASE_ASSERT(shared_module_ != nullptr, ""); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: in general we don't release ASSERT things like this since it shouldn't fail and will crash in obvious ways. Is there any reason to do this here and elsewhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
@@ -7,6 +7,7 @@ | |||
#include "envoy/registry/registry.h" | |||
|
|||
#include "common/common/assert.h" | |||
#include "common/singleton/threadsafe_singleton.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this still needed? Same elsewhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed. Done.
Signed-off-by: John Plevyak <jplevyak@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
Signed-off-by: John Plevyak jplevyak@gmail.com
For an explanation of how to fill out the fields, please see the relevant section
in PULL_REQUESTS.md
Description: Add stats to WASM VMs including active and number created. Make V8 VMs clonable. Make the stats objects (Scope and StatNameSet) shared and with a separate lifetime from VMs to minimize memory usage and so that we can share them over VM reloads. The new VM stats were added because the VMs are destroyed only when the listener drains and those stats are useful to see how many are still around.
Risk Level: Low
Testing: Unit
Docs Changes:
Release Notes:
[Optional Fixes #Issue]
[Optional Deprecated:]