Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add stats to WASM VMs. #9259

Merged
merged 14 commits into from
Jan 10, 2020
Merged

Conversation

jplevyak
Copy link
Contributor

@jplevyak jplevyak commented Dec 6, 2019

Signed-off-by: John Plevyak jplevyak@gmail.com

For an explanation of how to fill out the fields, please see the relevant section
in PULL_REQUESTS.md

Description: Add stats to WASM VMs including active and number created. Make V8 VMs clonable. Make the stats objects (Scope and StatNameSet) shared and with a separate lifetime from VMs to minimize memory usage and so that we can share them over VM reloads. The new VM stats were added because the VMs are destroyed only when the listener drains and those stats are useful to see how many are still around.
Risk Level: Low
Testing: Unit
Docs Changes:
Release Notes:
[Optional Fixes #Issue]
[Optional Deprecated:]

Signed-off-by: John Plevyak <jplevyak@gmail.com>
@htuch
Copy link
Member

htuch commented Dec 10, 2019

@jmarantz can you take a pass on this, given your deep stats knowledge and recent discussions with @jplevyak on this subject?

@jmarantz jmarantz self-assigned this Dec 11, 2019
@jmarantz
Copy link
Contributor

hi -- can you give some more context to this PR? I'd like to understand better the relationship between threads & VM instances, and what exactly is the semantics when you clone a VM.

Thanks!

/wait

@jplevyak
Copy link
Contributor Author

As per our discussions, the stats are going to be initialized infrequently should outlive the restarted VMs so that creation costs can be amortized and so that they do not get reset (to zero for counters). This PR decouples the lifetime of stats from listeners and VM and also adds individual stats for active VMs and created VMs providing insight into VM life cycles. A coming PR will use these mechanisms to add stats for VM startup issues as the current xDS configuration mechanism which is used for all filter/listener configurations (not just WASM) makes a configuration active even if there are issues (e.g. exceptions) and these stats will improve visibility of configuration issues (i.e. they can be be alerted on).

@jplevyak
Copy link
Contributor Author

@mandarjog and @kyessenov have context on this issue

@kyessenov
Copy link
Contributor

The particular issue we were debugging was accumulation of Wasm VMs attached to draining listeners. Since draining listeners are removed on a timer basis, we may have quite a number of VMs alive at any point (including their onTick callbacks). A counter on the number of active VMs would help us here.

@jmarantz
Copy link
Contributor

jmarantz commented Dec 11, 2019

very cool. The comments from @jplevyak and @kyessenov would both be appropriate for the PR description.

@jplevyak
Copy link
Contributor Author

Updated PR description. PTAL

Copy link
Contributor

@jmarantz jmarantz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few nits, mostly around style nits and doc.

There's a bunch going on here that I think @lizan will need to review, or someone else that's read the book of knowledge of wasm :)

bazel/external/wee8.genrule_cmd Show resolved Hide resolved
source/extensions/common/wasm/null/null_vm.cc Outdated Show resolved Hide resolved
source/extensions/common/wasm/null/null_vm.h Outdated Show resolved Hide resolved
source/extensions/common/wasm/wasm_vm.h Show resolved Hide resolved
source/extensions/common/wasm/wasm_vm_base.h Outdated Show resolved Hide resolved
source/extensions/common/wasm/wasm_vm_base.h Outdated Show resolved Hide resolved
source/extensions/common/wasm/wasm_vm_base.h Outdated Show resolved Hide resolved
source/extensions/common/wasm/wasm_vm_base.h Outdated Show resolved Hide resolved
Signed-off-by: John Plevyak <jplevyak@gmail.com>
Signed-off-by: John Plevyak <jplevyak@gmail.com>
Signed-off-by: John Plevyak <jplevyak@gmail.com>
jmarantz
jmarantz previously approved these changes Dec 13, 2019
source/extensions/common/wasm/wasm_vm_base.h Outdated Show resolved Hide resolved
source/extensions/common/wasm/wasm_vm_base.h Outdated Show resolved Hide resolved
Copy link
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with some small comments, thanks.

/wait

source/extensions/common/wasm/wasm_vm.h Outdated Show resolved Hide resolved
@@ -287,7 +291,7 @@ struct SaveRestoreContext {
};

// Create a new low-level WASM VM using runtime of the given type (e.g. "envoy.wasm.runtime.wavm").
WasmVmPtr createWasmVm(absl::string_view runtime);
WasmVmPtr createWasmVm(absl::string_view runtime, Stats::ScopeSharedPtr scope);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: pass either by const-ref or by explicit && all the way through for clarity.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI: I checked the compiler output and there is no performance benefit to using std::move + && if the variable goes out of scope immediately as the compiler does copy elision. Since the pointer is shared there is not ownership being passed so I don't understand what you mean by "clarity" here as we typically pass with && to indicate that there is a passing of ownership which is not the case here. However, taking your suggestions:

Using && means that the caller will have to make an explicit copy if the variable is still in use after the call or std::move if it is not which clutters up the code and adds an extra burden as the caller who now has to manage the life cycle of what is logically a shared object. If the caller decides to take the easy way out and always make a copy, the code will be less efficient as the copy is explicit and can't be removed via copy elision by the compiler.

Switching to const-ref. That should produce equally efficient code and adds no burden to the caller, although because there is now non-local sharing (i.e. the variable in the caller is now aliased into the callees) it is nominally less safe (an indirect function call from the callee which causes the caller to clear or or replace that variable could effect the variable in the callee since it is a reference). Personally I think it would be better to pass by value here since it is less code, just as efficient and safer (no non-local sharing).

All that said, I switched it to const-ref since that seems to be the more common idiom in Envoy when dealing with shared objects and no passing of ownership.

PTAL

source/extensions/common/wasm/wasm_vm_base.h Outdated Show resolved Hide resolved
source/extensions/common/wasm/wasm_vm_base.h Outdated Show resolved Hide resolved
source/extensions/common/wasm/wasm_vm_base.h Outdated Show resolved Hide resolved
source/extensions/common/wasm/wasm_vm_base.h Show resolved Hide resolved
Signed-off-by: John Plevyak <jplevyak@gmail.com>
Signed-off-by: John Plevyak <jplevyak@gmail.com>
Signed-off-by: John Plevyak <jplevyak@gmail.com>
Signed-off-by: John Plevyak <jplevyak@gmail.com>
Signed-off-by: John Plevyak <jplevyak@gmail.com>
Signed-off-by: John Plevyak <jplevyak@gmail.com>
Signed-off-by: John Plevyak <jplevyak@gmail.com>
Signed-off-by: John Plevyak <jplevyak@gmail.com>
@stale
Copy link

stale bot commented Jan 4, 2020

This pull request has been automatically marked as stale because it has not had activity in the last 7 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!

@stale stale bot added the stale stalebot believes this issue/PR has not been touched recently label Jan 4, 2020
@jplevyak
Copy link
Contributor Author

jplevyak commented Jan 6, 2020

Ping. This PR is largely independent of the Architecture doc and I have answered all the comments.

PTAL

@stale stale bot removed the stale stalebot believes this issue/PR has not been touched recently label Jan 6, 2020
Copy link
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with small nits, thanks.

/wait

@@ -247,9 +249,28 @@ bool V8::load(const std::string& code, bool /* allow_precompiled */) {
::memcpy(source_.get(), code.data(), code.size());

module_ = wasm::Module::make(store_.get(), source_);
if (module_) {
shared_module_ = module_->share();
RELEASE_ASSERT(shared_module_ != nullptr, "");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: in general we don't release ASSERT things like this since it shouldn't fail and will crash in obvious ways. Is there any reason to do this here and elsewhere?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -7,6 +7,7 @@
#include "envoy/registry/registry.h"

#include "common/common/assert.h"
#include "common/singleton/threadsafe_singleton.h"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this still needed? Same elsewhere.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed. Done.

Signed-off-by: John Plevyak <jplevyak@gmail.com>
Copy link
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@mattklein123 mattklein123 merged commit 079af3a into envoyproxy:master Jan 10, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants