keep ApiState pause/resume ref-counted, to make sure resume on the state is always called (always in pair). #11674

stevenzzzz · 2020-06-20T19:29:33Z

Per Harvey, let's make the ApiState pause/resume operations always come in pairs.
Currently there is no guarantee on when, or whether, "resume" will be called after the ApiState has been paused.

We should consider a mechanism such as an RAII object that automatically resumes the paused resource type when the object is destroyed.
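
For illustration, the RAII idea could look roughly like the sketch below. It is not Envoy code: `ScopedResume` and `FakeMux` are hypothetical names, and the single `paused` flag stands in for the real per-type ApiState. This simple version also assumes pauses are never nested.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <utility>

// Hypothetical scope guard: runs the stored callback exactly once, on destruction.
class ScopedResume {
public:
  explicit ScopedResume(std::function<void()> on_exit) : on_exit_(std::move(on_exit)) {}
  ~ScopedResume() {
    if (on_exit_) {
      on_exit_();
    }
  }
  // Non-copyable, so the resume cannot run twice.
  ScopedResume(const ScopedResume&) = delete;
  ScopedResume& operator=(const ScopedResume&) = delete;

private:
  std::function<void()> on_exit_;
};

// Hypothetical stand-in for the mux that owns the paused state.
struct FakeMux {
  bool paused = false;

  // Pauses and returns a guard that resumes when it goes out of scope.
  // The guard must not outlive this FakeMux, since it captures `this`.
  [[nodiscard]] std::unique_ptr<ScopedResume> pause() {
    assert(!paused); // Assumption of this sketch: no nested pauses.
    paused = true;
    return std::make_unique<ScopedResume>([this] { paused = false; });
  }
};
```

With this shape, a caller cannot forget to resume: dropping the returned guard, including on an early return, is what triggers the resume.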

stevenzzzz · 2020-06-23T15:06:01Z

/assign stevenzzzz

stevenzzzz · 2020-06-29T21:36:51Z

PR #11739 made an improvement by returning a Cleanup instance that resumes the paused resources on destruction.
This works well for most of the call sites in our codebase, i.e., where the pause and resume happen as a pair within a single main thread dispatcher event.
The only exception is ClusterManagerImpl::updateClusterCount. This method is called by multiple functions in the cluster manager impl, and the pause/resume may span several main thread dispatcher events [1]. This is not ideal, as it is hard to tell when the pause happens and when it will be resumed. Also, a stuck EDS response could prevent CDS from continuing. We will sort out the right move there and see if we can achieve the "resume when out of pause scope" goal for all pause call sites.

[1] https://github.com/envoyproxy/envoy/blob/master/source/common/upstream/cluster_manager_impl.cc#L806-L808

htuch · 2020-07-13T23:10:25Z

I'll take this on in the context of #11877. I realized that we need to pause/resume the type URL undergoing update there, which means we definitely need ref counting (since we have existing pause/resumes nested inside config updates).
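
As a rough sketch of why nesting calls for a count rather than a flag, consider the following. It is only an illustration under stated assumptions: `PauseGuard`, `RefCountedPauser`, and the `flushes()` counter are hypothetical names, not the API that landed in Envoy.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical guard: runs the stored callback once, on destruction.
class PauseGuard {
public:
  explicit PauseGuard(std::function<void()> on_exit) : on_exit_(std::move(on_exit)) {}
  ~PauseGuard() {
    if (on_exit_) {
      on_exit_();
    }
  }
  PauseGuard(const PauseGuard&) = delete;
  PauseGuard& operator=(const PauseGuard&) = delete;

private:
  std::function<void()> on_exit_;
};

// Hypothetical per-type-URL pause tracker with reference counting.
class RefCountedPauser {
public:
  // Increments the pause count for type_url. The returned guard decrements it
  // on destruction and must not outlive this object.
  [[nodiscard]] std::unique_ptr<PauseGuard> pause(const std::string& type_url) {
    ++pauses_[type_url];
    return std::make_unique<PauseGuard>([this, type_url] { resume(type_url); });
  }

  bool paused(const std::string& type_url) const {
    const auto it = pauses_.find(type_url);
    return it != pauses_.end() && it->second > 0;
  }

  // Number of times a count dropped to zero, i.e. when queued requests
  // would actually be sent in a real implementation.
  uint32_t flushes() const { return flushes_; }

private:
  void resume(const std::string& type_url) {
    const auto it = pauses_.find(type_url);
    assert(it != pauses_.end() && it->second > 0);
    if (--it->second == 0) {
      ++flushes_; // Only the outermost resume un-pauses the type URL.
    }
  }

  std::unordered_map<std::string, uint32_t> pauses_;
  uint32_t flushes_ = 0;
};
```

Here an inner pause/resume pair nested inside a config update leaves the type URL paused until the outermost guard is destroyed, which is the behavior a plain boolean cannot provide.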

To fix envoyproxy#11877, we need to handle safely the case where two watches point at the same resource, and a WatchMap onConfigUpdate() causes one watch to remove the other watch during its invoked onConfigUpdate().

While working on this, it made sense to fix envoyproxy#11674, avoiding spurious ClusterLoadAssignment discovery requests in the regression integration test.

Risk level: Medium (this has xDS wire-level implications).
Testing: New unit tests for pause/resume, regression unit and integration tests for watch map removal behaviors.

Fixes envoyproxy#11877 envoyproxy#11674

Signed-off-by: Harvey Tuch <htuch@google.com>
Co-authored-by: Sebastian Schepens <sebastian.schepens@mercadolibre.com>


stevenzzzz mentioned this issue Jun 20, 2020

ads: SRDS initialization not pausing RDS #11645

Closed

mattklein123 added the area/xds and help wanted labels Jun 22, 2020

repokitteh-read-only bot assigned stevenzzzz Jun 23, 2020

stevenzzzz mentioned this issue Jun 24, 2020

make grpcmux pause/resume come in pair by returning a RAII obj which resumes requests on destruction. #11739

Merged

htuch removed the help wanted label Jul 13, 2020

htuch assigned htuch and unassigned stevenzzzz Jul 13, 2020

htuch mentioned this issue Jul 14, 2020

xds: safely handle watch removals during update, nested pause/resume. #12069

Merged

htuch closed this as completed in #12069 Jul 24, 2020
