cds_integration_test is flakey #30107

Closed
phlax opened this issue Oct 11, 2023 · 7 comments · Fixed by #30269

Comments

@phlax
Member

phlax commented Oct 11, 2023

Fail can be seen here https://dev.azure.com/cncf/envoy/_build/results?buildId=151916&view=logs&j=d1f76054-8f79-554b-6f4a-11d6a63b8e00&t=a193292f-96b1-53c3-0505-a923ddcc3f84&l=304

test/integration/cds_integration_test.cc:284: Failure
Value of: response->complete()
  Actual: false
Expected: true
Stack trace:
  0x3549039: Envoy::(anonymous namespace)::DeferredCreationClusterStatsTest::sendRequestToClusterAndWaitForResponse()
  0x3553b01: Envoy::(anonymous namespace)::DeferredCreationClusterStatsTest_NonDeferredCreationTrafficStatsWithClusterCreateDeleteRecrete_Test::TestBody()
  0x3553f4a: Envoy::(anonymous namespace)::DeferredCreationClusterStatsTest_NonDeferredCreationTrafficStatsWithClusterCreateDeleteRecrete_Test::TestBody()
  0xb1aa638: testing::internal::HandleSehExceptionsInMethodIfSupported<>()
  0xb19007b: testing::internal::HandleExceptionsInMethodIfSupported<>()
  0xb173829: testing::Test::Run()
  0xb174662: testing::TestInfo::Run()
... Google Test internal frames ...

It's happened a couple of times on main in the last couple of days.

The first occurrence I can see, at least, is in early September.

The most recent change to related files before then is #28702.

@phlax
Member Author

phlax commented Oct 11, 2023

cc @KBaichoo

@phlax changed the title from "cds_integration_test is flakey (tsan)" to "cds_integration_test is flakey" on Oct 11, 2023
@phlax
Member Author

phlax commented Oct 11, 2023

Seems it's not just tsan; it's also failing here in c-t-o:

[ RUN      ] IpVersionsClientTypeDelta/DeferredCreationClusterStatsTest.DeferredCreationTrafficStatsWithClusterCreateUpdateDelete/3
external/envoy/test/integration/cds_integration_test.cc:334: Failure
Expected equality of these values:
  test_server_->gauge("cluster.cluster_1.ClusterTrafficStats.initialized")
    Which is: 1
  nullptr
    Which is: (nullptr)

https://dev.azure.com/cncf/envoy/_build/results?buildId=152048&view=logs&j=e969334a-0e55-5c18-ac96-8b546753391e&t=32392586-69f5-5768-4caa-e00e8d4cc47e&l=235

Not sure if it's a different issue, but it's the same test failing.

@KBaichoo
Contributor

Thanks for raising phlax, I'll assign myself and take a look

@KBaichoo self-assigned this on Oct 11, 2023
@phlax added this to the 1.28.0 milestone on Oct 13, 2023
@phlax
Member Author

phlax commented Oct 13, 2023

@KBaichoo I've optimistically added this to the 1.28 milestone. It would be good to resolve it before the release sets sail.

@phlax
Member Author

phlax commented Oct 16, 2023

cc @alyssawilk

@KBaichoo
Contributor

I've looked into this a bit, and imo it doesn't seem related to #28702, as it fails in cases where that added feature is off. I think the issue is that the test races on whether the counter still exists when the cluster is removed, and there's no "counter does not exist" mechanism.
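
For illustration only, here is a rough sketch of what a "stat does not exist" wait could look like. It is not Envoy's test server API: `waitForGaugeToNotExist` and the `gauge_exists` lookup callback are hypothetical names, and the lookup is a stand-in for however the test would query the stats store. The idea is to poll until the stat disappears or a timeout expires, instead of asserting on `nullptr` right after the cluster removal is sent.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <string>
#include <thread>

// Hypothetical lookup: returns true while the named gauge still exists.
// In a real test this would consult the stats store; here it is injected.
using GaugeExistsFn = std::function<bool(const std::string&)>;

// Hypothetical helper (not an existing Envoy API): polls until the gauge
// `name` no longer exists, or until `timeout` expires.
// Returns true if the gauge disappeared in time, false on timeout.
bool waitForGaugeToNotExist(
    const GaugeExistsFn& gauge_exists, const std::string& name,
    std::chrono::milliseconds timeout,
    std::chrono::milliseconds poll_interval = std::chrono::milliseconds(1)) {
  assert(poll_interval.count() > 0);
  const auto deadline = std::chrono::steady_clock::now() + timeout;
  while (gauge_exists(name)) {
    if (std::chrono::steady_clock::now() >= deadline) {
      return false; // Still present: report the timeout to the caller.
    }
    std::this_thread::sleep_for(poll_interval);
  }
  return true;
}
```

A test would then check the helper's return value, so a slow removal shows up as a timeout rather than as an intermittent `nullptr` mismatch.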

@phlax
Member Author

phlax commented Oct 17, 2023

... imo doesn't seem related to #28702

Yeah, it's possible it goes back further, and I think it does. There was just limited info to track it with, so I looked for related activity.
