chore(debug): try reproducing es flake #11555

Closed · wants to merge 3 commits
Conversation

@slonka (Contributor) commented Sep 25, 2024

Found one problem after 66 runs, but not the same one as reported in the flake:

[SynchronizedBeforeSuite]
github.com/kumahq/kuma/test/framework/ginkgo.go:57
 2024-09-25T10:47:45+02:00 logger.go:66: Running command kubectl with args [--kubeconfig /Users/krzysztof.slonka@konghq.com/.kube/kind-kuma-1-config apply -f https://github.com/kubernetes-sigs/gateway-api/releases/download/v1.1.0/experimental-install.yaml]
 2024-09-25T10:48:20+02:00 logger.go:66: Unable to connect to the server: net/http: TLS handshake timeout
  [FAILED] in [SynchronizedBeforeSuite] - github.com/kumahq/kuma/test/framework/envs/kubernetes/env.go:25 @ 09/25/24 10:48:20.115
[SynchronizedBeforeSuite] [FAILED] [34.225 seconds]
[SynchronizedBeforeSuite]
github.com/kumahq/kuma/test/framework/ginkgo.go:57

  [FAILED] Expected success, but got an error:
      <*shell.ErrWithCmdOutput | 0x14000b06be8>:
      error while running command: exit status 1; Unable to connect to the server: net/http: TLS handshake timeout
      {
          Underlying: <*exec.ExitError | 0x14000ce24e0>{
              ProcessState: {
                  pid: 29623,
                  status: 256,
                  rusage: {
                      Utime: {
                          Sec: 0,
                          Usec: 114257,
                          Pad_cgo_0: [0, 0, 0, 0],
                      },
                      Stime: {
                          Sec: 0,
                          Usec: 69645,
                          Pad_cgo_0: [0, 0, 0, 0],
                      },
                      Maxrss: 98582528,
                      Ixrss: 0,
                      Idrss: 0,
                      Isrss: 0,
                      Minflt: 6877,
                      Majflt: 0,
                      Nswap: 0,
                      Inblock: 0,
                      Oublock: 0,
                      Msgsnd: 13,
                      Msgrcv: 11,
                      Nsignals: 182,
                      Nvcsw: 98,
                      Nivcsw: 9049,
                  },
              },
              Stderr: nil,
          },
          Output: {
              stdout: {
                  Lines: nil,
                  merged: {
                      Mutex: {state: 0, sema: 0},
                      Lines: [
                          "Unable to connect to the server: net/http: TLS handshake timeout",
                      ],
                  },
              },
              stderr: {
                  Lines: [
                      "Unable to connect to the server: net/http: TLS handshake timeout",
                  ],
                  merged: {
                      Mutex: {state: 0, sema: 0},
                      Lines: [
                          "Unable to connect to the server: net/http: TLS handshake timeout",
                      ],
                  },
              },
              merged: {
                  Mutex: {state: 0, sema: 0},
                  Lines: [
                      "Unable to connect to the server: net/http: TLS handshake timeout",
                  ],
              },
          },
      }
  In [SynchronizedBeforeSuite] at: github.com/kumahq/kuma/test/framework/envs/kubernetes/env.go:25 @ 09/25/24 10:48:20.115
------------------------------
[SynchronizedAfterSuite]
github.com/kumahq/kuma/test/e2e_env/kubernetes/kubernetes_suite_test.go:51
  [PANICKED] in [SynchronizedAfterSuite] - runtime/panic.go:262 @ 09/25/24 10:48:20.116
[SynchronizedAfterSuite] [PANICKED] [0.000 seconds]
[SynchronizedAfterSuite]
github.com/kumahq/kuma/test/e2e_env/kubernetes/kubernetes_suite_test.go:51

  [PANICKED] Test Panicked
  In [SynchronizedAfterSuite] at: runtime/panic.go:262 @ 09/25/24 10:48:20.116

  runtime error: invalid memory address or nil pointer dereference

  Full Stack Trace
    github.com/kumahq/kuma/test/framework.(*K8sControlPlane).GetKubectlOptions(...)
        github.com/kumahq/kuma/test/framework/k8s_controlplane.go:71
    github.com/kumahq/kuma/test/framework.(*K8sControlPlane).GetKumaCPPods(0x0)
        github.com/kumahq/kuma/test/framework/k8s_controlplane.go:107 +0x58
    github.com/kumahq/kuma/test/framework.CpRestarted({0x106186db8?, 0x14000e84000?})
        github.com/kumahq/kuma/test/framework/debug.go:343 +0x74
    github.com/kumahq/kuma/test/framework/envs/kubernetes.SynchronizedAfterSuite()
        github.com/kumahq/kuma/test/framework/envs/kubernetes/env.go:115 +0x2c
------------------------------
[ReportAfterSuite] kubernetes after suite
github.com/kumahq/kuma/test/e2e_env/kubernetes/kubernetes_suite_test.go:52
  [PANICKED] in [ReportAfterSuite] - runtime/panic.go:262 @ 09/25/24 10:48:20.116
[ReportAfterSuite] [PANICKED] [0.000 seconds]
[ReportAfterSuite] kubernetes after suite
github.com/kumahq/kuma/test/e2e_env/kubernetes/kubernetes_suite_test.go:52

  [PANICKED] Test Panicked
  In [ReportAfterSuite] at: runtime/panic.go:262 @ 09/25/24 10:48:20.116

  runtime error: invalid memory address or nil pointer dereference

  Full Stack Trace
    github.com/kumahq/kuma/test/framework.(*K8sControlPlane).GetKubectlOptions(...)
        github.com/kumahq/kuma/test/framework/k8s_controlplane.go:71
    github.com/kumahq/kuma/test/framework.(*K8sControlPlane).GetKumaCPPods(0x0)
        github.com/kumahq/kuma/test/framework/k8s_controlplane.go:107 +0x58
    github.com/kumahq/kuma/test/framework.(*K8sCluster).GetKumaCPLogs(0x14000e84000)
        github.com/kumahq/kuma/test/framework/k8s_cluster.go:839 +0x30
    github.com/kumahq/kuma/test/framework/envs/kubernetes.PrintCPLogsOnFailure({{0x14000080194, 0x48}, {0x1052614f0, 0x14}, {0x108299ac0, 0x0, 0x0}, 0x0, 0x1, {0x0, ...}, ...})
        github.com/kumahq/kuma/test/framework/envs/kubernetes/env.go:87 +0x2c
    github.com/kumahq/kuma/test/framework/envs/kubernetes.AfterSuite({{0x14000080194, 0x48}, {0x1052614f0, 0x14}, {0x108299ac0, 0x0, 0x0}, 0x0, 0x1, {0x0, ...}, ...})
        github.com/kumahq/kuma/test/framework/envs/kubernetes/env.go:120 +0x48
------------------------------
[ReportAfterSuite] Autogenerated ReportAfterSuite for --junit-report
autogenerated by Ginkgo
[ReportAfterSuite] PASSED [0.001 seconds]
------------------------------

Summarizing 3 Failures:
  [FAIL] [SynchronizedBeforeSuite]
  github.com/kumahq/kuma/test/framework/envs/kubernetes/env.go:25
  [PANICKED!] [SynchronizedAfterSuite]
  runtime/panic.go:262
  [PANICKED!] [ReportAfterSuite] kubernetes after suite
  runtime/panic.go:262

Ran 0 of 159 Specs in 34.226 seconds
FAIL! -- A BeforeSuite node failed so all tests were skipped.
--- FAIL: TestE2E (34.23s)
FAIL

Tests failed on attempt #67


Checklist prior to review

  • Link to relevant issue as well as docs and UI issues --
  • This will not break child repos: it doesn't hardcode values (e.g. "kumahq" as an image registry) and it will work on Windows; system-specific functions like syscall.Mkfifo have an equivalent implementation on the other OS --
  • Tests (Unit test, E2E tests, manual test on universal and k8s) --
    • Don't forget ci/ labels to run additional/fewer tests
  • Do you need to update UPGRADE.md? --
  • Does it need to be backported according to the backporting policy? (this GH action will add "backport" label based on these file globs, if you want to prevent it from adding the "backport" label use no-backport-autolabel label) --

@slonka slonka added ci/skip-test PR: Don't run unit and e2e tests (maybe this is just a doc change) and removed ci/skip-test PR: Don't run unit and e2e tests (maybe this is just a doc change) labels Sep 25, 2024
@slonka (Contributor, Author) commented Sep 25, 2024

Another one after 142 runs:

• [TIMEDOUT] [7.528 seconds]
External Services [BeforeAll] non-TLS should route to external-service
  [BeforeAll] github.com/kumahq/kuma/test/e2e_env/kubernetes/externalservices/externalservices.go:44
  [It] github.com/kumahq/kuma/test/e2e_env/kubernetes/externalservices/externalservices.go:102

  [TIMEDOUT] A suite timeout occurred
  In [BeforeAll] at: github.com/kumahq/kuma/test/e2e_env/kubernetes/externalservices/externalservices.go:44 @ 09/25/24 12:14:01.022

  This is the Progress Report generated when the suite timeout occurred:
    External Services non-TLS should route to external-service (Spec Runtime: 1.458s)
      github.com/kumahq/kuma/test/e2e_env/kubernetes/externalservices/externalservices.go:102
      In [BeforeAll] (Node Runtime: 1.458s)
        github.com/kumahq/kuma/test/e2e_env/kubernetes/externalservices/externalservices.go:44

      Spec Goroutine
      goroutine 124 [sleep]
        time.Sleep(0xb2d05e00)
          runtime/time.go:285
        github.com/gruntwork-io/terratest/modules/retry.DoWithRetryInterfaceE({0x107902730, 0x14000efa008}, {0x140017b84c0, 0x33}, 0x1e, 0xb2d05e00, 0x140012599b8)
          github.com/gruntwork-io/terratest@v0.47.1/modules/retry/retry.go:104
        github.com/gruntwork-io/terratest/modules/retry.DoWithRetryE({0x107902730?, 0x14000efa008?}, {0x140017b84c0?, 0x1?}, 0x1?, 0x10711d9a0?, 0x140012ac0a0?)
          github.com/gruntwork-io/terratest@v0.47.1/modules/retry/retry.go:68
        github.com/gruntwork-io/terratest/modules/k8s.WaitUntilNumPodsCreatedE({0x107902730, 0x14000efa008}, 0x14000aaa000, {{{0x0, 0x0}, {0x0, 0x0}}, {0x14000d90030, 0xf}, {0x0, ...}, ...}, ...)
          github.com/gruntwork-io/terratest@v0.47.1/modules/k8s/pod.go:81
        github.com/gruntwork-io/terratest/modules/k8s.WaitUntilNumPodsCreated({0x107902730, 0x14000efa008}, 0x0?, {{{0x0, 0x0}, {0x0, 0x0}}, {0x14000d90030, 0xf}, {0x0, ...}, ...}, ...)
          github.com/gruntwork-io/terratest@v0.47.1/modules/k8s/pod.go:67
        github.com/kumahq/kuma/test/framework/deployments/democlient.(*k8SDeployment).Deploy.WaitNumPods.func3({0x107916db8, 0x14000b39200})
          github.com/kumahq/kuma/test/framework/setup.go:544
        github.com/kumahq/kuma/test/framework/deployments/democlient.(*k8SDeployment).Deploy.Combine.func4(...)
          github.com/kumahq/kuma/test/framework/setup.go:916
        github.com/kumahq/kuma/test/framework/deployments/democlient.(*k8SDeployment).Deploy(0x14000094af0, {0x107916db8, 0x14000b39200})
          github.com/kumahq/kuma/test/framework/deployments/democlient/kubernetes.go:128
        github.com/kumahq/kuma/test/framework.(*K8sCluster).Deploy(0x14000b39200, {0x130e369c0, 0x14000094af0})
          github.com/kumahq/kuma/test/framework/k8s_cluster.go:1157
        github.com/kumahq/kuma/test/framework/deployments/democlient.Install.func1({0x107916db8, 0x14000b39200})
          github.com/kumahq/kuma/test/framework/deployments/democlient/deployment.go:96
        github.com/kumahq/kuma/test/framework.(*ClusterSetup).Setup.Combine.func1(...)
          github.com/kumahq/kuma/test/framework/setup.go:916
        github.com/kumahq/kuma/test/framework.(*ClusterSetup).Setup(0x14000f5ef48?, {0x107916db8, 0x14000b39200})
          github.com/kumahq/kuma/test/framework/setup.go:963
      > github.com/kumahq/kuma/test/e2e_env/kubernetes/externalservices.ExternalServices.func1()
          github.com/kumahq/kuma/test/e2e_env/kubernetes/externalservices/externalservices.go:50
        github.com/onsi/ginkgo/v2/internal.extractBodyFunction.func3({0x1400007b3e0?, 0x140001028c0?})
          github.com/onsi/ginkgo/v2@v2.20.2/internal/node.go:472
        github.com/onsi/ginkgo/v2/internal.(*Suite).runNode.func3()
          github.com/onsi/ginkgo/v2@v2.20.2/internal/suite.go:894
        github.com/onsi/ginkgo/v2/internal.(*Suite).runNode in goroutine 48
          github.com/onsi/ginkgo/v2@v2.20.2/internal/suite.go:881
-

But this one is different: the pod did not come up.

@slonka (Contributor, Author) commented Sep 25, 2024

On CI it succeeded 65 times in a row:

All tests passed...
Will keep running them until they fail.
This was attempt #65
No, seriously... you can probably stop now.

@slonka slonka closed this Sep 26, 2024
@slonka slonka reopened this Sep 26, 2024
@slonka slonka force-pushed the reproduce-es-flake branch 3 times, most recently from e3dbd33 to e11cbfb on September 26, 2024 12:30
@slonka slonka closed this Sep 26, 2024
@slonka slonka reopened this Sep 26, 2024
Signed-off-by: slonka <slonka@users.noreply.github.com>
@slonka (Contributor, Author) commented Oct 2, 2024

Timeline:
Cluster added:
[2024-09-30 10:29:20.714][16][debug][upstream] [source/common/upstream/upstream_impl.cc:1689] initializing Primary cluster external-services:external-service completed
Cluster removed (?):
[2024-09-30 10:29:21.686][16][debug][upstream] [source/common/upstream/cluster_manager_impl.cc:859] removing cluster external-services:external-service
First request to es-eses:
[2024-09-30 10:29:21.775][30][debug][http] [source/common/http/conn_manager_impl.cc:1147] [Tags: "ConnectionId":"241","StreamId":"14271450548453418764"] request headers complete (end_stream=true):
':authority', 'external-service.external-services'
Response with 403:
[2024-09-30 10:29:21.775][30][debug][http] [source/common/http/conn_manager_impl.cc:1838] [Tags: "ConnectionId":"241","StreamId":"14271450548453418764"] encoding headers via codec (end_stream=false):
':status', '403'
Traffic Permission created by test:
2024-09-30T10:29:22Z logger.go:66: trafficpermission.kuma.io/traffic-to-es created
TP received by CP:
2024-09-30T10:29:22.042Z LEVEL(-5) admission received request {"object": {"name":"traffic-to-es"}, "namespace": "", "name": "traffic-to-es", "resource": {"group":"kuma.io","version":"v1alpha1","resource":"trafficpermissions"}, "user": "system:admin", "requestID": "4237d979-bbd9-4833-9501-8717b809e976"}
TP received by XDS:
[2024-09-30 10:29:22.647][16][debug][config] [source/extensions/config_subscription/grpc/grpc_mux_impl.cc:268] Received gRPC message for type.googleapis.com/envoy.config.listener.v3.Listener at version 74bbc779-3ff0-4f8a-a1c9-90720c6cd7ee
New request allowed:
[2024-09-30 10:29:24.695][29][debug][rbac] [source/extensions/filters/network/rbac/rbac_filter.cc:156] enforced allowed, matched policy traffic-to-es
503s afterwards, because there is no cluster.
Assertion failure:
Error: It 09/30/24 10:29:52.058

To me it looks the same as the initial_timeout bug.

@slonka slonka closed this Oct 2, 2024
@lahabana lahabana deleted the reproduce-es-flake branch October 28, 2024 16:26