
delete pod-network-delay rule will be failure when the pod restart #485

Open
bmbbms opened this issue Mar 31, 2021 · 5 comments
Labels
type/bug Something isn't working

Comments

bmbbms commented Mar 31, 2021

Issue Description

bug report

Describe what happened (or what feature you want)

When I set a network-delay rule for a pod, the delay makes the pod's liveness probe fail, so the pod is restarted. If I then try to delete the network-delay rule, the deletion fails, because the containerId changes when the pod restarts: the delete operation still uses the original containerId recorded in the rule.

Describe what you expected to happen

So the containerId alone is not a reliable key for the rule. When a delete fails, we should check whether the Identifier's containerId has changed and, if so, re-resolve it.
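The re-check the reporter suggests could look something like this minimal Python sketch. Everything here is an assumption about a possible fix, not chaosblade's actual code: `destroy` stands in for the operator's real destroy call, and `resolve_current_container_id` stands in for a hypothetical lookup of the live container ID (e.g. from the pod's current status via the Kubernetes API). Only the Identifier format (`namespace/nodeIP/podName/containerName/containerId`) is taken from the status output above.

```python
def parse_identifier(identifier: str) -> dict:
    """Split a chaosblade Identifier string into its five fields.

    Format (from the experiment status shown above):
        namespace/nodeIP/podName/containerName/containerId
    """
    ns, node_ip, pod, container, container_id = identifier.split("/")
    return {"namespace": ns, "node_ip": node_ip, "pod": pod,
            "container": container, "container_id": container_id}


def destroy_with_refresh(identifier, destroy, resolve_current_container_id):
    """Destroy a rule, retrying with a fresh container ID if the stored
    one is stale.

    destroy(container_id)                 -- hypothetical destroy call;
                                             raises RuntimeError on
                                             "No such container"
    resolve_current_container_id(ns, pod, container)
                                          -- hypothetical lookup of the
                                             pod's current container ID
    """
    ident = parse_identifier(identifier)
    try:
        destroy(ident["container_id"])
    except RuntimeError:
        # The pod restarted, so the stored ID no longer exists;
        # fetch the current ID and retry the destroy once.
        fresh = resolve_current_container_id(
            ident["namespace"], ident["pod"], ident["container"])
        destroy(fresh)
```

The key point is simply that the delete path treats the Identifier's containerId as a cached value to be refreshed on failure, rather than as a permanent key.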

How to reproduce it (as minimally and precisely as possible)

  1. First deploy a network-delay rule for a pod:
        Status:
          Exp Statuses:
            Action:  delay
            Res Statuses:
              Id:          b42b0ee218262ce9
              Identifier:  test-testing-dc-k2030/172.20.35.51/reliable-msg-route-5fdc8cc757-hwvdt/reliable-msg-route/18f0b9d032ce
              Kind:        pod
              State:       Success
              Success:     true
            Scope:         pod
            State:         Success
            Success:       true
            Target:        network
          Phase:           Running
        Events:            <none>
  2. Make sure the delay causes the pod's liveness probe to fail and the pod to restart:
test-testing-dc-k2030         reliable-msg-route-5fdc8cc757-hwvdt               1/1     Running            4          3d      192.168.137.81    172.20.35.51   <none>           <none>
  3. Delete the rule:
Status:
  Exp Statuses:
    Action:  delay
    Error:   see resStatus for the error details
    Res Statuses:
      Error:       Error response from daemon: No such container: 18f0b9d032ce
      Id:          b42b0ee218262ce9
      Identifier:  test-testing-dc-k2030/172.20.35.51/reliable-msg-route-5fdc8cc757-hwvdt/reliable-msg-route/18f0b9d032ce
      Kind:        pod
      State:       Error
      Success:     false
    Scope:         pod
    State:         Success
    Success:       false
    Target:        network
  Phase:           Destroying
  4. If I force-delete the rule, the delay rule actually remains in effect in the pod.
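When the experiment is force-deleted while the delay is still live in the pod, one manual recovery path (an assumption, not documented chaosblade behavior) is to remove the traffic-control rule directly inside the restarted container. This hedged helper only builds the `kubectl exec` command; it assumes the delay was injected as a tc netem qdisc on `eth0`, and that the container has the `tc` binary and NET_ADMIN capability:

```python
def build_cleanup_cmd(namespace: str, pod: str, container: str,
                      iface: str = "eth0") -> list:
    """Build a kubectl exec command that deletes the root qdisc on the
    given interface, which removes a leftover netem delay.

    Assumptions: the delay was added via tc netem on `iface`, and the
    container image ships `tc` with sufficient privileges.
    """
    return ["kubectl", "exec", "-n", namespace, pod, "-c", container,
            "--", "tc", "qdisc", "del", "dev", iface, "root"]


# Example with the pod from this report (prints the command only):
print(" ".join(build_cleanup_cmd(
    "test-testing-dc-k2030",
    "reliable-msg-route-5fdc8cc757-hwvdt",
    "reliable-msg-route")))
```

This only cleans up the network side effect; the stale experiment object itself still has to be deleted (or force-deleted) separately.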

Tell us your environment

k8s v1.16.15
chaosblade-operator-v0.9.0

Anything else we need to know?

@xcaspar xcaspar added the type/bug Something isn't working label Mar 31, 2021
xcaspar commented Mar 31, 2021

You can set the --daemonset-enable=false flag to disable the sidecar mode when deploying chaosblade-operator, which works around the problem.

bmbbms commented Mar 31, 2021

I see the default value of this parameter is already false.

xcaspar commented Mar 31, 2021

You can delete the pod to recover it. I will solve this problem later.

bmbbms commented Mar 31, 2021

Actually, it works if I apply the rule again with --force, and I can then successfully delete the rule before the pod's next restart. But I don't think that is a clean way to handle it, so I am reporting the bug.

yzhang559 commented
@xcaspar
I am using chaosblade-operator-v1.3.0 and k8s v1.21.4 and still hit this issue.
Will there be a fix in the next release, or is there any workaround to bypass this issue? Thanks.
