
Network-loss tests do not work with minikube #979

Closed

ksatchit opened this issue Dec 4, 2019 · 9 comments

Comments

@ksatchit
Member

ksatchit commented Dec 4, 2019

What happened:

Running a pod-network-loss experiment (and, in all probability, the network-delay test as well) on minikube does not inject the desired chaos. Thanks to @LaumiH for discovering this.

This was observed with the following versions:

  • Minikube v1.2, Docker 18.09.06 / K8s 1.15.0
  • Minikube v1.5.2, Docker 18.09.9 / K8s 1.16.2

The test involved pinging public IPs from inside the pod, as well as pinging the pod IP itself from a cluster node (a rough sketch of this check is below).
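A sketch of that verification, with a hypothetical pod name and IP; note that the stock debian image may need iputils-ping installed first:

# from inside the target pod, ping a public IP
kubectl exec -it <busy1-pod> -- ping -c 5 8.8.8.8

# from a cluster node (e.g. via minikube ssh), ping the pod IP
kubectl get pod <busy1-pod> -o wide   # note the pod IP
ping -c 5 <pod-ip>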

What you expected to happen:

  • The network chaos should be injected successfully.

How to reproduce it (as minimally and precisely as possible):

  • Set up a sample deployment and run the litmuschaos pod-network-loss experiment. Sample YAMLs are provided in the comments below.

Anything else we need to know?:

  • These tests run successfully with the same Docker/K8s versions on non-minikube clusters (for example, kubeadm-based clusters and older GKE clusters).
@ksatchit
Member Author

ksatchit commented Dec 4, 2019

Deployment:

apiVersion: apps/v1
kind: Deployment
metadata:
  creationTimestamp: null
  labels:
    app: busy1
  name: busy1
spec:
  replicas: 1
  selector:
    matchLabels:
      app: busy1
  strategy: {}
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: busy1
    spec:
      containers:
        - image: debian
          name: busy-1
          command: [ "/bin/bash", "-c", "sleep 10000;exit 0" ]

ChaosEngine:

apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine
  namespace: default
spec:
  jobCleanUpPolicy: delete
  monitoring: false
  appinfo:
    # app namespace
    appns: default
    # to see app labels, run: kubectl get pods --show-labels
    applabel: 'app=busy1'
    # supported kinds: deployment, statefulset
    appkind: deployment
  chaosServiceAccount: 'nginx'
  experiments:
    - name: pod-network-loss
      spec:
        components:
        - name: TARGET_CONTAINER
          value: 'busy-1'
        - name: NETWORK_INTERFACE
          value: 'eth0'
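A rough sketch of applying these manifests and checking the outcome; the file names and the ChaosResult name are assumptions, and litmus plus the pod-network-loss experiment/RBAC are assumed to be installed already:

# apply the sample application and the chaos engine shown above
kubectl apply -f deployment.yaml
kubectl apply -f chaosengine.yaml

# watch the experiment runner pods come up
kubectl get pods -n default -w

# check the verdict once the experiment finishes
kubectl describe chaosresult engine-pod-network-loss -n default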

@LaumiH
Member

LaumiH commented Dec 4, 2019

I opened an issue in the pumba repo; it also explains in detail what has been tested so far.

Pumba netem itself runs successfully on my laptop with Docker 19.

@ksatchit ksatchit changed the title Network-loss tests do not work with minkube Network-loss tests do not work with minikube Dec 4, 2019
@LaumiH
Member

LaumiH commented Dec 4, 2019

I have made a little progress: someone with the same problem in a pumba chat said that minikube might be missing the required kernel module, sch_netem. If I get a shell into the debian container inside minikube and run the bare netem command tc qdisc add dev eth0 root netem loss random 100, I get RTNETLINK answers: Operation not permitted as the answer.
I will continue looking into this. Do not expect any help from the pumba developer, as he closed my issue saying it has something to do with VirtualBox, so he can't help.
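A minimal way to check whether the container is even allowed to modify qdiscs (tc qdisc add needs CAP_NET_ADMIN), assuming the busy1 sample deployment from above:

# find the busy1 pod
POD=$(kubectl get pod -l app=busy1 -o jsonpath='{.items[0].metadata.name}')

# effective capability bitmask of PID 1 in the container;
# CAP_NET_ADMIN is bit 12, so it must be set for tc to succeed
kubectl exec "$POD" -- grep CapEff /proc/1/status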

@LaumiH
Member

LaumiH commented Dec 4, 2019

I will experiment with running containers in privileged mode on Monday; let's see what that gives me. It seems the kernel module is there, but the container does not have the privileges to execute the netem command.

@LaumiH
Member

LaumiH commented Dec 4, 2019

I now ran the container with

securityContext:
  privileged: true

and get Error: Specified qdisc not found. Got the idea from here.
At least something else happens 🤣.

Running the container in privileged mode changed nothing for pumba; it still does not work.

@LaumiH
Member

LaumiH commented Dec 4, 2019

Okay, I kept searching.
ip addr says eth0@if9: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default. The qdisc noqueue part seemed interesting, so I googled a bit and found that it is the default for virtual devices (link). Maybe this is the reason. I just want to document what I am doing so as not to forget until next week ^^.

Edit: I can run tc -d qdisc show dev eth0; the eth0 interface is found and gives qdisc noqueue 0: root refcnt 2. Hm.

@LaumiH
Member

LaumiH commented Dec 4, 2019

Maybe the netem kernel module is really missing. tc qdisc add dev eth0 root pfifo_fast works, but anything with netem fails, saying it is an invalid qdisc name. Maybe someone knows more than I do. (A quick check for the module is sketched below.)
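One way to check from inside the minikube VM whether the sch_netem module is present at all (a sketch; module paths depend on the minikube ISO):

minikube ssh

# inside the VM:
lsmod | grep sch_netem                              # is it loaded?
find /lib/modules/$(uname -r) -name 'sch_netem*'    # is it shipped at all?
sudo modprobe sch_netem                             # fails if the module was not built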

@LaumiH
Member

LaumiH commented Dec 9, 2019

I need to test it a bit further, as it only works with privileged containers, but for now my PR should fix the issue.

@LaumiH
Member

LaumiH commented Dec 14, 2019

Minikube has been patched with the missing kernel module; use version >= 1.6.0, released on 2019-12-10. PR #991 from @ksatchit also means that containers no longer have to be privileged for netem to work, as netem is now executed in a separate container. As far as I have tested, minikube has no further limitations in netem-related experiments, so this issue can be closed.
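To double-check on a patched minikube (>= 1.6.0), a quick sanity check is to confirm that the module now loads in the VM:

minikube start

minikube ssh
# inside the VM:
sudo modprobe sch_netem && lsmod | grep sch_netem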
