Network emulation is the process of imitating certain aspects of the behavior of network equipment without actually using the target real-world networking hardware. While there are many different use cases for network emulation, one of the more interesting ones is virtual testing. Virtual testing is the simulation of a physical test environment. In this context it means removing the bottleneck of dedicated hardware for developers by executing integration tests against virtual instances of a production environment to pre-validate hardware tests. This frees up valuable and often limited hardware resources, as certain testing scopes and most parts of the development process can be offloaded to virtual instances. Additionally, virtual instances often provide an easier-to-use environment that can be changed rapidly to represent any production topology without anyone needing to physically set up the network.
Kubernetes Network Emulation (KNE) lets you run virtual network topologies in Kubernetes. It does so by running various device operating systems in containers. For anyone interested in a more in-depth view into the inner workings of KNE, the project's source code should act as the reference, since the documentation is a bit lacking.
This document will first guide you through setting up KNE on OpenShift (OCP) or OKD. Then you will use the KNE cluster to create and interact with a topology based on Arista cEOS. In the last step you will set up a minimal workflow using the same topology attached to virtual instances of the Ixia-C Open Traffic Generator to pre-validate hardware tests. This should give you a good understanding of what virtual testing is all about and allow you to adapt this concept to more complex CI workflows leveraging your choice of tools.
If you are struggling to set up an OKD cluster, you might want to follow this guide to learn it the hard way.
With a few changes all steps mentioned in this guide are also usable on other Kubernetes distributions.
This guide assumes that you are familiar with a few technologies on a fundamental level:
- OpenShift
- Operators
- Networking
The following system specifications are required for the cluster in order to follow the guide and run some additional workload:
- x86_64 cluster system architecture
- At least 3 worker nodes
- At least 8 CPU cores per worker node
- At least 16 GB of RAM per worker node
- At least 1 GBit/s network interfaces on all nodes
- Cluster administrator permissions
- Internet access
If the cluster worker nodes are hosted on a hypervisor, make sure to pass through the host CPU information, as features such as SSSE3 are required to run traffic generation successfully. A smaller cluster might be feasible as well, but it could significantly impact performance. An OpenShift 4.11 cluster was used for the development of this guide.
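A quick way to check whether the required CPU features are actually exposed to a worker node is to inspect /proc/cpuinfo from a debug pod. The node name below is only an example, adjust it to one of your workers:
# Check for the SSSE3 CPU flag on a worker node (replace the node name)
oc debug node/worker-0 -- chroot /host grep -m 1 -o ssse3 /proc/cpuinfo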
To get started clone the source code required for this guide:
git clone https://github.com/raballew/kne-on-ocp.git
cd kne-on-ocp
meshnet is a Kubernetes CNI plugin that allows you to create arbitrary virtual network topologies. It interconnects pods via direct point-to-point links according to pre-defined topologies.
Deploy meshnet:
oc apply -k manifests/meshnet/overlays/openshift
Wait until meshnet is ready:
oc rollout status daemonset meshnet -n meshnet
Test if meshnet works:
oc create namespace meshnet-test
oc apply -f manifests/meshnet-test.yaml -n meshnet-test
oc exec r1 -n meshnet-test -- ping -c 1 12.12.12.2
oc delete namespace meshnet-test
meshnet is successfully configured if you have been able to ping 12.12.12.2 from pod r1.
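The test manifest wires up pods with meshnet Topology resources: each pod gets a Topology of the same name that describes its point-to-point links. Purely as an illustration of the format (the actual file manifests/meshnet-test.yaml in the repository is authoritative), a Topology that gives r1 a link towards a peer pod could look roughly like this:
apiVersion: networkop.co.uk/v1beta1
kind: Topology
metadata:
  name: r1
spec:
  links:
    # single point-to-point link between r1/eth1 and r2/eth1
    - uid: 1
      local_intf: eth1
      local_ip: 12.12.12.1/24
      peer_pod: r2
      peer_intf: eth1
      peer_ip: 12.12.12.2/24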
This step is optional if you have already set up a load balancer so that services of type LoadBalancer with external IP addresses can be used, or if no cluster-external access to the virtual environment is required.
Follow the instructions in the official MetalLB documentation and its notes for OCP and OKD.
Arista requires its users to register at arista.com before downloading any container images.
Make sure to register with the user role Partner or Customer (do not use the Guest role), because otherwise you might not be able to download the required artifacts. If you are already registered, you can change the role in your profile settings.
Once you have created an account and logged in, go to the software downloads section and download a 64-bit release of cEOS-lab.
This guide has been tested with cEOS-lab/EOS-4.28.3M/cEOS64-lab-4.28.3M.tar.xz. When you download this file, the .xz suffix will be missing.
Expose the internal registry and push the container image into it. Finally, set the internal registry back to private.
oc patch configs.imageregistry.operator.openshift.io/cluster --patch '{"spec":{"defaultRoute":true}}' --type=merge
HOST=$(oc get route default-route -n openshift-image-registry --template='{{ .spec.host }}')
podman import cEOS64-lab-4.28.3M.tar ceos64:4.28.3M
podman tag localhost/ceos64:4.28.3M $HOST/openshift/ceos64:4.28.3M
podman login -u kubeadmin -p $(oc whoami -t) $HOST
podman push $HOST/openshift/ceos64:4.28.3M
oc patch configs.imageregistry.operator.openshift.io/cluster --patch '{"spec":{"defaultRoute":false}}' --type=merge
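Pushing into the openshift namespace of the internal registry creates an image stream named ceos64 there, which can then be referenced cluster-internally as image-registry.openshift-image-registry.svc:5000/openshift/ceos64:4.28.3M. A quick way to verify that the push succeeded:
oc get imagestream ceos64 -n openshift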
Follow the instructions in the official KNE documentation on how to set up KNE.
Some vendors provide a controller that handles the pod lifecycle for their nodes. Arista provides a controller for cEOS nodes.
Deploy the Arista controller:
oc apply -f https://raw.githubusercontent.com/aristanetworks/arista-ceoslab-operator/v2.0.1/config/kustomized/manifest.yaml
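Before continuing, check that the controller came up. The namespace below follows the operator's upstream manifest conventions; adjust it if the manifest you applied uses a different one:
# Verify the Arista cEOS-lab controller pods are running
oc get pods -n arista-ceoslab-operator-system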
Deploy a basic topology with three cEOS virtual instances (r1, r2, r3) into a new namespace.
┌────┐ ┌────┐ ┌────┐
│eth3│ │eth4│ │eth5│
┌─┴────┴──┐ ┌─┴────┴──┐ ┌─┴────┴──┐
│ r1 │ │ r2 │ │ r3 │
├────┬────┤ ├────┬────┤ ├────┬────┤
│eth1│eth2│ │eth1│eth2│ │eth1│eth2│
└┬───┴──┬─┘ └─┬──┴───┬┘ └─┬──┴───┬┘
│ │ │ │ │ │
│ 1.2.0.1/30 1.2.0.2/30 1.2.0.9/30 1.2.0.10/30 │
│ │ │ │ │ │
│ └─────────────────┘ └────────────────┘ │
1.2.0.5/30 1.2.0.6/30
│ │
└───────────────────────────────────────────────────────┘
r1: AS1, 2.2.2.1/32
r2: AS1, 2.2.2.2/32
r3: AS1, 2.2.2.3/32
Make sure the correct image is set for ARISTA nodes in 3-node-ceos.pb.txt if you use a different version of cEOS.
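If you imported a different cEOS version earlier, you can locate and update the image references in the topology file before deploying it. The sed expression is only an example; check the grep output for the exact value to replace and substitute your own tag:
# Show where the cEOS image is referenced in the topology file
grep -n "image:" topologies/3-node-ceos.pb.txt
# Example only: replace the tag with the version you imported
sed -i 's|ceos64:4.28.3M|ceos64:<your-version>|g' topologies/3-node-ceos.pb.txt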
namespace=3-node-ceos
oc create namespace $namespace
# Fixes https://github.com/open-traffic-generator/ixia-c-operator/issues/18
# Fixes https://github.com/aristanetworks/arista-ceoslab-operator/issues/5
oc apply -f manifests/rbac/privileged-patch.yaml -n $namespace
tmp_dir=$(mktemp -d)
cp -r topologies/ $tmp_dir
echo "name: \"$namespace\"" >> $tmp_dir/topologies/3-node-ceos.pb.txt
kne create $tmp_dir/topologies/3-node-ceos.pb.txt --kubecfg $KUBECONFIG
Where:
$KUBECONFIG - List of paths to configuration files used to configure access to a cluster
Do not interrupt the kne command. It can take minutes until it returns. Just be patient and wait.
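While waiting, you can follow the rollout from a second terminal and watch the cEOS pods reach the Running state:
oc get pods -n $namespace -w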
Wait a few minutes and then verify that the virtual instances are working properly:
oc exec -it -n $namespace r1 -- Cli -c "show bgp statistics"
oc exec -it -n $namespace r2 -- Cli -c "show bgp statistics"
oc exec -it -n $namespace r3 -- Cli -c "show bgp statistics"
If you check the output you will realize that BGP does not seem to work properly, as each virtual instance shows that 1 neighbor is in Idle(NoIf) state. This is due to the fact that in this topology the traffic generator is not attached yet, so it can be ignored for the moment.
You can also access each virtual instance by using its external or cluster IP address. Additionally, you could use the Kubernetes DNS service to address each service via its DNS A record (<svc>.<namespace>.svc.cluster.local).
oc get services -n $namespace
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service-r1 LoadBalancer 172.30.203.182 <REDACTED> 443:31420/TCP,22:32358/TCP,6030:31958/TCP 29s
service-r2 LoadBalancer 172.30.23.251 <REDACTED> 22:32103/TCP,6030:32362/TCP,443:32467/TCP 28s
service-r3 LoadBalancer 172.30.61.134 <REDACTED> 443:31661/TCP,22:32477/TCP,6030:32122/TCP 28s
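For example, assuming MetalLB has assigned an external IP, you could open an SSH session to r1 like this. The admin user name is a placeholder; use whatever credentials the topology configuration in the repository defines:
# Grab the external IP assigned to the r1 service and connect via SSH
r1_ip=$(oc get service service-r1 -n $namespace -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
ssh admin@$r1_ip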
Delete the topology:
oc delete namespace $namespace
The OTG project offers an operator that, through a CRD, allows the deployment of a modern, powerful and API-driven open source traffic generator with limited functionality. A commercially supported version is available through the Keysight Elastic Network Generator, which offers advanced functionality such as the emulation of key data center OSI layer 2 and layer 3 control plane protocols. While simple traffic generation works with the open source version as well, the topologies deployed in this guide rely on BGP, a control plane protocol. Hence, in order to run a meaningful test, the traffic generator instances need to be configured using the protocol engine - a functionality that is only available in the commercially supported version.
Reach out to the Keysight Support in order to gain access to the container images for the commercially supported version.
Install the Ixia-C operator:
oc apply -f https://github.com/open-traffic-generator/ixia-c-operator/releases/download/v0.2.6/ixiatg-operator.yaml
You need to decide if you want to use publicly available container images for the open-source version or container images hosted on a private registry for the commercially supported version.
If you want to use the open-source version, apply the following configuration:
oc apply -f manifests/ixiatg/config-open-source.yaml
If you want to use the commercially supported version, apply the following configuration and update the global cluster pull secret by appending a new pull secret for Keysight's private container registry:
oc apply -f manifests/ixiatg/config-licensed.yaml
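Appending the registry credentials to the global pull secret roughly follows the standard OpenShift procedure. The registry hostname below is a placeholder; use the one and the credentials provided by Keysight:
# Extract the current global pull secret
oc get secret pull-secret -n openshift-config --template='{{index .data ".dockerconfigjson" | base64decode}}' > pull-secret.json
# Append the credentials for Keysight's registry to the local copy
podman login --authfile pull-secret.json <keysight-registry>
# Write the updated pull secret back to the cluster and clean up
oc set data secret/pull-secret -n openshift-config --from-file=.dockerconfigjson=pull-secret.json
rm pull-secret.json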
For the deployment of the Ixia-C traffic generator with custom images, the version for nodes of type IXIA_TG specified in 3-node-ceos-with-traffic.pb.txt needs to match the release value at .spec.data.versions in config-open-source.yaml or config-licensed.yaml. If you want to use a different version, make sure to adjust all files accordingly before applying them to the cluster. The latest upstream version of this configuration is published on the ixia-c-operator releases page.
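A quick way to double-check that the two values line up is to grep the relevant keys in the repository before applying anything (paths as cloned above):
grep -n "version" topologies/3-node-ceos-with-traffic.pb.txt
grep -n "release" manifests/ixiatg/config-open-source.yaml manifests/ixiatg/config-licensed.yaml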
3-node-ceos-with-traffic.pb.txt uses the same topology, with the same broken BGP configuration, as 3-node-ceos.pb.txt, but adds services for traffic generation.
┌───────┐
│ │
│ │
┌┴───┬───┴┐
│eth4│eth5│
┌──┴────┴────┴─┐
│ otg │
├────┬────┬────┤
┌──────────────┤eth1│eth2│eth3├──────────────┐
│ └────┴─┬──┴────┘ │
│ │ │
10.10.10.1/24 20.20.20.1/24 30.30.30.1/24
│ │ │
│ │ │
10.10.10.2/24 20.20.20.2/24 30.30.30.2/24
│ │ │
┌───┴┐ ┌─┴──┐ ┌┴───┐
│eth3│ │eth4│ │eth5│
┌─┴────┴──┐ ┌─┴────┴──┐ ┌─┴────┴──┐
│ r1 │ │ r2 │ │ r3 │
├────┬────┤ ├────┬────┤ ├────┬────┤
│eth1│eth2│ │eth1│eth2│ │eth1│eth2│
└┬───┴──┬─┘ └─┬──┴───┬┘ └─┬──┴───┬┘
│ │ │ │ │ │
│ 1.2.0.1/30 1.2.0.2/30 1.2.0.9/30 1.2.0.10/30 │
│ │ │ │ │ │
│ └─────────────────┘ └────────────────┘ │
1.2.0.5/30 1.2.0.6/30
│ │
└───────────────────────────────────────────────────────┘
r1: AS1, 2.2.2.1/32
r2: AS1, 2.2.2.2/32
r3: AS1, 2.2.2.3/32
otg/eth1: AS1111, 2.2.2.4/32
otg/eth2: AS2222, 2.2.2.5/32
otg/eth3: AS3333, 2.2.2.6/32
Deploy this topology into a new namespace:
Make sure the correct image is set for ARISTA nodes in 3-node-ceos-with-traffic.pb.txt if you use a different version of cEOS.
namespace=3-node-ceos-with-traffic
oc create namespace $namespace
# Fixes https://github.com/open-traffic-generator/ixia-c-operator/issues/18
# Fixes https://github.com/aristanetworks/arista-ceoslab-operator/issues/5
oc apply -f manifests/rbac/privileged-patch.yaml -n $namespace
tmp_dir=$(mktemp -d)
cp -r topologies/ $tmp_dir
echo "name: \"$namespace\"" >> $tmp_dir/topologies/3-node-ceos-with-traffic.pb.txt
kne create $tmp_dir/topologies/3-node-ceos-with-traffic.pb.txt --kubecfg $KUBECONFIG
Where:
$KUBECONFIG - List of paths to configuration files used to configure access to a cluster
Do not interrupt the kne command. It can take minutes until it returns. Just be patient and wait.
Once the command returns successfully, a new topology has been deployed, consisting of several virtual switches and the Ixia-C traffic generator, a reference implementation of the Open Traffic Generator API.
oc get services -n $namespace
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service-gnmi-otg-controller LoadBalancer 172.30.36.53 <REDACTED> 50051:31039/TCP 43s
service-grpc-otg-controller LoadBalancer 172.30.148.74 <REDACTED> 40051:31636/TCP 43s
service-https-otg-controller LoadBalancer 172.30.226.247 <REDACTED> 443:30120/TCP 43s
service-otg-port-eth1 LoadBalancer 172.30.17.198 <REDACTED> 5555:30754/TCP,50071:30271/TCP 43s
service-otg-port-eth2 LoadBalancer 172.30.94.133 <REDACTED> 5555:31173/TCP,50071:31540/TCP 43s
service-otg-port-eth3 LoadBalancer 172.30.94.236 <REDACTED> 5555:30410/TCP,50071:31177/TCP 43s
service-otg-port-eth4 LoadBalancer 172.30.88.101 <REDACTED> 5555:32371/TCP,50071:31531/TCP 43s
service-otg-port-eth5 LoadBalancer 172.30.209.82 <REDACTED> 5555:31945/TCP,50071:31893/TCP 43s
service-r1 LoadBalancer 172.30.99.70 <REDACTED> 443:30896/TCP,22:30113/TCP,6030:30300/TCP 43s
service-r2 LoadBalancer 172.30.60.68 <REDACTED> 443:31079/TCP,22:30859/TCP,6030:30132/TCP 43s
service-r3 LoadBalancer 172.30.236.73 <REDACTED> 6030:32008/TCP,443:30317/TCP,22:32145/TCP 43s
In order to generate traffic, scripts using snappi can be executed against the newly deployed services (the *-otg-controller and *-otg-port-eth* services shown above). Another, more simplistic way of running tests is by using the Open Traffic Generator CLI tool, which will be used in this guide.
In order to validate the functionality of the traffic generator, deploy a job that triggers a back-to-back test using a direct link between two ports (eth4 and eth5) on the traffic generator:
oc create -f flows/job-flow-otg-otg.yaml -n $namespace
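The job needs a short while to finish. If you prefer to block until it completes before reading the logs, you can wait on the job condition:
oc wait --for=condition=complete job -l flow=otg-otg -n $namespace --timeout=300s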
By inspecting the logs, you can see that for each flow, eth4>eth5 and eth5>eth4, the number of frames received is equal to the number of frames sent. This indicates that this particular connection works fine for the test parameters and that the traffic generator is operational.
oc get job -l flow=otg-otg -o name -n $namespace | xargs oc logs -n $namespace -f
+-----------+-----------+-----------+
| NAME | FRAMES TX | FRAMES RX |
+-----------+-----------+-----------+
| eth4>eth5 | 1000 | 1000 |
| eth5>eth4 | 1000 | 1000 |
+-----------+-----------+-----------+
The following steps require access to the commercially supported version of the Ixia-C traffic generator. The open-source version does not work here, as it does not provide a protocol-engine container image. Reach out to the Keysight Support in order to gain access to their private container registries.
Let's try something more complex and see if our initial assumption, that the topology of the virtual switch instances is broken, remains true. This can be done by running a flow that tries to send traffic from ports eth1 (connected to virtual instance r1), eth2 (connected to virtual instance r2) and eth3 (connected to virtual instance r3) to all other ports. In theory, if everything is configured properly, this should produce a similar output as the previous flow, but as a cautious reader you probably already know that something is broken.
oc create -f flows/job-flow-r1-r2-r3.yaml -n $namespace
Inspecting the logs confirms our assumption: no frames are received on eth1 at all, indicating that the connection with r2 (eth2) and r3 (eth3) is misconfigured.
oc get job -l flow=r1-r2-r3 -o name -n $namespace | xargs oc logs -n $namespace -f
+-----------+-----------+-----------+
| NAME | FRAMES RX | FRAMES TX |
+-----------+-----------+-----------+
| eth3>eth2 | 1000 | 1000 |
| eth1>eth2 | 1000 | 1000 |
| eth1>eth3 | 1000 | 1000 |
| eth2>eth1 | 0 | 1000 |
| eth2>eth3 | 1000 | 1000 |
| eth3>eth1 | 0 | 1000 |
+-----------+-----------+-----------+
By inspecting flow-r1-r2-r3.yaml you will see that the traffic generator advertises a route for 198.51.100.0/24, but as shown in r1-config the switch r1 drops all inbound traffic for the neighbor 10.10.10.1 as part of the PREFIX route map.
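You can also confirm this on the running instance by looking at the BGP section of the active configuration on r1, similar to the earlier Cli calls:
oc exec -it -n $namespace r1 -- Cli -c "show running-config | section bgp"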
Let's fix this by first removing the broken topology:
oc delete namespace $namespace
Inspect both configurations for r1 and r2. Then create a topology with a valid configuration, r1-config-fixed:
namespace=3-node-ceos-with-traffic-fixed
oc create namespace $namespace
# Fixes https://github.com/open-traffic-generator/ixia-c-operator/issues/18
# Fixes https://github.com/aristanetworks/arista-ceoslab-operator/issues/5
oc apply -f manifests/rbac/privileged-patch.yaml -n $namespace
tmp_dir=$(mktemp -d)
cp -r topologies/ $tmp_dir
echo "name: \"$namespace\"" >> $tmp_dir/topologies/3-node-ceos-with-traffic-fixed.pb.txt
kne create $tmp_dir/topologies/3-node-ceos-with-traffic-fixed.pb.txt --kubecfg $KUBECONFIG
Where:
$KUBECONFIG - List of paths to configuration files used to configure access to a cluster
Do not interrupt the kne command. It can take minutes until it returns. Just be patient and wait.
This time, valid test results should be shown, with the number of transmitted frames equal to the number of received frames:
oc create -f flows/job-flow-r1-r2-r3.yaml -n $namespace
oc get job -l flow=r1-r2-r3 -o name -n $namespace | xargs oc logs -n $namespace -f
+-----------+-----------+-----------+
| NAME | FRAMES TX | FRAMES RX |
+-----------+-----------+-----------+
| eth2>eth3 | 1000 | 1000 |
| eth3>eth1 | 1000 | 1000 |
| eth3>eth2 | 1000 | 1000 |
| eth1>eth2 | 1000 | 1000 |
| eth1>eth3 | 1000 | 1000 |
| eth2>eth1 | 1000 | 1000 |
+-----------+-----------+-----------+
If you are finished with testing, delete the topology:
oc delete namespace $namespace
KNE itself is still under development and lacks some convenience features, such as deploying a topology into a specific namespace or taking into account the current context set in the kubeconfig files. Additionally, KNE deploys plain pods only, which means the entire process described is vulnerable to disruptions such as node failures. That said, there might be good reasons not to restart automatically via more sophisticated approaches such as deployments, since a failed instance could indicate that a critical error caused the virtual instance to crash, and in a testing environment recovering automatically might be a bad idea.
The operators required to use different node types supplied by vendors such as Arista or Keysight do not seem to work on OpenShift out of the box, as resources managed through them still need additional privileges. Due to this limitation, an RBAC patch has to be applied between namespace and topology creation. Keep in mind that this was done to prove the point that it is possible to run KNE on OCP. Before considering going into production, reaching out to the vendors in order to build a supported solution is highly recommended.