This repo contains scripts for setting up a Kubernetes cluster for Testground.
Using the `cluster:k8s` runner of Testground enables you to test distributed/p2p systems at scale.

The `cluster:k8s` Testground runner is capable of launching test workloads comprising 10k+ instances, and we aim to reach 100k at some point.
The IPFS and libp2p projects have used these scripts and playbooks to deploy large-scale test infrastructure. By crafting test scenarios that exercise components at such scale, we have been able to run simulations, carry out attacks, perform benchmarks, and execute all kinds of tests to validate correctness and performance.
Kubernetes Operations (`kops`) is a tool that helps you create, destroy, upgrade, and maintain production-grade Kubernetes clusters from the command line. We use it to create a Kubernetes cluster on AWS.
We use CoreOS Flannel for networking on Kubernetes - for the default Kubernetes network, which in Testground terms is called the `control` network.

We use Weave for the `data` plane on Testground - a secondary overlay network that we attach containers to on demand.

`kops` uses 100.96.0.0/11 for the pod CIDR range, so this is what we use for the `control` network.

We configure Weave to use 16.0.0.0/4 as its CIDR (we want to test `libp2p` nodes with IPs in public ranges), so this is the CIDR for the Testground `data` network. The `sidecar` is responsible for setting up the `data` network for every test plan instance.
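To make the split concrete, here is a small illustrative bash helper (not part of this repo) that checks which of the two CIDR ranges above an IPv4 address falls into:

```bash
# Illustrative only: classify an IPv4 address against the control (100.96.0.0/11)
# and data (16.0.0.0/4) CIDR ranges used by this setup.

ip_to_int() {
  # Convert a dotted-quad IPv4 address to a 32-bit integer.
  local IFS=. a b c d
  read -r a b c d <<< "$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

in_cidr() {
  # Return success if $1 (an IP) falls inside $2 (a CIDR like 100.96.0.0/11).
  local ip=$1 cidr=$2
  local base=${cidr%/*} prefix=${cidr#*/}
  local mask=$(( (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
  [ $(( $(ip_to_int "$ip") & mask )) -eq $(( $(ip_to_int "$base") & mask )) ]
}

in_cidr 100.100.1.1 100.96.0.0/11 && echo "100.100.1.1 is on the control network"
in_cidr 16.5.4.3 16.0.0.0/4 && echo "16.5.4.3 is on the data network"
```

Any address in 100.96.0.0/11 (100.96.0.0 - 100.127.255.255) belongs to the `control` network, while the `data` network covers 16.0.0.0 - 31.255.255.255.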
In order to have two different networks attached to pods in Kubernetes, we run the CNI-Genie CNI plugin.
More information on the Testground Networking requirements can be found here.
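CNI-Genie selects networks per pod via a pod annotation. As an illustrative sketch (the pod name and image here are placeholders; check the CNI-Genie documentation for the version you run), a pod attached to both the Flannel-backed `control` network and the Weave-backed `data` network might look like:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example-testplan-pod
  annotations:
    # CNI-Genie reads this annotation to decide which CNI plugins to invoke:
    # flannel provides the control network, weave the data network.
    cni: "flannel,weave"
spec:
  containers:
    - name: example
      image: busybox
      command: ["sleep", "3600"]
```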
- Generate your AWS IAM credentials.
- Configure the aws-cli tool with your credentials.
- Create a `.env.toml` file (copying over the `env-example.toml` at the root of this repo as a template), and add your region to the `[aws]` section.
- For the Testground team: download the shared key for `kops`. The Testground team uses a shared key, so that everyone on the team can log into any ephemeral cluster and have full access.

```bash
$ aws s3 cp s3://kops-shared-key-bucket/testground_rsa ~/.ssh/
$ aws s3 cp s3://kops-shared-key-bucket/testground_rsa.pub ~/.ssh/
$ chmod 700 ~/.ssh/testground_rsa
```

Or generate your own key, for example:

```bash
$ ssh-keygen -t rsa -b 4096 -C "your_email@example.com"
```
- Create a bucket for `kops` state. This is similar to a Terraform state bucket.

```bash
$ aws s3api create-bucket \
      --bucket <bucket_name> \
      --region <region> --create-bucket-configuration LocationConstraint=<region>
```

Where:

- `<bucket_name>` is a bucket name that is unique across your AWS account, used to store this cluster's `kops` state, e.g. `kops-backend-bucket-<your_username>`.
- `<region>` is an AWS region like `eu-central-1` or `us-west-2`. Note that for `us-east-1` the `--create-bucket-configuration` flag must be omitted.
- Pick:

  - a cluster name
  - an AWS region
  - AWS availability zone A (not a region; for example `us-west-2a`) - used for the master node and worker nodes
  - AWS availability zone B (not a region; for example `us-west-2b`) - used for more worker nodes
  - the `kops` state store bucket
  - the number of worker nodes
  - the location for the cluster spec to be generated
  - the location of your cluster SSH public key
  - credentials and locations for the `outputs` S3 bucket

  You might want to add these to your rc file (`.zshrc`, `.bashrc`, etc.), or to an `.env.sh` file that you source.
These variables are needed both for the initial cluster setup and by the Testground daemon. If you set or source them manually, make sure to do so before starting the Testground daemon.

```bash
# `NAME` needs to be a subdomain of an existing Route53 domain name.
# The Testground team uses `.testground.ipfs.team`, which is already set up for our Testground AWS account.
# Alternatively you could use `name.k8s.local` and use Gossip DNS.
export NAME=<desired cluster name; must be a fully-qualified DNS name, e.g. mycluster.k8s.local or mycluster.testground.ipfs.team>
export KOPS_STATE_STORE=s3://<kops state s3 bucket>
export AWS_REGION=<aws region, for example eu-central-1>
export ZONE_A=<aws availability zone, for example eu-central-1a>
export ZONE_B=<aws availability zone, for example eu-central-1b>
export WORKER_NODES=4
export PUBKEY=$HOME/.ssh/testground_rsa.pub
export TEAM=<optional - your team name; used for cost allocation purposes>
export PROJECT=<optional - your project name; used for cost allocation purposes>
```
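Since the daemon fails in confusing ways when a variable is missing, a small illustrative check like the following (not part of this repo) can fail fast before you run the install script or start the daemon:

```bash
# Illustrative only: verify that the variables the cluster setup relies on are set.

require() {
  # Print an error and return non-zero if any named variable is unset or empty.
  local var
  for var in "$@"; do
    if [ -z "${!var:-}" ]; then
      echo "missing required variable: $var" >&2
      return 1
    fi
  done
}

# Example with placeholder values set just for the duration of the check:
NAME=demo.k8s.local KOPS_STATE_STORE=s3://demo-bucket WORKER_NODES=4 \
  require NAME KOPS_STATE_STORE WORKER_NODES && echo "environment looks complete"
```

In real use you would call `require NAME KOPS_STATE_STORE AWS_REGION ZONE_A ZONE_B WORKER_NODES PUBKEY` after sourcing your `.env.sh`.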
- Set up Helm and add the `stable` Helm charts repository. If you haven't already, install Helm now.

```bash
$ helm repo add stable https://kubernetes-charts.storage.googleapis.com/
$ helm repo add bitnami https://charts.bitnami.com/bitnami
$ helm repo update
```
To create a monitored cluster in the availability zones specified in `$ZONE_A` and `$ZONE_B` with `$WORKER_NODES` worker nodes:

```bash
$ ./k8s/install.sh ./k8s/cluster.yaml
```

This will take about 10-15 minutes to complete.

Once you run this command, take some time to walk the dog, clean up around the office, or go get yourself some coffee! When you return, your shiny new Kubernetes cluster will be ready to run Testground plans.
Do not forget to delete the cluster once you are done running test plans.

```bash
$ ./k8s/delete.sh
```
- Edit the cluster state and change the number of nodes:

```bash
$ kops edit ig nodes
```

- Apply the new configuration:

```bash
$ kops update cluster $NAME --yes
```

- Wait for the nodes to come up and for the DaemonSets to be `Running` on all new nodes:

```bash
$ watch 'kubectl get pods'
```
- Access Grafana (initial credentials are username: `admin`; password: `testground`):

```bash
$ kubectl port-forward service/testground-infra-grafana 3000:80
```
Testground is still at a very early stage of development. It may crash, or fail to clean up properly after a test plan run. Here are a few commands that can help you inspect the state of your Kubernetes cluster and clean up after Testground.
- Delete all pods that have the `testground.plan=dht` label (in case you used the `--run-cfg keep_service=true` setting on Testground):

```bash
$ kubectl delete pods -l testground.plan=dht --grace-period=0 --force
```

- Restart the `sidecar` daemon, which manages networks for all test plans:

```bash
$ kubectl delete pods -l name=testground-sidecar --grace-period=0 --force
```

- Review all running pods:

```bash
$ kubectl get pods -o wide
```

- Get logs from a given pod:

```bash
$ kubectl logs <pod-id, e.g. tg-dht-c95b5>
```

- Check on the monitoring infrastructure (it runs in the `monitoring` namespace):

```bash
$ kubectl get pods --namespace monitoring
```

- Get access to the Redis shell:

```bash
$ kubectl port-forward svc/testground-infra-redis-master 6379:6379 &
$ redis-cli -h localhost -p 6379
```
`kops` lets you download the entire Kubernetes context config.

If you want to let other people on your team connect to your Kubernetes cluster, you need to give them this information:

```bash
$ kops export kubecfg --state $KOPS_STATE_STORE --name=$NAME
```
Additional information about this runner and more can be found in the Testground gitbook.

Our work is never finished. If you see anything we can do better, file an issue on the github.com/testground/testground repo, or open a PR!
Dual-licensed: MIT, Apache Software License v2, by way of the Permissive License Stack.