Commit 42fcf9e: Initial commit

crstin committed Aug 17, 2020 (0 parents)
Showing 33 changed files with 1,023 additions and 0 deletions.
5 changes: 5 additions & 0 deletions .ansible-lint
@@ -0,0 +1,5 @@
skip_list:
- 403 # Package installs should not use latest
exclude_paths:
- ./kubespray/
- ./roles/matthiaslohr.hvswitch_k8s/
5 changes: 5 additions & 0 deletions .gitignore
@@ -0,0 +1,5 @@
/inventory/credentials/
/roles/matthiaslohr.hvswitch_k8s/
/venv/
/inventory/inventory.ini
/private-*.yml
3 changes: 3 additions & 0 deletions .gitmodules
@@ -0,0 +1,3 @@
[submodule "kubespray"]
path = kubespray
url = https://github.com/kubernetes-sigs/kubespray.git
1 change: 1 addition & 0 deletions .tool-versions
@@ -0,0 +1 @@
python 3.8.5
124 changes: 124 additions & 0 deletions README.md
@@ -0,0 +1,124 @@
# Hetzner Bare Metal k8s Cluster

The scripts in this repository will set up and maintain one or more [kubernetes][k8s] clusters consisting of dedicated [Hetzner][hetzner] servers. Each cluster will also be provisioned to operate as a node in the [THORChain][tc] network.

Executing the scripts, combined with a few manual steps, will get you bare-metal clusters with the following features:

* Based on [Kubespray][kubespray]
* Internal NVMe storage ([Ceph][ceph]/[Rook][rook])
* Virtual LAN (also over multiple locations) ([Calico][calico])
* Load Balancing ([MetalLB][metallb])

## Preparations

### Servers

Acquire a couple of [servers][buy] as the basis for a cluster (`AX41-NVME` servers work well, for instance). Visit the [admin panel][admin] and name the servers appropriately.

```text
tc-k8s-master
tc-k8s-worker1
tc-k8s-worker2
...
```

Refer to the [reset procedure][reset] to properly initialize them.

### vSwitch

Create a [vSwitch][vswitch] and order an appropriate subnet (it may take a while to show up after the order). Give the vSwitch a name (e.g. `tc-k8s-net`) and assign it to the servers.

Check out the [docs][vswitch_docs] for help.

## Usage

Clone this repository, `cd` into it, and fetch the Kubespray submodule.

```bash
git submodule init && git submodule update
```
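
If the repository has not been cloned yet, both steps can be combined. A minimal sketch (the clone URL and target directory are placeholders for wherever this repository lives):

```bash
# Clone the repository and pull in the kubespray submodule in one step
git clone --recurse-submodules https://example.com/your/fork.git hetzner-k8s
cd hetzner-k8s
```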

Create a Python virtual environment or similar.

```bash
# Optional
virtualenv -p python3 venv
```
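
If you created the virtual environment, activate it before installing anything so the dependencies stay local to `venv`:

```bash
# Activate the virtual environment (standard virtualenv usage)
source venv/bin/activate
```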

Install the required Python dependencies and Ansible Galaxy roles.

```bash
pip install -r requirements.python.txt
ansible-galaxy install -r requirements.ansible.yml
```

> Note: Mitogen does not work with Ansible collections and needs to be disabled.
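
If Mitogen had been enabled in Kubespray's `ansible.cfg`, a minimal sketch for disabling it could look like this (the `strategy` option names are the standard Mitogen-for-Ansible settings, not something defined in this repository):

```bash
# Remove the Mitogen strategy settings from kubespray's ansible.cfg, if present
sed -i -e '/^strategy_plugins.*mitogen/d' -e '/^strategy *= *mitogen_linear/d' kubespray/ansible.cfg
```
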
### Provisioning

```bash
cp hosts.example inventory/inventory.ini
cp cluster.yml.example private-cluster.yml
```

Add your server IPs to `inventory.ini` and your network information to `private-cluster.yml`.

If you want to manage multiple clusters, simply name the files according to the pattern below.

```text
private-cluster-01.yml
private-cluster-02.yml
private-cluster-03.yml
...
private-test.yml
...
private-helsinki-01.yml
...
private-whatever.yml
```

```bash
# Manage a cluster
ansible-playbook private-cluster.yml

# If you want to run kubespray separately
ansible-playbook kubespray/cluster.yml
```
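
When several cluster files exist, each run can be pointed at its own inventory explicitly. A sketch assuming a per-cluster inventory file (the inventory file name is an assumption, not part of this repository):

```bash
# Manage the Helsinki cluster with a dedicated inventory
ansible-playbook -i inventory/helsinki-01.ini private-helsinki-01.yml
```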

> Check [this][kubespray] out for more playbooks on cluster management.

### THORChain

In order for the cluster to operate as a node in the THORChain network, deploy as instructed [here][tc_deploying]. You can also refer to the [node-launcher repository][node-launcher] if necessary, or to the THORChain [documentation][tc_docs] as a whole.

## Resetting the bare metal servers

Visit the [console][admin] and put each server of the cluster into rescue mode. Then execute the following command.

```bash
installimage -a -r no -i images/Ubuntu-1804-bionic-64-minimal.tar.gz -p /:ext4:all -d nvme0n1 -f yes -t yes -n hostname
```

This installs Ubuntu on only one of the two internal NVMe drives. The unused drive will later provide persistent storage via Ceph/Rook. You can check the internal drive layout with `lsblk` and adjust the command above accordingly if necessary.
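
For reference, a minimal way to inspect the drives (the output below is illustrative for an `AX41-NVME`; names and sizes may differ):

```bash
lsblk -o NAME,SIZE,TYPE,MOUNTPOINT
# NAME     SIZE   TYPE MOUNTPOINT
# nvme0n1  476.9G disk            <- OS target passed to installimage (-d nvme0n1)
# nvme1n1  476.9G disk            <- left untouched, later claimed by Ceph/Rook
```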

> Ubuntu 18.04 is used because Kubespray does not support 20.04 (yet).

[reset]: #resetting-the-bare-metal-servers
[hetzner]: https://www.hetzner.com
[buy]: https://www.hetzner.com/dedicated-rootserver/matrix-ax
[admin]: https://robot.your-server.de/server
[vswitch]: https://robot.your-server.de/vswitch/index
[vswitch_docs]: https://docs.hetzner.com/robot/dedicated-server/network/vswitch
[k8s]: https://kubernetes.io
[kubespray]: https://kubespray.io/
[metallb]: https://metallb.universe.tf
[calico]: https://www.projectcalico.org
[ceph]: https://ceph.io
[rook]: https://rook.io
[tc]: https://thorchain.org
[tc_docs]: https://docs.thorchain.org
[tc_deploying]: https://docs.thorchain.org/thornodes/kubernetes/deploying
[node-launcher]: https://gitlab.com/thorchain/devops/node-launcher
4 changes: 4 additions & 0 deletions ansible.cfg
@@ -0,0 +1,4 @@
[defaults]
roles_path = ./roles
inventory = ./inventory/inventory.ini
nocows = 1
32 changes: 32 additions & 0 deletions cluster.yml.example
@@ -0,0 +1,32 @@
- hosts: all
vars:
authorized_keys:
# - key1
# - key2
vswitches:
- name: tc-k8s-net # vSwitch name, used for naming the routing table.
routing_table: 1 # ID for the routing table.
vlan: 4000 # VLAN ID for the vSwitch. 4000-4091 supported by Hetzner.
gateway: 33.33.33.33 # If the vSwitch has a subnet, this variable should contain the subnet's gateway IP address
addresses: # IP addresses for the vSwitch network interface (per host)
- "{{ hostvars[inventory_hostname]['ip'] }}/24"
subnets: # Subnets available on the vSwitch (need to be registered with Hetzner robot) for non-private networks
- subnet: 33.33.33.32/29
roles:
- access
- packages
- matthiaslohr.hvswitch_k8s

- hosts: kube-master
roles:
- thorchain

- import_playbook: kubespray/cluster.yml

- hosts: kube-master[0]
vars:
address_range:
- 33.33.33.34-33.33.33.37
roles:
- storage
- load_balancing
27 changes: 27 additions & 0 deletions hosts.example
@@ -0,0 +1,27 @@
[all:vars]
ansible_user=root
ansible_ssh_user=root
ansible_python_interpreter=/usr/bin/python3

[all]
master ansible_host=11.22.33.44 ip=10.10.10.11 etcd_member_name=master
worker1 ansible_host=22.33.44.55 ip=10.10.10.12
worker2 ansible_host=33.44.55.66 ip=10.10.10.13

[kube-master]
master

[etcd]
master

[kube-node]
master
worker1
worker2

[calico-rr]

[k8s-cluster:children]
kube-master
kube-node
calico-rr
98 changes: 98 additions & 0 deletions inventory/group_vars/all/all.yml
@@ -0,0 +1,98 @@
---
## Directory where etcd data stored
etcd_data_dir: /var/lib/etcd

## Experimental kubeadm etcd deployment mode. Available only for new deployment
etcd_kubeadm_enabled: false

## Directory where the binaries will be installed
bin_dir: /usr/local/bin

## The access_ip variable is used to define how other nodes should access
## the node. This is used in flannel to allow other flannel nodes to see
## this node, for example. The access_ip is really useful in AWS and Google
## environments where the nodes are accessed remotely by the "public" ip,
## but don't know about that address themselves.
# access_ip: 1.1.1.1


## External LB example config
## apiserver_loadbalancer_domain_name: "elb.some.domain"
# loadbalancer_apiserver:
# address: 1.2.3.4
# port: 1234

## Internal loadbalancers for apiservers
# loadbalancer_apiserver_localhost: true
# valid options are "nginx" or "haproxy"
# loadbalancer_apiserver_type: nginx # valid values "nginx" or "haproxy"

## Local loadbalancer should use this port
## And must be set to port 6443
loadbalancer_apiserver_port: 6443

## If loadbalancer_apiserver_healthcheck_port variable defined, enables proxy liveness check for nginx.
loadbalancer_apiserver_healthcheck_port: 8081

### OTHER OPTIONAL VARIABLES
## For some things, kubelet needs to load kernel modules. For example, dynamic kernel services are needed
## for mounting persistent volumes into containers. These may not be loaded by preinstall kubernetes
## processes. For example, ceph and rbd backed volumes. Set to true to allow kubelet to load kernel
## modules.
# kubelet_load_modules: false

## Upstream dns servers
# upstream_dns_servers:
# - 8.8.8.8
# - 8.8.4.4

## There are some changes specific to the cloud providers
## for instance we need to encapsulate packets with some network plugins
## If set the possible values are either 'gce', 'aws', 'azure', 'openstack', 'vsphere', 'oci', or 'external'
## When openstack is used make sure to source in the openstack credentials
## like you would do when using openstack-client before starting the playbook.
# cloud_provider:

## When cloud_provider is set to 'external', you can set the cloud controller to deploy
## Supported cloud controllers are: 'openstack' and 'vsphere'
## When openstack or vsphere are used make sure to source in the required fields
# external_cloud_provider:

## Set these proxy values in order to update package manager and docker daemon to use proxies
# http_proxy: ""
# https_proxy: ""

## Refer to roles/kubespray-defaults/defaults/main.yml before modifying no_proxy
# no_proxy: ""

## Some problems may occur when downloading files over https proxy due to ansible bug
## https://github.com/ansible/ansible/issues/32750. Set this variable to False to disable
## SSL validation of get_url module. Note that kubespray will still be performing checksum validation.
# download_validate_certs: False

## If you need exclude all cluster nodes from proxy and other resources, add other resources here.
# additional_no_proxy: ""

## Certificate Management
## This setting determines whether certs are generated via scripts.
## Choose 'none' if you provide your own certificates.
## Options are 'script' or 'none'
## note: vault is removed
# cert_management: script

## Set to true to allow pre-checks to fail and continue deployment
# ignore_assert_errors: false

## The read-only port for the Kubelet to serve on with no authentication/authorization. Uncomment to enable.
# kube_read_only_port: 10255

## Set true to download and cache container
# download_container: true

## Deploy container engine
# Set false if you want to deploy container engine manually.
# deploy_container_engine: true

## Set Pypi repo and cert accordingly
# pyrepo_index: https://pypi.example.com/simple
# pyrepo_cert: /etc/ssl/certs/ca-certificates.crt
56 changes: 56 additions & 0 deletions inventory/group_vars/all/docker.yml
@@ -0,0 +1,56 @@
---
## Uncomment this if you want to force overlay/overlay2 as docker storage driver
## Please note that overlay2 is only supported on newer kernels
# docker_storage_options: -s overlay2

## Enable docker_container_storage_setup, it will configure devicemapper driver on Centos7 or RedHat7.
docker_container_storage_setup: false

## A disk path must be defined via docker_container_storage_setup_devs,
## otherwise docker-storage-setup will be executed incorrectly.
# docker_container_storage_setup_devs: /dev/vdb

## Uncomment this if you have more than 3 nameservers, then we'll only use the first 3.
docker_dns_servers_strict: false

# Path used to store Docker data
docker_daemon_graph: "/var/lib/docker"

## Used to set docker daemon iptables options to true
docker_iptables_enabled: "false"

# Docker log options
# Rotate container stderr/stdout logs at 50m and keep last 5
docker_log_opts: "--log-opt max-size=50m --log-opt max-file=5"

# define docker bin_dir
docker_bin_dir: "/usr/bin"

# keep docker packages after installation; speeds up repeated ansible provisioning runs when '1'
# kubespray deletes the docker package on each run, so caching the package makes sense
docker_rpm_keepcache: 0

## An obvious use case is allowing insecure-registry access to self hosted registries.
## Can be an IP address or a domain name.
## For example: 172.19.16.11 or mirror.registry.io
# docker_insecure_registries:
# - mirror.registry.io
# - 172.19.16.11

## Add other registries, for example a China registry mirror.
# docker_registry_mirrors:
# - https://registry.docker-cn.com
# - https://mirror.aliyuncs.com

## If non-empty will override default system MountFlags value.
## This option takes a mount propagation flag: shared, slave
## or private, which control whether mounts in the file system
## namespace set up for docker will receive or propagate mounts
## and unmounts. Leave empty for system default
# docker_mount_flags:

## A string of extra options to pass to the docker daemon.
## This string should be exactly as you wish it to appear.
# docker_options: ""

docker_package_version: 18.09
22 changes: 22 additions & 0 deletions inventory/group_vars/etcd.yml
@@ -0,0 +1,22 @@
---
## Etcd auto compaction retention for mvcc key value store in hour
# etcd_compaction_retention: 0

## Set level of detail for etcd exported metrics, specify 'extensive' to include histogram metrics.
# etcd_metrics: basic

## Etcd is restricted by default to 512M on systems under 4GB RAM; 512MB is not enough for much more than testing.
## Set this if your etcd nodes have less than 4GB but you want more RAM for etcd. Set to 0 for unrestricted RAM.
# etcd_memory_limit: "512M"

## Etcd has a default of 2G for its space quota. If you put a value in etcd_memory_limit which is less than
## etcd_quota_backend_bytes, you may encounter out of memory terminations of the etcd cluster. Please check
## etcd documentation for more information.
# etcd_quota_backend_bytes: "2G"

### ETCD: disable peer client cert authentication.
# This affects ETCD_PEER_CLIENT_CERT_AUTH variable
# etcd_peer_client_auth: true

## Settings for etcd deployment type
etcd_deployment_type: docker