c.11: This is an implementation of a full DevOps pipeline ecosystem with GitLab, without a cloud provider, using third-party tools, a Linode VPS, and Ansible to install services as Docker and LXC containers. A fully secured DevOps environment. Ansible is the main tool for provisioning the VPS. See this README for the high-level design and notes.txt for implementation details.


# NOTE: For implementation specifics, commands, directories, and script details, see notes.txt in the local VSCode repo or on the EC2 controller repo. The notes.txt file is not published to the GitHub repo.

# Tools and processes used in building this infrastructure include the following:

Base server: a VPS (Linode) running the latest Arch Linux

Ownership of a domain. Mine is on Google Cloud DNS, and the NS records will be delegated to Linode Domains (similar to Route 53) so that the domain can be used for this project

Linode VPS will be configured with a storage volume as well.

For network details etc., see the notesx.txt file on the EC2 controller

## High-level list of application tools used in this project:

Linode VPS


SSL and TLS tools: certbot/Let's Encrypt for the LXC containers' non-HTTPS services (like mail), and Traefik/Let's Encrypt for Traefik HTTPS termination for all of the containers (LXC and docker); the web admin consoles run on nginx inside the containers

Many tools are installed on the VPS: the docker engine and LXC, to name a few.

Ansible is the deployment tool for all of the apps

WireGuard on an iPhone and a Windows 10 client. My Mac is outdated and not supported by the WireGuard client.
The VPN is necessary as an added layer of security.

MariaDB (MySQL) on the VPS. Note that some of the containers have their own MySQL database and others use the MariaDB on the host (iptables has to be configured accordingly)

iRedMail (LXC container)

Zulip messaging (LXC container). Zulip is similar to Slack.

Traefik (docker container): reverse proxy and SSL termination point for all HTTP/HTTPS traffic going into the LXC and docker containers. Note that for iRedMail's native mail protocols (IMAP and SMTP), certbot/Let's Encrypt is used to generate the certs; on Traefik, Let's Encrypt is used exclusively, without certbot.

pi-hole (docker container) as the DNS resolver for all WireGuard clients when the VPN tunnel is up. This filters out ads, etc.

Nextcloud (docker container) for doc sharing, and much more

checkmk (docker container) for monitoring the VPS and emailing status via SMTP

Borg and Borgmatic

Gitlab and Gitlab runner

Python backup-checker

Cron jobs are used for backups of all of the LXC and docker containers as well as the VPS MariaDB. An additional 40GB volume (/mnt/storage) has been acquired from Linode for this purpose.



## git repository architecture

Use the local VSCode on the Mac as the local repo (this does not have security-sensitive files as it is a pull from GitHub)

Github repo (this does not have security sensitive files)

EC2 controller. Running ansible-playbook and venv from here to provision the VPS. This has the full repository source code including sensitive files and folders.

Use the EC2 controller only to provision the VPS. Once changes are tested, push to GitHub and then pull the changes to the local VSCode on the Mac for local record keeping.

## Ansible:

The primary tool for provisioning the apps onto the Linode VPS is Ansible (ansible-playbook has a role for each of the various apps and services, added to setup.yml in the repo root; see the GitHub repo)

I will be running the Ansible client from an EC2 Ubuntu controller. A Mac would work as well, but I prefer the controller as it has many other DevOps tools that may be required. It is fully AWS-configured as well, but we will not be using AWS, GCP, or Azure for this project
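
Below is a minimal sketch of what the root setup.yml can look like, assuming a `vps` host group and using role names mentioned throughout this README; the real file in the repo may name, order, or split the roles differently:

```yaml
# Hypothetical sketch of setup.yml; hosts group and the exact role list/order are assumptions.
- hosts: vps
  become: true
  roles:
    - security     # base hardening and the iptables rules described below
    - lxc          # installs LXC so the iRedMail and Zulip containers can be created
    - mariadb
    - traefik
    - pi-hole
    - nextcloud
    - checkmk
    - zulip
```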


## Running virtual environment on the EC2 controller:

KEY NOTE: run all of the Ansible commands from a virtual environment. I will be using venv. Set up the Python version in the terminal that will run the venv prior to creating and running the venv. The venv will inherit that Python version, which is critical for Ansible to work with the latest Arch Linux VPS on Linode.
The ansible support matrix is located here:
https://docs.ansible.com/ansible/latest/reference_appendices/release_and_maintenance.html#ansible-core-support-matrix

The latest Arch Linux runs Python 3.12.5. This is the Ansible target.
One workable ansible client/Python combination for the venv on the EC2 controller is ansible 10.3.0 / ansible-core 2.17.3 with Python 3.12.5 running in the venv.

To install the proper ansible and Python versions in the venv, use the following process. I prefer this over using pyenv with virtualenv, which was causing issues with python3 and the Python versions in the virtualenv not being aligned.

## Setting up the venv:

On the EC2 controller (ansible client):

mkdir tmp_ansible && cd $_
pyenv install -l   (list available Python versions)
pyenv install 3.12.5
pyenv shell 3.12.5
pyenv local 3.12.5
python -V   (should show 3.12.5 in this terminal)

python -m venv latestenv
source latestenv/bin/activate
which python   (should point to the venv path)
which pip   (should point to the venv path)
python -V   (should state 3.12.5)
pip install ansible==10.3.0   (or just pip install ansible; it will get the latest, which is 10.3.0)
pip freeze   (should show the versions installed above)

This ansible version and Python version run very well with the latest Arch Linux Linode VPS (its Python version is 3.12.5 as well)

There are other combinations that will work in the venv, but the above is ideal since it is the latest GA ansible version.


## ansible playbook files:

The ansible playbook and the roles referenced in setup.yml are included in the GitHub repository.
Some files have been intentionally gitignored because they contain clear-text passwords or keys. Ansible-encrypted password files (ansible-vault was used) are also not included in this GitHub repo. The complete files and folders are on the EC2 controller.

## Ansible essentials:
This includes provisioning the base Linux packages on the Arch Linux VPS

Installing cronie for the cron jobs that will be used throughout

Configuring the network bridge interface on the VPS that will be used for LXC linux container communication with the host VPS

Configuring the loopback interface on the VPS that will be used for docker container to host VPS network communication, and some other things
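
A minimal sketch of the kind of tasks this covers, assuming the community.general.pacman module; the package list beyond cronie, docker, and lxc is illustrative and the repo's actual role may differ:

```yaml
- name: Install base packages on the Arch Linux VPS (package list is illustrative)
  community.general.pacman:
    name:
      - cronie
      - docker
      - lxc
    state: present
    update_cache: true

- name: Enable and start cronie for the cron jobs used throughout
  ansible.builtin.systemd:
    name: cronie
    enabled: true
    state: started
```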

## Basic VPS security (iptables firewall):

iptables is used to lock down the VPS to SSH and whatever else is required as the applications, services, and servers are added. SSH runs on a non-standard port number to further mitigate attacks. Port 9473 is added for the WireGuard VPN tunnel (see below), which is required to run secure client-to-server communication to native services and servers like MariaDB that do not support native encryption. WireGuard uses private IP addresses to communicate with the WireGuard server on the VPS inside the tunnel. These private IP addresses can then be used in iptables and thus are allowed through to the VPS server. This is important.
Note that the nat rules in iptables will be added dynamically for docker containers that are brought up on the VPS, in accordance with the private address space on the VPS (MASQUERADE for docker outbound connections, and DNAT for docker inbound connections based upon the docker-assigned IP address)

Kernel parameters on Arch Linux require tweaking so that IP forwarding works and docker containers can use the aforementioned loopback address for docker container to host network communication.
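
A minimal sketch of that kernel-parameter tweak as an Ansible task (the real role may set additional parameters):

```yaml
- name: Enable IPv4 forwarding so docker traffic can be routed via the VPS loopback network
  ansible.posix.sysctl:
    name: net.ipv4.ip_forward
    value: "1"
    sysctl_set: true
    state: present
    reload: true
```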

## NOTE on LXC linux containers vs. docker containers usage:

The docker containers will be used mostly for HTTP/HTTPS (nginx) workloads like hosting GitLab, whereas the LXC linux containers will be used for non-HTTP services like iRedMail and Zulip (a collaboration tool similar to Slack, used for notifications, for example of pipeline results)
LXC linux containers will also run HTTP (nginx) if the application requires a full-fledged linux file system for installation.





## Wireguard security and Traefik:

WireGuard plays a critical role in the security as noted above. The WireGuard server on the VPS will be run in conjunction with Traefik for traffic steering to the backend docker containers (and LXC containers running HTTP/HTTPS services). Encryption (HTTPS) runs to the Traefik middleware, and this middleware whitelists the private IP addresses of the WireGuard clients (Windows and iPhone for me; some use a Mac) and then runs native HTTP to the docker containers or LXC containers. There is no need for backend encryption at this time.
Traefik is an HTTPS reverse proxy with HTTP on the backend to the containers. It is a TLS termination point. The containers it steers traffic to (docker and LXC linux containers) will be running nginx.
Traefik has an ACME client built into it, so Let's Encrypt TLS certificates can be generated as needed.

## Wireguard VPN packet flow design, example:

The VPS WireGuard server tunnel endpoint will have a private IP address
WireGuard clients will have private IP addresses on this same network
The public VPS Linode server address is the public WireGuard VPN endpoint
The public client tunnel endpoint will be the Comcast router public IP address of my home network
iptables on the VPS needs explicit allow-all rules for the private IP addresses of the clients as the source. The tunnel is stripped off first and then iptables kicks in, so iptables will see the private client IP addresses and these need to be allowed accordingly.

Example traffic flow from Wireguard client to gitlab docker container service:

As an example, GitLab will be running in a native docker container (not LXC, which is for linux containers)
Traefik will be the endpoint of the TLS to the backend docker containers (HTTP and HTTPS)
Traefik will have a whitelist middleware to allow only certain IP addresses to access the GitLab web site. For this example, the 10.100.94.11 and 10.100.94.12 VPN clients
DNS will be configured for gitlab.linode.cloudnetworktesting.com
This will resolve to the public IP address of the VPS. In my case 173.230.155.238
On the PC, set up a WireGuard rule so that gitlab.linode.cloudnetworktesting.com will use the VPN tunnel.

The process flow: enter gitlab.linode.cloudnetworktesting.com (my domain) in the PC browser
It resolves to the IP of the VPS
The PC network configuration, with the rule for the VPN, will use the WireGuard tunnel for gitlab.linode.cloudnetworktesting.com
Once tunneled and then de-encapsulated by the WireGuard server on the VPS (Linode), the packet will be forwarded to the HTTPS reverse proxy, Traefik, to terminate the HTTPS connection and send native HTTP to the backend GitLab running in the docker container on the VPS (docker containers use the loopback interface, not the bridge; the bridge is for native linux containers running on LXC)

Traefik is used for all HTTP/HTTPS communication. Non-HTTP services use the certbot wildcard certificate.

Traefik sees the packet sourced from the client's private WireGuard IP address. It checks the IP whitelist middleware for that domain, sees the source IP is present, and lets the traffic through to the docker container running GitLab.


## certbot, ACME protocol, wildcard TLS certs:

Certbot will be used for non-HTTP services like iRedMail and Zulip. Certbot can be used to generate wildcard certs for my domain, and this makes secure communication with these types of services and servers more practical. Traefik will not be used to terminate the TLS for these services and servers.

Certbot is an ACME client; Let's Encrypt will challenge it with a long string, and certbot will be able to respond by creating the DNS entry for the domain because it will be given access to do so.
To update Linode DNS via certbot we will need Linode's personal access token so that certbot has access to the Linode DNS settings and configuration to make the change and answer the challenge from Let's Encrypt.
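
A hedged sketch of that step, expressed as an Ansible task calling certbot with the certbot-dns-linode plugin; the credentials path, contact email, domain list, and the `creates` path are assumptions (see notes.txt for the real commands):

```yaml
- name: Request a wildcard cert for the domain via the Linode DNS-01 challenge
  ansible.builtin.command: >
    certbot certonly --dns-linode
    --dns-linode-credentials /root/.secrets/certbot/linode.ini
    -d "*.linode.cloudnetworktesting.com"
    --non-interactive --agree-tos -m postmaster@linode.cloudnetworktesting.com
  args:
    creates: /etc/letsencrypt/live/linode.cloudnetworktesting.com/fullchain.pem
```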


## Mariadb:

MariaDB is now installed and running on the Linode VPS server. Connect to MariaDB only from WireGuard clients. Do not connect from a non-WireGuard client such as the EC2 controller, because the MariaDB server on the VPS does not support native TLS by default. For security I am using ansible-vault for the MariaDB root password and the database user passwords so that Ansible can be used to configure this server (automated)

MariaDB will be used by backup-checker to do the backups on a cron job.

For details on commands, networking, etc, see notesx.txt




## LXC and Linux containers:

As stated above, some servers and applications like iRedMail and Zulip will need to be installed and run in LXC linux containers rather than docker containers. Although there are docker-based solutions now, the native full linux file system available in LXC linux containers makes installing and running these servers more practical.
We can use certbot TLS certs to secure the communication with remote hosts.

The ansible lxc role will actually install the lxc application on the VPS so that LXC containers can be created for iRedMail and Zulip
The ansible role is called lxc in the setup.yml

Creating and provisioning the LXC containers will be done manually (a sketch follows below). Once the LXC container is created there is some additional configuration: assigning the IP address to the container from the bridge network, and configuring /etc/resolv.conf (with Cloudflare's 1.1.1.1 as the nameserver) so that DNS resolution works from the mail.linode.cloudnetworktesting.com container. DNS is required for the iRedMail application to work properly.
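
A hedged sketch of those manual steps expressed as Ansible-style tasks, using the iRedMail container as the example; the download template, distro/release, and config paths are assumptions (the repo does this by hand, and notes.txt has the real commands):

```yaml
- name: Create the iRedMail LXC container
  ansible.builtin.command: >
    lxc-create -n mail.linode.cloudnetworktesting.com -t download --
    --dist debian --release bookworm --arch amd64
  args:
    creates: /var/lib/lxc/mail.linode.cloudnetworktesting.com/config

- name: Attach the container to the br0 bridge with its static address
  ansible.builtin.blockinfile:
    path: /var/lib/lxc/mail.linode.cloudnetworktesting.com/config
    marker: "# {mark} network"
    block: |
      lxc.net.0.type = veth
      lxc.net.0.link = br0
      lxc.net.0.flags = up
      lxc.net.0.ipv4.address = 10.100.100.11/24

- name: Point the container at Cloudflare DNS so iRedMail can resolve names
  ansible.builtin.copy:
    dest: /var/lib/lxc/mail.linode.cloudnetworktesting.com/rootfs/etc/resolv.conf
    content: "nameserver 1.1.1.1\n"
```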

For network details see notesx.txt on the EC2 controller. The LXC containers will use the bridge interface br0 for the subnet on which the LXC containers are created. iRedMail will use 10.100.100.11/24 and Zulip will use 10.100.100.12/24

notesx.txt has all the details on the full process, directory paths for configuration on the VPS and the containers themselves, etc.



## This completes the basic infrastructure setup


## For SSH network details and configuration and files see notesx.txt

Use ProxyJump or ProxyCommand in the ~/.ssh/config file on the EC2 controller and add the EC2 controller's public SSH key to ~/.ssh/authorized_keys on the mail.linode.cloudnetworktesting.com container. This makes it possible to ssh directly from the EC2 controller into the iRedMail LXC container (a sketch follows below).
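
A minimal sketch of the ProxyJump entry, written as an Ansible blockinfile task run on the EC2 controller; the VPS host alias, user names, and the absence of a custom SSH port are assumptions (the real values are in notes.txt):

```yaml
- name: Add a ProxyJump entry for the iRedMail LXC container to ~/.ssh/config
  ansible.builtin.blockinfile:
    path: ~/.ssh/config
    create: true
    marker: "# {mark} mail.linode.cloudnetworktesting.com"
    block: |
      Host mail.linode.cloudnetworktesting.com
        HostName 10.100.100.11
        User root
        ProxyJump root@vps.linode.cloudnetworktesting.com
```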

## Other servers, apps and services required to get the DevOps infra up and running (these will be used by end users):

These include: (as mentioned earlier)

iRedMail
Zulip collaboration tool
Traefik as mentioned above
Pi-Hole for DNS
Nextcloud
checkmk
Borg and Borgmatic
Gitlab and Gitlab runner
Python backup-checker

Each of these is described below in terms of the infrastructure design and pipeline architecture.


## iRedMail:

The installation of this is complex. There is an LXC-local MariaDB that is not associated with the Linode VPS MariaDB. There is also an nginx service to serve the webmail application. The mail can be accessed through the web via either Roundcube or the web admin panel (iRedAdmin). The URLs are in the notesx.txt file.

The bundle also includes iRedAPD - Postfix Policy Server

It uses Dovecot, an open-source email server that acts as a mail storage server, mail proxy server, or lightweight mail user agent (MUA)

The notesx.txt file has the complete installation process including how to configure the LXC container prior to installing iRedMail and configuring DNS for the mail server as well as a TLS cert for the service via certbot.

### Additional details about the implementation:
Use a certbot certificate, as this is for non-HTTP traffic. The HTTP/HTTPS traffic will use Traefik and those certs will be installed there.
For this, generate a new certbot cert for the mail.linode.cloudnetworktesting.com iRedMail domain

Once the certs are created and in the letsencrypt archive and live directories on the VPS, they then have to be transferred onto the iRedMail LXC container

An lxc-mount must be added to the lxc config file on the VPS for both the live and archive directories (see the sketch after these steps)

Reboot the LXC container and verify that the certs are on the container
See notes.txt for specific directory details, etc.

Next edit the Postfix and Dovecot services on the LXC container to use these certificates

Restart both services so that the certs are loaded.
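
A hedged sketch of the lxc-mount step mentioned above; the exact source and in-container target paths are assumptions (see notes.txt for the real directories):

```yaml
- name: Bind-mount the letsencrypt live and archive dirs into the iRedMail container
  ansible.builtin.blockinfile:
    path: /var/lib/lxc/mail.linode.cloudnetworktesting.com/config
    marker: "# {mark} letsencrypt mounts"
    block: |
      lxc.mount.entry = /etc/letsencrypt/live etc/letsencrypt/live none bind,create=dir 0 0
      lxc.mount.entry = /etc/letsencrypt/archive etc/letsencrypt/archive none bind,create=dir 0 0
```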

### User configuration and iRedMail Web Admin console

At this point we should be able to hit the Web Admin site for iRedMail through the browser over TLS/HTTPS
It fails from a non-WireGuard-VPN client because the VPS does not yet have a web server (nginx) to forward the traffic to the LXC container

However, putting the iRedMail LXC container's private address in the browser should work
But the WireGuard VPN has to be up on the Windows client

First add the LXC container's entire private subnet to the WireGuard allowed IPs list on the Windows client
This works because the iRedMail container is already running the nginx web server

The self-signed cert is fine to accept
Firefox must be used; Microsoft Edge and Chrome will not accept the self-signed cert

Log into the iRedMail Web Admin console using the Web Admin URL with the private IP address of the LXC container. The user is the postmaster user.


Add users to the email domain linode.cloudnetworktesting.com
This will include several users that will be used for notifications, personal email, and reports.

NOTE that the MX record will redirect linode.cloudnetworktesting.com to mail.linode.cloudnetworktesting.com


### iptables

iptables nat rules need to be added for ports 143, 587, and 25 (a sketch follows below)
The NAT rules are required to redirect the traffic coming into the VPS to the iRedMail LXC container.
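
A hedged sketch of those NAT rules expressed with the ansible.builtin.iptables module; the repo actually manages them in security/files/iptables.rules, and the ingress interface name is an assumption:

```yaml
- name: DNAT inbound mail ports on the VPS to the iRedMail LXC container
  ansible.builtin.iptables:
    table: nat
    chain: PREROUTING
    in_interface: eth0          # assumption: the VPS public interface
    protocol: tcp
    destination_port: "{{ item }}"
    jump: DNAT
    to_destination: "10.100.100.11:{{ item }}"
  loop: ["25", "143", "587"]
```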

Run the ansible-playbook security role once the iptables rules have been added to the playbook

Verify that the iptables nat rules have been added to the VPS with iptables -t nat -nvL (iptables -nvL will just show the base rules)

Once the rules are in place, run telnet tests from the EC2 controller or the Mac laptop to ports 587, 143, and 25

NOTE: for port 25, a Linode support ticket needs to be raised to open it up. Linode blocks outbound port 25 by default for spam protection

Even with port 25 opened, the telnet still fails because Comcast/Xfinity and AWS EC2 both block inbound and outbound port 25

The way to test port 25 is to use the Windows WireGuard VPN client and tunnel the port 25 telnet through the blocks. At the VPS the packet will be de-encapsulated and NAT'ed to the private LXC container address. The source IP will be a private WireGuard IP address, which is allowed (allow all) by iptables as well.

The packet to the private LXC container IP is routed through a bridge interface in the VPS routing table to the LXC container.

The bridge interface is used for all LXC containers on the VPS (the loopback address on the VPS is used for docker containers)



### Thunderbird clients and testing the email services with inter-domain sending and receiving of email

Add Thunderbird clients to the Mac and Windows laptops and configure the notifications user on the Mac and a personal email user on Windows.
The users should be able to be verified, as they were configured earlier in iRedMail via the Web console above.

Send an email from the notifications account to check-auth@verifier.port25.com
If successful you will get an Authentication Report:
This checks the DKIM DNS record as well as the SPF (TXT) DNS record and verifies inter-domain email reception on the notifications Thunderbird account.

Summary of Results

SPF check:          pass
"iprev" check:      pass
DKIM check:         pass


At this point the email server is fully functional

Additional tests with inter-domain email to and from a gmail account and the linode.cloudnetworktesting.com account were run and verified.







## Zulip collaboration:

NOTE: the Windows 10 client and iPhone can be used as WireGuard clients to test this out.
All traffic must come from WireGuard clients. The zulip.linode.cloudnetworktesting.com site is not reachable outside of the VPN tunnel. The iPhone and Windows 10 client both have the WireGuard client installed on them.

Zulip is installed as an LXC linux container and not a docker container. See the notes above on the LXC container architecture and design for this project, including the networking considerations. Of note is that the LXC containers use the bridge interface br0 on the VPS rather than the loopback IP address for routing between the LXC containers and the VPS host. (The docker containers use the loopback IP address on the VPS)

The installation and design for Zulip is very similar to iRedMail, given that both use LXC containers for deployment. Both iRedMail and Zulip have container addresses on the private interface address space 10.100.100.x/24. The installation directories for both of these containers are in the /root/lxc folder. Each container has its own dedicated folder (the FQDN of the container). The lxc-create command is used to create the container.

Once the IP address is assigned, lxc-start starts the container. lxc-attach is required, as some of the configuration (as with iRedMail) needs to be done in the container itself. The DNS configuration, as with iRedMail, is done manually (not through ansible). LXC containers are difficult to fully automate, unlike docker containers.

For this Zulip installation the OS of the LXC container needs to be upgraded. openssh-server needs to be installed as well. With DNS set up we will be able to ssh into the LXC container directly from the EC2 controller using a ProxyJump entry in the ~/.ssh/config file on the EC2 controller, similar to what we did for iRedMail. The proxy in this case is the VPS: EC2 controller, via the VPS proxy, to the LXC container on the VPS. (See the section on ssh configuration in the notes.txt file on the EC2 controller and local VSCode repo. It is not pushed to GitHub for security reasons.)

The backup script to /mnt/storage and the cron job are set up in the ansible playbook zulip role in much the same manner as for iRedMail. The shell script is essentially the same (a sketch of the wiring is below).
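
A minimal sketch of that wiring, modeled on the crontab shown later in this README; the copy source path and the script contents are assumptions:

```yaml
- name: Install the Zulip LXC backup script
  ansible.builtin.copy:
    src: lxc-backup-zulip-linode-cloudnetworktesting-com.sh
    dest: /root/scripts/lxc-backup-zulip-linode-cloudnetworktesting-com.sh
    mode: "0755"

- name: Schedule the nightly Zulip backup to /mnt/storage
  ansible.builtin.cron:
    name: "lxc zulip.linode.cloudnetworktesting.com backup"
    minute: "11"
    hour: "1"
    job: "/root/scripts/lxc-backup-zulip-linode-cloudnetworktesting-com.sh > /dev/null 2>&1"
```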

Now that the ansible scripts are done, the installation of Zulip itself must be manual. We have direct ssh to the LXC container set up, so this can be done easily.

NOTE: at this stage in the project we do not have the Traefik docker container up, so there is no HTTPS daemon running on the VPS itself. But the Zulip service has a bundled HTTP server in it. For now, with the WireGuard VPN up, point the FQDN for Zulip at the private LXC container IP address to bring up the Web Admin console so that we can continue configuration of the Zulip server. We did the same for the iRedMail Web Admin console. However, Zulip does Host header checking, so the /etc/hosts file on the Windows WireGuard VPN client needs to temporarily point the Zulip FQDN at the private IP.

The admin and notification emails need to be set up: one for the Zulip administrator and one for no-reply. See the word doc for specifics on this configuration and the changes required in settings.py. There are also changes to the iRedMail LXC container configuration that need to be made so that iRedMail can receive the no-reply emails from Zulip.
















## Traefik:

### Traefik, PART1:



#### TLS and HTTPS high level security for this project: 

Traefik is a reverse proxy and TLS termination proxy, and can also be used as a load balancer


It is critical in this setup because it is the only HTTPS entry point into the VPS



It will be deployed as a docker container and not an LXC container, so it will use the loopback address on the VPS rather than the bridge interface

It will route the traffic to the appropriate docker container or LXC container on the backend.

For Zulip and for iRedMail it will be used so that the web admin consoles for each can be reached through the public interface. Right now we are getting to the web console of both by going to the private IP address. The certs used for this are not the certbot certs created for iRedMail, for example, but the default nginx TLS certs that ship with the Zulip and iRedMail nginx instances (both are self-signed certs)



For the actual mail, iRedMail uses the certbot cert

For the web admin it is using a self-signed cert from GuangDong (see above) for now, until Traefik is up

For Zulip it is also a self-signed cert for nginx (web admin) (see above) until Traefik is up


For Zulip, Traefik will also be used for client-to-server communication for the actual messaging (unlike iRedMail, which uses the certbot cert for the actual mail sending and receiving)

Traefik will have its own SSL/TLS certs and will be exposed on the public IP address



#### Docker container for Traefik

Traefik will run as a docker container on the VPS.
The loopback address on the VPS will be used to route the traffic coming into the public VPS interface to the Traefik docker container.
The good thing about docker is that the iptables rules will automatically be added as the docker container is configured and run, so port 80 and port 443 traffic will be allowed through the VPS interface and into the loopback interface to the docker container.
When the container is started and run we will map 80 on the VPS to 80 on the container and 443 on the VPS to 443 on the container. See below.

 [root@vps ~]# docker ps
CONTAINER ID   IMAGE            COMMAND                  CREATED        STATUS        PORTS                                                                      NAMES
d8e96b83e26c   traefik:latest   "/entrypoint.sh --lo…"   17 hours ago   Up 17 hours   0.0.0.0:80->80/tcp, :::80->80/tcp, 0.0.0.0:443->443/tcp, :::443->443/tcp   traefik.linode.cloudnetworktesting.com

NOTE: Traefik is a reverse proxy and TLS termination point and will be used to redirect incoming HTTPS traffic on the VPS to the appropriate LXC container using the file provider. More on this below (PART3)


#### Traefik docker container design:

From the traefik docker-compose.yml file:
#the commands below will be passed to the traefik binary. If running the traefik command from the command line these would be the
#arguments and flags to start the traefik service on the docker container
#enable the dashboard
#set up http and https entrypoints on ports 80 and 443
#enable both the docker provider (for redirection to the gitlab docker container) and the file provider (redirection to LXC containers
#like iRedMail and zulip)
#Set up automatic certs with let's encrypt
#The ports are the -p in docker run, i.e. mapping host 80 to docker container 80 and host 443 to docker container 443
#docker.sock is used so docker can communicate with docker engine and monitor for docker labels. These are required for the 
#docker provider redirection by traefik to docker containers (using docker providers)
#acme.json is for cert storage
#./configs is for the file provider which is used to redirect to LXC linux containers like iRedMail and zulip
#This traefik container is part of the docker web network
#Docker labels section below:
#the last part is the setup for the labels that will be used by the docker provider: basically set up a router to match a
#certain hostname, then redirect http to https, set up lets encrypt as the TLS resolver, 
#Finally set up the whitelist and basic auth. Since the release of Traefik 3.0, 'ipwhitelist' is deprecated and is now called 
#'ipallowlist'  This has been corrected below. Also '${service}-whitelist' has been renamed '${service}-allowlist', and 
#'${whitelist}' should become '${ip_allowlist}'  All of these corrections have been done below and are in the latest git repo.
#Note in Traefik 3.0 and up the hostregexp is '.+' which is already corrected below  

#NOTE: the file providers directory is specified in the docker-compose.yml file (see providers.file.directory=/configs/)
#so that the container knows which directory is for the file providers. The /configs directory is copied over from source code
#to the container as part of the ansible playbook. Finally, in the docker-compose.yml file note that the /configs directory
#from source is mounted to the traefik docker container as a volume.
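
For orientation, here is an abbreviated sketch of what such a docker-compose.yml can look like, reconstructed from the comments above; the flag values, label names, ACME email variable, and allowlist range are illustrative, and the real file (with the full label set) is in the repo:

```yaml
services:
  traefik:
    image: traefik:latest
    container_name: traefik.linode.cloudnetworktesting.com
    restart: unless-stopped
    networks:
      - web
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro   # lets traefik watch docker labels
      - ./acme.json:/acme.json                         # Let's Encrypt cert storage
      - ./configs:/configs                             # file provider dir (LXC redirections)
    command:
      - --api.dashboard=true
      - --entrypoints.http.address=:80
      - --entrypoints.http.http.redirections.entrypoint.to=https
      - --entrypoints.https.address=:443
      - --providers.docker=true
      - --providers.file.directory=/configs/
      - --certificatesresolvers.letsencrypt.acme.tlschallenge=true
      - --certificatesresolvers.letsencrypt.acme.email=${letsencrypt_email}
      - --certificatesresolvers.letsencrypt.acme.storage=/acme.json
    labels:
      - "traefik.http.routers.traefik.rule=Host(`traefik.linode.cloudnetworktesting.com`)"
      - "traefik.http.routers.traefik.entrypoints=https"
      - "traefik.http.routers.traefik.tls.certresolver=letsencrypt"
      - "traefik.http.routers.traefik.service=api@internal"
      - "traefik.http.routers.traefik.middlewares=traefik-allowlist,traefik-auth"
      - "traefik.http.middlewares.traefik-allowlist.ipallowlist.sourcerange=10.100.94.0/24"
      - "traefik.http.middlewares.traefik-auth.basicauth.users=${basic_auth_users}"

networks:
  web:
    external: true
```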




#### Routing to LXC containers:

Full network diagrams are available in the word doc.
See below. Traefik will be used to route the traffic to the zulip LXC container
Traefik providers route traffic to endpoints
File providers are used specifically for linux LXC containers. We will use these to route traffic to the web admin servers for LXC iRedMail and LXC zulip; zulip can also use this for HTTPS messaging traffic (client-to-server communication)
The file provider will tell Traefik, for zulip for example, to route the traffic to the LXC container at 10.100.100.12


#### Routing to Docker containers:

For gitlab it will be a bit different (docker container)
A docker provider, instead of a file provider, will be used
Docker labels will be used to add to the traefik config so that it can route the traffic to the gitlab docker container


### Traefik, PART2:

This involves adding the traefik role to the ansible-playbook. This part of the ansible playbook will deploy the traefik docker instance as noted above and below:

 [root@vps ~]# docker ps
CONTAINER ID   IMAGE            COMMAND                  CREATED        STATUS        PORTS                                                                      NAMES
d8e96b83e26c   traefik:latest   "/entrypoint.sh --lo…"   17 hours ago   Up 17 hours   0.0.0.0:80->80/tcp, :::80->80/tcp, 0.0.0.0:443->443/tcp, :::443->443/tcp   traefik.linode.cloudnetworktesting.com


An A record for the docker container, traefik.linode.cloudnetworktesting.com, must be added to the Linode Domains manager. The FQDN will be used in the browser for Traefik Web Admin console access. Note that the ansible playbook will also configure the password (add the .env file to .gitignore, even though the password is stored in encoded format)

Note that the docker network for this docker container is named 'web' as this traefik docker container is for all HTTP/HTTPS traffic that needs to get into the private VPS network.

The docker-compose.yml file is used to bring up the traefik docker container. See the file on the github repo for comments regarding details of the code in this file.

/traefik/.env will have all the values of the variables used in the docker-compose.yml file.
Note that the ip_allowlist (formerly whitelist) covers the entire private ip subnet range.

Once the docker container is up after running the playbook, the Traefik Web UI Admin console should be reachable from the Windows WireGuard VPN client using the dummy A record created above with the private IP. This is just a temporary workaround to get into the Traefik web admin console until the console is reachable via the public IP and FQDN (see PART3 below)


#### Detailed packet flow with Traefik

When we enter https://traefik.linode.cloudnetworktesting.com it resolves to the public IP of the VPS
The packet hits the VPS and port 443 is a match for the docker container on the VPS listening on 443. See below from the VPS

[root@vps traefik]# docker ps
CONTAINER ID   IMAGE            COMMAND                  CREATED          STATUS          PORTS                                                                      NAMES
d8e96b83e26c   traefik:latest   "/entrypoint.sh --lo…"   20 minutes ago   Up 20 minutes   0.0.0.0:80->80/tcp, :::80->80/tcp, 0.0.0.0:443->443/tcp, :::443->443/tcp   traefik.linode.cloudnetworktesting.com 


Port 443 on the host is mapped to 443 on the traefik docker container.

The packet is forwarded using the loopback interface to the container on 443.

The third router below is activated because the Host header is traefik.linode.cloudnetworktesting.com

Click on the router in the web console and you can see the following:

traefik@docker is the router
Then the 2 middlewares are hit: first the ipallowlist. The source must be from the 10.100.100.x subnet (which it is if coming from the WireGuard VPN on Windows). If this fails you will get Forbidden and no basic auth challenge. See above when I tried going in on a non-VPN client (Mac).
Next is basic auth, which is the admin user and ******* password.

Finally the api@internal service is hit. This will route the packet to its final destination, for example if there is redirection to the iRedMail LXC container or the zulip LXC container. These are handled in the next section (PART3) under the ansible-playbook configs folder in the traefik directory. LXC containers use the file provider (whereas redirection to docker containers uses the docker provider and labels; Gitlab will use this)
Once we get the redirection to iRedMail and Zulip up, we will be able to reach their Web Admin consoles via the public IP addresses (but still must use the VPN because of the traefik whitelist)

### Traefik, PART3:


#### Introduction:

Using the file provider to redirect traefik traffic flows to the appropriate LXC container. More on this later. NOTE: the docker provider will be used to redirect traffic flows to docker containers using labels. This will be done for a gitlab docker container. For now the focus is on the 2 LXC containers for iRedMail and Zulip.

[root@vps ~]# lxc-ls -f
NAME                                 STATE   AUTOSTART GROUPS IPV4          IPV6 UNPRIVILEGED 
mail.linode.cloudnetworktesting.com  RUNNING 0         -      10.100.100.11 -    false        
zulip.linode.cloudnetworktesting.com RUNNING 0         -      10.100.100.12 -    false        
[root@vps ~]# 

This is all about getting public access to the LXC containers. Right now, we are going directly to the 10.100.100.x private IP addresses in the browser through the VPN

Since traefik whitelists 10.100.100.x only, we still have to have the VPN up, but we will now be able to get to the Web Admin of iRedMail and Zulip through the FQDN and public IP address.

Note that the SSL/TLS certs for this HTTPS traffic are in Traefik and NOT certbot. Certbot is only used for the native apps, like the iRedMail mail protocols.

All web traffic passes through Traefik as the SSL termination point. Traefik redirects the cleartext HTTP traffic to the proper LXC container. The backend could be done over HTTPS, but if the network is that compromised, there are other security problems. In general the backend does not need to be encrypted.


The file providers are in the traefik/traefik/configs/ part of the ansible-playbook: mail.yml for redirection to the iRedMail LXC container and zulip.yml for redirection to the Zulip LXC container.


#### Ansible playbook for the file providers:


Currently, attempting to get to the https://mail.linode.cloudnetworktesting.com admin page will result in a 404 because traefik does not know how to route the traffic for this Hostname URL.

This is the purpose of the file providers section in ansible, to configure traefik with the redirection configuration to do this.

For example, mail.yml has the routing for traefik to the backend iRedMail LXC container. Note that it is Host header based, and the redirection endpoint is provided at the bottom as services: mail: loadbalancer: servers: url: with the private IP of the iRedMail LXC container. A sketch is below.
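
A hedged sketch of what that mail.yml can look like; the router/middleware names, entrypoint name, cert resolver name, and allowlist range are illustrative (the real file is in the repo's traefik role):

```yaml
http:
  routers:
    mail:
      rule: "Host(`mail.linode.cloudnetworktesting.com`)"
      entryPoints:
        - https
      tls:
        certResolver: letsencrypt
      middlewares:
        - mail-allowlist
      service: mail
  middlewares:
    mail-allowlist:
      ipAllowList:
        sourceRange:
          - "10.100.94.0/24"
  services:
    mail:
      loadBalancer:
        servers:
          - url: "http://10.100.100.11"   # iRedMail LXC container; backend is plain HTTP
```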

Once this is configured on the traefik docker container this will permit public ip address HTTPS traffic for the Web Admin URL to get redirected to the iRedMail container. Note that the source ip is still whitelisted so the Wireguard VPN must be up.

NOTE: the file providers directory is specified in the docker-compose.yml file so that the container knows which directory is for the file providers. The /configs directory is copied over from source code to the container as part of the ansible playbook. Finally, in the docker-compose.yml file note that the /configs directory from source is mounted to the traefik docker container as a volume.

NOTE: Important: this is specifically for public access to the iRedMail web admin via HTTPS. This is not for sending and receiving email. That goes through the mail protocols on ports 587, 143, and 25 (SMTP and IMAP), and that email traffic is encrypted with the certbot cert and not the SSL certs on traefik


After running the ansible playbook the https://mail.linode.cloudnetworktesting.com admin page now works

#### The same needs to be done for the Zulip admin web page

File provider needs to be added under traefik/traefik/configs/zulip.yml for the Zulip web admin traffic redirection











## Pi-Hole:

Pi-Hole is a network ad- and tracker-blocking application
We will use Pi-Hole as the DNS server for WireGuard. So when the VPN is up, **all** DNS requests going through the tunnel will use Pi-Hole as the DNS resolver
All ad-related URLs will get a 0.0.0.0 response from Pi-Hole (blocked) and all legitimate requests will get forwarded to the DNS server configured in Pi-Hole (Cloudflare's 1.1.1.1) and resolved

1.1.1.1 is a public DNS resolver operated by Cloudflare that offers a fast and private way to browse the Internet. Unlike most DNS resolvers, 1.1.1.1 does not sell user data to advertisers. In addition, 1.1.1.1 has been measured to be the fastest DNS resolver available.


In the Linode Admin console, first add an A record for pi-hole: pi-hole.linode.cloudnetworktesting.com

For the ansible playbook, pi-hole/tasks/main.yml copies the source from the EC2 controller pi-hole directory to the /root/services/pi-hole directory on the VPS so the VPS can create the pi-hole docker container. We will use a docker container for pi-hole, thus routing to it will use the loopback interface on the VPS and not the bridge interface. See the earlier sections above for the differences between LXC containers and docker containers in packet flow and configuration.

The main.yml also has the backup shell script location and the cron job configuration for the docker container's backup. All LXC and docker containers are backed up, as well as a mariadb dump. /mnt/storage is the 40GB external Linode volume that holds all the backups.

The docker-compose.yml is under pi-hole/pi-hole/docker-compose.yml
Notes on the docker configuration and design are below:

### Pi-hole Docker container design:

#the labels below are how traefik routes or steers the traffic for this pi-hole docker container when it comes in on the public VPS interface
#The ${} are variables and will be defined in the pi-hole/pi-hole/.env file (note this is added to .gitignore)
#the pi-hole container will be in the docker network web just like traefik is.
#Recall that the VPS uses the loopback ip address 10.36.7.11 to route the traffic to docker containers that are on a different private network (like 172.21.x.x)
#The pi-hole DNS ports are 53 tcp and udp and 67 udp, but 67 is only used if pi-hole is acting as a dhcp server as well
#These ports are mapped from the VPS to the container through the loopback ip address
#The etc-pihole and etc-dnsmasqd and etc-hosts directories will be mapped to volumes for persistency



### docker labels for traefik based routing:

#The labels below will be used by traefik to route the incoming HTTPS 443 packets to the pi-hole docker container based upon the
#Host header specified as Host below. Very similar to the file providers that were used for the LXC containers iRedMail and zulip,
#except here the docker provider is used by traefik along with these labels, so there are no explicit file provider config files to
#be added to ansible. It is all done on the fly for docker containers based upon the labels below.
#The port 80 is the webserver port on the pi-hole docker container running the webadmin GUI.
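
A hedged sketch of what those labels can look like in the pi-hole docker-compose.yml; the real file uses ${} variables from the .env file, and the router/middleware names and allowlist range here are illustrative:

```yaml
services:
  pi-hole:
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.pi-hole.rule=Host(`pi-hole.linode.cloudnetworktesting.com`)"
      - "traefik.http.routers.pi-hole.entrypoints=https"
      - "traefik.http.routers.pi-hole.tls.certresolver=letsencrypt"
      - "traefik.http.routers.pi-hole.middlewares=pi-hole-allowlist"
      - "traefik.http.middlewares.pi-hole-allowlist.ipallowlist.sourcerange=10.100.94.0/24"
      - "traefik.http.services.pi-hole.loadbalancer.server.port=80"   # pi-hole web admin listens on 80
```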

### /pi-hole/pi-hole/etc-hosts file in the ansible-playbook:

#The etc-hosts file has a unique function in this setup: if the pi-hole container hostname is set to the same name as the A record that
#maps to the public VPS ip address (in linode domains: pi-hole.linode.cloudnetworktesting.com), then there will be a problem:
#docker will assign the private docker container ip address 172.21.x.x to that hostname.
#We need a custom /etc/hosts file, mounted as a docker volume from ./etc-hosts, to specify the host-to-ip mapping
#using the public VPS address. If we do not create our own /etc/hosts then pi-hole will create its own /etc/hosts inside the container
#(not in a volume) and it will use the private container ip address and not the public VPS address.
#If pi-hole is set as the DNS server (as with the Wireguard clients), when asked to resolve pi-hole.linode.cloudnetworktesting.com, if
#there is no /etc/hosts volume from ./etc-hosts with the public VPS address, it will use its local /etc/hosts that has the private container
#ip address. This will cause the wireguard clients to fail because the private address cannot be routed.
#With the volume /etc/hosts from ./etc-hosts with the public VPS ip address, it will not create its own /etc/hosts, and this volume
#/etc/hosts will be used to resolve pi-hole.linode.cloudnetworktesting.com to the proper public VPS ip address.
#This resolves as if it had used the upstream name server (linode domain server). This is all so that we can use traefik to route
#to the public ip address via HTTPS through the Wireguard whitelist. This won't work unless the public ip is used.
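
A hedged sketch of the volume wiring described above; the directory names follow the comments in the container-design section, and the public VPS address is the one used in the GitLab example earlier in this README:

```yaml
# excerpt of pi-hole/pi-hole/docker-compose.yml (volumes only)
services:
  pi-hole:
    volumes:
      - ./etc-pihole:/etc/pihole
      - ./etc-dnsmasqd:/etc/dnsmasq.d
      - ./etc-hosts:/etc/hosts      # custom hosts file so the FQDN resolves to the public VPS IP
# contents of ./etc-hosts:
#   173.230.155.238  pi-hole.linode.cloudnetworktesting.com
```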

### Traefik http router for pi-hole HTTP/HTTPS Web Admin traffic:

Once the ansible playbook is executed, a new HTTP router will be configured on Traefik (see the Traefik web console through the WireGuard VPN)
With this in place we can log into the pi-hole Web Admin console.
The Upstream DNS servers should show the Cloudflare DNS server because that was configured through the ansible playbook (.env file)

From the docker-compose.yml file, note the traefik HTTP/HTTPS traffic routing for the Web Admin:
#NOTE the port 80 in the traefik labels below:
#NOTE that there is no iptables rule for port 80. This is in the docker-compose.yml file for pi-hole. Traefik uses the labels
#on the containers to route HTTP/HTTPS traffic for docker containers. The 53 DNS traffic for pi-hole, on the other hand,
#needs the iptables nat rules to redirect from the loopback to the pi-hole docker container private ip address.
#But this port 80 is allowed through (initial pi-hole dashboard) and there is a redirection on the pi-hole nginx web server to
#redirect to port 443. Once the redirect to 443 occurs, the HTTP router for pi-hole then challenges the user for basic auth and
#routes the traffic to port 80 and the container ip address of pi-hole based upon the Host header in the 443 packet. The 80 below
#is just for the initial hit to pi-hole through Traefik. The redirection is where the traefik HTTP router kicks in based upon the Host header.



### Detailed packet flow of DNS requests with Wireguard clients configured to use pi-hole as DNS resolver:

To test pi-hole as the DNS resolver, this must be done from a WireGuard VPN client, because regular clients will not use pi-hole as their DNS resolver
For example, on the EC2 controller, drill metrics.icloud.com will show several answers
Going to the Windows client without the VPN up, an nslookup on metrics.icloud.com will show the same.
Next, in the WireGuard configuration we set DNS=10.36.7.11, which is the VPS loopback. Recall that the loopback is how the VPS routes traffic to the docker containers. 10.36.7.11 will only be reachable through the WireGuard VPN. There are iptables nat rules configured on the VPS so that DNS traffic to 10.36.7.11 is routed to the pi-hole docker container.

    3   156 DNAT       6    --  !br-3db65a09fb68 *       0.0.0.0/0            10.36.7.11           tcp dpt:53 to:172.21.1.3:53
 5407  447K DNAT       17   --  !br-3db65a09fb68 *       0.0.0.0/0            10.36.7.11           udp dpt:53 to:172.21.1.3:53
    0     0 DNAT       17   --  !br-3db65a09fb68 *       0.0.0.0/0            10.36.7.11           udp dpt:67 to:172.21.1.3:67


Once this is set up, running nslookup metrics.icloud.com on the Windows 10 client with the VPN up will show 0.0.0.0, meaning the ads are effectively blocked


Note the iptables nat rules above. These were installed automatically by docker when the docker container was created. This is why docker containers are ideal for certain apps like pi-hole

WireGuard tunnels the inner packet, with source 10.100.94.11 and destination 10.36.7.11, to the VPS. It is decapsulated and 10.36.7.11 is then NAT'ed to the pi-hole container ip address
This is similar to the iRedMail rules above, which route traffic from the public VPS ip on those ports to the LXC container ip address 10.100.100.11.











## NextCloud:

Nextcloud will be deployed as another docker container (in addition to pi-hole, traefik, checkmk, etc.) on the VPS

First, add the A record for nextcloud.linode.cloudnetworktesting.com to the Linode Domains configuration.

Ansible: nextcloud/tasks/main.yml copies the source code from the EC2 controller to /root/services/nextcloud on the VPS so that the docker container can be created. main.yml also has the backup script for the nextcloud docker container, and this is configured as a cron job

[root@vps config]# crontab -l
#Ansible: mariadb dump all databases
01 3 * * * /root/scripts/mariadb-dump-all-databases.sh > /dev/null 2>&1
#Ansible: lxc mail.linode.cloudnetworktesting.com backup
01 1 * * * /root/scripts/lxc-backup-mail-linode-cloudnetworktesting-com.sh > /dev/null 2>&1
#Ansible: lxc zulip.linode.cloudnetworktesting.com backup
11 1 * * * /root/scripts/lxc-backup-zulip-linode-cloudnetworktesting-com.sh > /dev/null 2>&1
#Ansible: backup pi-hole
01 2 * * * /root/services/pi-hole/backup.sh > /dev/null 2>&1
#Ansible: backup nextcloud
06 2 * * * /root/services/nextcloud/backup.sh > /dev/null 2>&1
#Ansible: nextcloud background jobs
*/5 * * * * docker exec --user www-data nextcloud.linode.cloudnetworktesting.com php cron.php

Since the nextcloud docker container uses the VPS MariaDB, nextcloud/nextcloud/docker-compose.yml will have the nextcloud mariadb user's name and password so that the container has access to mariadb through this nextcloud mariadb user (a sketch is below). Nextcloud uses the mariadb on the VPS host. See implementation details in "Nextcloud total packet flow" below.
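
A hedged sketch of the database portion of that docker-compose.yml; the variable names are illustrative, the real values come from the gitignored .env file, and the traefik labels (omitted here) are analogous to the pi-hole ones shown earlier:

```yaml
services:
  nextcloud:
    image: nextcloud:${docker_image_tag}
    container_name: nextcloud.linode.cloudnetworktesting.com
    networks:
      - web
    volumes:
      - ./html:/var/www/html
    environment:
      MYSQL_HOST: "10.36.7.11"            # VPS loopback; MariaDB runs on the host, not in the container
      MYSQL_DATABASE: nextcloud
      MYSQL_USER: nextcloud
      MYSQL_PASSWORD: "${mariadb_nextcloud_password}"
```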

In addition the docker-compose.yml will have all the necessary labels so that traefik can properly terminate and route the packets destined for the nextcloud docker container.

The new mariadb user "nextcloud" is created by ansible in the mariadb role.   The users are added to the mariadb/tasks/users.yml file as shown below:

- name: check for database user 'nextcloud@172.21.0.0/16'
  community.mysql.mysql_user:
    login_user: root
    login_password: "{{ mariadb_root_password }}"
    name: nextcloud
    password: "{{ mariadb_nextcloud_password }}"
    host: "172.21.%.%"
    priv: 'nextcloud.*:ALL'
    state: present
  no_log: true


All the passwords are encrypted with ansible-vault. All sensitive files are in .gitignore file and not present in the github repo. (mariadb/defaults/main.yml)

The iptables rules in the ansible security role must be updated with a 3306 allow rule as indicated above so that the nextcloud docker container can access mariadb on the VPS. This is done in security/files/iptables.rules

The source subnets are based upon /etc/docker/daemon.json on the VPS, which defines the 172.x subnets that the docker engine on the VPS will use when creating docker containers. Thus iptables must have rules to allow 3306 from any of these predefined docker private container address subnets.

Prior to running the nextcloud role on ansible-playbook, run the mariadb and the security roles on ansible-playbook to update mariadb and iptables with the above information.

Note this will restart all docker containers and iptables prior to running the nextcloud ansible role.

The users will be created as shown below on the VPS
[root@vps zulip.linode.cloudnetworktesting.com]# mariadb -e "select user,host from mysql.user;"
+---------------+-------------+
| User          | Host        |
+---------------+-------------+
| PUBLIC        |             |
| backup_script | 10.100.94.% |
| nextcloud     | 10.100.94.% |
| nextcloud     | 172.21.%.%  |
| nextcloud     | 172.22.%.%  |
| nextcloud     | 172.23.%.%  |
| nextcloud     | 172.24.%.%  |
| nextcloud     | 172.25.%.%  |
| backup_script | localhost   |
| mariadb.sys   | localhost   |
| mysql         | localhost   |
| nextcloud     | localhost   |
| root          | localhost   |


On traefik Web Admin there will be a new HTTP router for nextcloud
On the traefik HTTP router, in the middleware, in addition to the ipallowlist (and basic auth challenge) there is a redirect for CalDAV and CardDAV for other-client integration with the nextcloud contacts and calendars. I tried to get this to work on Windows 10 and could not, because it does not natively support a nextcloud "other" account type, just the standard google and icloud integrations. The Mac has an "other" option and one is able to share the nextcloud calendar and contacts with the calendar and contacts on the Mac and vice versa.


### Nextcloud docker container design:

#since nextcloud is being created as a docker container, we need to set labels so that traefik can route the HTTPS traffic coming into the
#public interface to the docker container ip address, which is routed through the VPS loopback interface
#Very similar to the pi-hole docker-compose.yml file, this will use a .env file for the values of the variables below
#The volume ./html:/var/www/html must be created as noted in the nextcloud documentation
#This container will also be a part of the docker web network as well.
#All environment vars are in the nextcloud website user docs. We only need to set the database settings
#The db that is used targets the mariadb running on the VPS
#An alternative is to run a new mysql db inside the container, but the mariadb on the host VPS can instead be accessed
#by the nextcloud docker container via docker networking, and this is the approach taken here.


### Docker labels for traefik based routing:

#For the labels, the first 6 are similar to those used for the pi-hole container insofar as traefik is concerned.
#Traefik uses these labels to create the HTTP router that will route the traffic from the VPS public interface to the nextcloud
#container ip address.
#"traefik.http.routers.${service}.middlewares=${service}-allowlist,${service}-redirects,${service}-sts"
#The line above adds the middleware but also configures redirects and sts, the strict transport security header
#The use of the redirects: for CalDAV and CardDAV to work we have to have the redirects configured as below:
#"traefik.http.middlewares.${service}-redirects.redirectregex.permanent=true"
#"traefik.http.middlewares.${service}-redirects.redirectregex.regex=https://(.*)/.well-known/(card|cal)dav"
#For sts, this is not absolutely necessary but will remove a warning in the nextcloud admin console. The sts labels
#are the last 3 lines in the labels section below
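
Collecting those label fragments into one sketch; the redirectregex replacement target and the sts values are assumptions based on the common Nextcloud service-discovery redirect pattern, and the repo's actual labels may differ:

```yaml
services:
  nextcloud:
    labels:
      - "traefik.http.routers.${service}.middlewares=${service}-allowlist,${service}-redirects,${service}-sts"
      - "traefik.http.middlewares.${service}-redirects.redirectregex.permanent=true"
      - "traefik.http.middlewares.${service}-redirects.redirectregex.regex=https://(.*)/.well-known/(card|cal)dav"
      - "traefik.http.middlewares.${service}-redirects.redirectregex.replacement=https://$${1}/remote.php/dav"
      - "traefik.http.middlewares.${service}-sts.headers.stsSeconds=31536000"
      - "traefik.http.middlewares.${service}-sts.headers.stsIncludeSubdomains=true"
      - "traefik.http.middlewares.${service}-sts.headers.stsPreload=true"
```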



### Nextcloud total packet flow using Traefik HTTP router (diagrams in word doc):

Nextcloud uses the mariadb on the VPS host. To do this there must be iptables that allow 3306 from the docker container networks (172 networks)

Nextcloud will route the packet out of the container to the VPS mariadb using the 10.36.7.11 loopback

Iptables firewall rule will allow traffic from docker to the VPS

The connection to 3306 mariadb on the VPS will be from one of the nextcloud configured users (all docker private subnets are added by the ansible playbook to mariadb so any nextcloud container will be able to connect to mariadb on 3306).

From wireguard client to nextcloud.linode.cloudnetworktesting.com 
First it needs to resolve it to an ip address.
Pi-hole is the dns resolver for all these wireguard clients


The ip address will resolve to the public ip of the VPS
The public ip address in wireguard config is in the IP allowed list

The route is through the VPN tunnel
The packet is then forwarded through the tunnel. The inner packet is 443 and this is forwarded to the traefik TLS termination service on the VPS that has an HTTP router for routing the traffic to the nextcloud docker container based upon the docker labels in the docker-compose.yml for nextcloud


Traefik routes based upon the Host header in the packet
Traefik is a TLS termination point (certs are issued by letsencrypt) so traefik can look at the decrypted HTTP packet headers to do this routing.

Traefik has an ip whitelist, and the private client ip addresses assigned by the VPN (to the Windows or Mac client) are in the whitelist, so the packet can be routed through traefik. Before that there is a basic authentication challenge to log in to the nextcloud docker container.


### Nextcloud Web Admin GUI

In the Web Admin GUI make sure to install all the default apps.
When bringing this up the first time there are a lot of errors and warnings.
The reverse proxy error and the HTTPS error can be addressed by adding code to the /root/services/nextcloud/config/config.php file (see the word doc for details on this)

For the maintenance window warning add this to the config.php file at the end:
'maintenance_window_start' => 1,

For the occ warning run this command in the VPS shell. It will add all the missing indices
docker exec -it --user www-data nextcloud.linode.cloudnetworktesting.com php occ db:add-missing-indices

Also, email needs to be configured in the nextcloud Admin settings so that nextcloud can send updates and admin information to the user. Add the email address to the admin account as well.
The iRedMail email info can also be added to share iRedMail with the nextcloud mail app. Make sure to use STARTTLS for both IMAP and SMTP, and port 143 for the Dovecot IMAP

### versions 29.0.8 and 30.0.0 issues

The latest releases unfortunately are throwing many more warnings than before. These will be addressed in 29.0.8 and 30.0.1, but those patches are not available at the time of this writing.
We will have to downgrade to a 28 version for now until it is patched. The process for downgrading involves wiping out the old docker configuration: drop the mariadb "nextcloud" database and delete the /root/services/nextcloud/html folder on the VPS. Also put the older version in the .env file for the nextcloud role (docker_image_tag) and update this value in the running /root/services/nextcloud/.env file on the VPS. Then bring the container back up and it should be downgraded.







## checkmk:

### The 3 agent files and adding EC2 public ip to the traefik whitelist for checkmk:

IMPORTANT NOTE: checkmk requires 3 agent components and these can be downloaded from the Web Admin console above
The URL is from the Web UI console and a wget can be performed on the ansible controller to get the files
The files are downloaded from the checkmk docker container.

In order to get access to the docker container URL to download these 3 agent files, one must either be on a WireGuard client
(the checkmk traefik middleware allows the 10.100.94.x wireguard client ips)
OR the public IP address of the ansible controller must be added. In my case this is an EC2 controller that does not have wireguard installed.
Add that public EC2 IP address to the ip_allowlist for the traefik allowlist label for the checkmk container in the .env file:
      - "traefik.http.middlewares.${service}-allowlist.ipallowlist.sourcerange=${ip_allowlist}"





### S3 bucket (optional):

I have added S3 bucket access for the EC2 ansible controller for any files that cannot easily be downloaded directly to the EC2 controller. It was not needed here, but it is good to have.
Ensure that the aws cli on the ansible controller is set to the proper profile:
aws configure
aws configure list
export AWS_PROFILE=
~/.aws/config and ~/.aws/credentials
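A typical round trip looks like this (the bucket and file names are hypothetical, purely for illustration):

aws s3 cp ./checkmk-agent-files.tar.gz s3://my-staging-bucket/checkmk/    # push from wherever the file was fetched
aws s3 cp s3://my-staging-bucket/checkmk/checkmk-agent-files.tar.gz .     # pull onto the EC2 ansible controller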



### Notes on the shell scripts used for this docker container, the operational deployment of the host/agent, and adding the VPS public ip to the traefik whitelist for checkmk (for the REST API packets from the VPS to the checkmk docker container):

The setup_host_non_REST_API.sh is used if adding the agent/host to checkmk with the Web Admin GUI. This script just puts the 3 agent files in the following directories on the VPS: /etc/systemd/system and /usr/bin.

The setup_host.sh is used if adding the agent/host to checkmk with the REST API. It has an additional 3 lines to copy the following files to the VPS:
add_host.sh and .env to the /root/services/checkmk/agent directory. The agent directory is a new directory created on the VPS for this REST API host adding.
The last line in this file also runs the add_host.sh, which invokes the REST API to add the VPS host to checkmk.

NOTE: the add_host.sh is run from the VPS itself after it has been copied there by setup_host.sh. Because of this the VPS public ip needs to be added to the checkmk (traefik) middleware ip_allowlist. The REST API packets are HTTPS packets, so they go through traefik and as such have to be whitelisted to be reverse proxied to the checkmk docker container.


The setup_host.sh needs to be run from the ansible controller. The syntax is:
./setup_host.sh linode.********.com
This places the 3 agent files onto the VPS, copies add_host.sh to the VPS, and then executes add_host.sh on the VPS.

The add_host.sh sequentially executes 3 curl REST API actions: create the host on checkmk, run service discovery, and then apply the service discovery to the host on the Web Admin console.

There was an issue between the second curl REST API invocation and the third, due to the service discovery not being complete prior to the apply (the third curl). Adding a sleep of 10 seconds between the second and third curl resolved the issue; a sleep of 3 seconds was not enough to fully resolve it.

See the notes.txt  for more detail on the script and the exact nature of the fix.
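The shape of add_host.sh is roughly the following. This is only a sketch: the endpoint paths, payloads, and site name are assumptions taken from the public checkmk REST API documentation, not copied from the real script (the real script and exact values are in notes.txt):

#!/bin/bash
# rough sketch of the add_host.sh flow, showing the 10 second settle time between discovery and activation
API="https://checkmk.linode.cloudnetworktesting.com/cmk/check_mk/api/1.0"   # "cmk" site name is an assumption
AUTH="Authorization: Bearer automation ${AUTOMATION_SECRET}"                # automation user secret from the .env

# 1) create the host object
curl -sS -X POST "${API}/domain-types/host_config/collections/all" \
  -H "${AUTH}" -H "Content-Type: application/json" \
  -d '{"folder": "/", "host_name": "vps.linode.cloudnetworktesting.com"}'

# 2) kick off service discovery for that host
curl -sS -X POST "${API}/domain-types/service_discovery_run/actions/start/invoke" \
  -H "${AUTH}" -H "Content-Type: application/json" \
  -d '{"host_name": "vps.linode.cloudnetworktesting.com", "mode": "fix_all"}'

sleep 10   # 3 seconds was not enough for discovery to finish before the apply; 10 seconds resolved the race

# 3) activate the pending changes
curl -sS -X POST "${API}/domain-types/activation_run/actions/activate-changes/invoke" \
  -H "${AUTH}" -H "Content-Type: application/json" \
  -d '{"redirect": false, "force_foreign_changes": true}'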



### Important note on iptables configuration:

In order for the checkmk docker container on the VPS to monitor the VPS host itself, we need to add iptables rules allowing traffic from the docker container networks (the 172 nets), where the checkmk docker container is running, to the VPS host on port 6556, the checkmk agent listening port.

This can be effectively tested with the nsenter command and telnet from the VPS since the checkmk container does not have telnet installed. See notes.txt file for extensive details on how to do this.

The command is below to be run on the VPS itself:
nsenter -t $(docker inspect --format {{.State.Pid}} checkmk.linode.*********.com) -n telnet vps.linode.******************.com 6556

This command gets the pid of the checkmk docker container via docker inspect, enters that container's network namespace with nsenter, and then initiates a telnet FROM the docker container to the host (VPS) on port 6556, utilizing the telnet binary on the VPS itself.

This will fail prior to the iptables rules being added and will work once the rules are in place. See notes.txt for the exact rules and implementation details.


In short, all of the docker 172 container nets need to be allowed to the VPS public ip on port 6556.
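An illustrative rule of the kind required is below (the exact rules used are in notes.txt; 172.16.0.0/12 covers the private docker bridge subnets):

iptables -I INPUT -s 172.16.0.0/12 -p tcp --dport 6556 -j ACCEPT   # allow the docker nets to reach the checkmk agent on the VPS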

### Docker container design for checkmk service and mailer service (smtp relay service):

The checkmk setup consists of 2 containers: one for the checkmk server and the other for an smtp relay with iRedMail as the destination. The mail relay is important because checkmk, as a monitoring tool, generates a lot of notifications and status updates. It works very well during testing.

#2 services on this docker compose: checkmk and mailer

#For checkmk, tmpfs is suggested in checkmk docs for better performance
#For the volumes, the /etc/localtime on VPS is mapped to /etc/localtime on the container to match the timezones
#For the second volume ./sites is /root/services/checkmk/sites on VPS to /omd/sites to persist the data the checkmk creates on the container
#The MAIL_RELAY_HOST is covered in section 2.4 of the checkmk docker documentation link: sending notifications
#https://docs.checkmk.com/latest/en/managing_docker.html
#the host in the container cannot send mail directly but has to send to an smtp relay server (smarthost)
#which is this mailer docker service (second service name below), which will then forward to the iRedMail LXC container mail server
#The labels for checkmk are for the traefik http/https router as was done for the other container. Letsencrypt TLS cert will be
#used for the HTTPS termination.
#Note that in the labels, the webserver on checkmk listens on port 5000  

#Mailer service notes:
  #The mailer service (smtp relay on the checkmk docker container) is below
  #The mailer will use the .env variables below. Checkmk mail will go to this mailer and the mailer will then
  #relay the mail to our LXC mail container on the VPS and the mail will then reach its final destination
  #Note that both mailer and checkmk services are part of the same docker network web so they can innately communicate
  #with one another.    

### Docker labels for Traefik based routing for the checkmk docker service:

The labels are similar to the traefik labels used for the other docker containers in this project. Note that the http daemon for checkmk is running on port 5000, and that the middleware ip_allowlist is enforced. To disable the whitelist simply comment out the last 2 label lines; this is not recommended, since all traffic would then be passed through to the container via the public URL for checkmk. Letsencrypt certs are used as with all the other docker containers running HTTP/HTTPS.








## Borg and borgmatic:

Borg is just installed as a service on the VPS. It is not deployed as a separate container.
Borg needs to be installed on both the source (what is being backed up, in this case the VPS) and the destination repository (in this case the block storage volume for our linode VPS at /mnt/storage; borg backups are written to the /mnt/storage/backups/borg directory).

As such, this role has only tasks and files, and not a borg/borg directory with a docker-compose.yml file.

In the tasks, the main.yml installs the borg, borgmatic and python-llfuse packages onto the archlinux VPS using the pacman ansible module. Python-llfuse is required to be able to mount the backup archives onto the local file system.

The tasks also create the borgmatic directory at /etc/borgmatic, and ansible copies over all of the shell scripts, the config.yaml, and the .env. Borgmatic is a configuration-driven wrapper around borg (with command hooks) that makes borg backup administration much more fine-grained and detailed. We can run some tests with it.

Before handing backups over to the cronjobs (which run daily thereafter), make the first backup manually. First the backup repository must be initialized; once initialized, the first backup is run manually so that we can run some tests with it.
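A minimal sketch of that one-time sequence, assuming the repository at /mnt/storage/backups/borg and the config at /etc/borgmatic/config.yaml (verify the exact subcommands and flags against the installed borgmatic version):

borgmatic init --encryption repokey    # initialize the borg repository once
borgmatic --verbosity 1                # first manual backup run
borgmatic list                         # confirm the new archive exists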

We will use borgmatic to do some intentional file deletion and restore tests using the following commands:

borgmatic extract --archive latest --path usr/bin/chmod
and, as a second method, a mount: borgmatic mount --archive latest --mount-point /mnt/borg --path usr/bin
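When finished with the mounted archive it should be detached again; assuming the same mount point as above:

borgmatic umount --mount-point /mnt/borg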













## Gitlab and the Gitlab runner:

### repository design FOR GITLAB PROJECTS:

See the first gitlab project in the section below "GITLAB PROJECT1: Basic website application test"


### Introduction:

When configuring this, make sure that the VPS ip address is included in the ip_allowlist (whitelist) for traefik for the gitlab container. This is because the gitlab-runner is running on the VPS as well, and the source address of the packets when it registers with the gitlab docker container is the VPS public ip. Otherwise the gitlab-runner registration will fail.

NOTE: the gitlab-runner can be installed on a standalone separate VPS server if transaction throughput needs to be scaled up.



### detailed packet flow for gitlab docker container itself: 

NOTE: for details of specific iptables rules etc. see notes.txt file.
The network diagrams are in the Word doc.

NOTE: given that gitlab is essentially a repository-driven CI/CD tool, we must consider both the HTTP/HTTPS packet flow and the SSH packet flow.

NOTE: this analysis is not for the packet flow of an application deployment. That analysis follows in the next section.

#### HTTP/HTTPS:

When connecting to gitlab.linode.*****.com, or when doing a git clone https://... for example, this packet flow kicks in.

Traefik's own labels (NOT the gitlab container's traefik labels) are below. Note that port 80 is immediately redirected to HTTPS by the traefik middleware.


   labels:
      - "traefik.enable=true"
      - "traefik.http.routers.${service}.rule=Host(`${hostname}`)"
      - "traefik.http.middlewares.redirect-to-https.redirectscheme.scheme=https"
      - "traefik.http.middlewares.redirect-to-https.redirectscheme.permanent=true"
      - "traefik.http.routers.redirects.rule=hostregexp(`.+`)"
      - "traefik.http.routers.redirects.entrypoints=http"
      - "traefik.http.routers.redirects.middlewares=redirect-to-https"
      - "traefik.http.routers.${service}.tls.certresolver=letsencrypt"
      - "traefik.http.routers.${service}.service=api@internal"
      - "traefik.http.routers.${service}.middlewares=${service}-allowlist,${service}-auth"
      - "traefik.http.middlewares.${service}-allowlist.ipallowlist.sourcerange=${ip_allowlist}"
      - "traefik.http.middlewares.${service}-auth.basicauth.users=${traefik_dashboard_username}:${traefik_dashboard_password}"

So ultimately the packet is a 443 packet.

The 443 packet arrives on the public VPS interface, and ports 80:80 and 443:443 are mapped to the traefik container ip by the firewall.
The iptables NAT table processes the packet: the last rule in the PREROUTING chain is hit, and this forwards to the DOCKER chain target.
Once in the DOCKER chain, the non-bridge (!br) traffic hits the 443 rule, which DNATs the packet to the traefik container private ip on port 443.
The destination ip is thus mapped from the public IP to the container ip address of traefik.

NOTE: the docker container ip addresses will change each time ansible-playbook reinitializes and installs new containers. The private container ip addresses are NOT static; they are, however, drawn from a select group of private 172 subnets.

After the iptables NAT (iptables -t nat -nvL) processes the packet:
(see notes.txt for a dump of the actual firewall rules)

Traefik sees the Host header as gitlab.linode.************.com and forwards the packet to the gitlab container via its HTTP router.
This is where the traefik labels from the gitlab container's docker-compose (not the traefik container's) are applied. There is middleware for an auth challenge, so a login will be required.

Nginx running inside the gitlab container actually serves the web content to the browser.




#### For SSH:

As noted above, this applies to SSH repo operations involving the gitlab repo (project), for example a git clone git@gitlab.linode.********.com.

By default port 22 is used, but an entry is added in the ~/.ssh/config file of the EC2 controller to change this to a high numbered port. (Note, per the repository design section above, only the EC2 controller interacts with the gitlab repo (and also the github repo). The Mac local VSCode repo only interacts with github, and that runs on the standard port 22 to github.)
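The ~/.ssh/config entry on the EC2 controller looks something like the following (44822 is the high numbered port mapped in the docker-compose below; the key path is an assumption):

cat >> ~/.ssh/config <<'EOF'
Host gitlab.linode.cloudnetworktesting.com
    User git
    Port 44822
    IdentityFile ~/.ssh/id_ed25519
EOF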

The container is configured with <high_numbered_port>:22 in the ports: section of its docker-compose.yml file, so the packet ultimately reaches the gitlab application running in the container on port 22.


Docker maps the high numbered port on the VPS to port 22 on the gitlab docker container in the iptables NAT firewall rules: there is a DNAT from the high numbered destination port to the gitlab container ip on port 22.

The packet first hits the PREROUTING chain and is forwarded to the DOCKER chain.
In the DOCKER chain there is a DNAT rule that NATs the 44822 packet to the gitlab container ip address and port 22.


NOTE: traefik is only used for HTTP/HTTPS traffic flows. Traefik is not involved in any way in this SSH traffic flow to the gitlab container.



### Gitlab docker container design:

#The port mapping 44822:22 is for ssh access to the git repository (for example git clone using ssh rather than https)
#Packet routing details are in the notes.txt and README files. Packet routing for the HTTP/HTTPS gitlab is different from
#the packet routing for the ssh traffic to gitlab (for example git clone git@gitlab.linode.cloudnetworktesting.com)
#Traefik handles the HTTP/HTTPS, but for SSH we have to map the port ourselves: 44822 is the incoming port and needs to be
#mapped to 22 on the container itself.
#The volumes section has the directories that we need to persist even after the containers are removed.
#the first 3 volumes are from the documentation
#the fourth is to save the tar archive directly on the VPS
#LABELS: This line is the port that the nginx is listening on in the gitlab container
#- "traefik.http.services.${service_gitlab}.loadbalancer.server.port=80"
#For LABELS docker registry: registry is listening on port 5050
#Traefik needs to handle 2 services for this setup: one is service_gitlab and the other is service_registry for the image/docker registry
#The ip whitelist subnets will be the same.
#/etc/gitlab.rb is the main configuration file
#This can be modified on a running gitlab instance and gitlab reloaded, and then the changes will take effect
#Here we will deploy the container with all of these settings already done, via the environment section below
#NOTE that the smtp configuration is here. It will use the existing iRedMail lxc container on the VPS
#worker processes are based upon the cpu cores, and prometheus will be disabled because it is resource intensive.
#nginx configuration follows: we only want to listen on port 80
#This is because the TLS cert for gitlab container is done by traefik via letsencrypt
#trusted_subnets are docker subnets
#nginx['real_ip_header'] = 'X-Forwarded-For' This is so that we see the real ip of the client in the logs and not the ip of
#traefik, since traefik is reverse proxying the connection along the way.
#Otherwise we will see the private ip of traefik 172 address in the gitlab access logs.
#Gitlab private docker images registry to store the docker images     registry_external_url "${gitlab_registry_url}"










## GITLAB PROJECT1: Basic website application test:

IMPORTANT NOTE: the repository used for the application on the EC2 controller is completely separate from the repo used for the ansible playbook code. The repos are distinct and have to be separate because a push to the application (project) repo is what instigates the pipeline script run on the gitlab runner. The ansible playbook code only interacts with github for backup, not with gitlab. More details on the gitlab project repo design are in the next section below.

### GITLAB PROJECTS repo design:

NOTE: this repository is distinct from the repo used for the ansible playbook infra code. The ansible code is only backed up to github and has nothing to do with gitlab.


Create a new main directory for all of the gitlab projects on the EC2 controller

EC2 controller will have https remote origin2 configured to github for backup of gitlab repo

EC2 controller will have ssh remote origin configured to gitlab for active project pipeline code development and testing 

The local Mac VSCode will have an https remote origin to github for local backup of all of the github/gitlab development projects (do a git pull from github once each commit from the EC2 controller is complete). A sketch of this remote setup is shown below.

The .gitignore drops the .env file. The .env file, although present on the EC2 controller repo, is not used and is not pushed to the gitlab or github repos for security reasons. The .env variables are added to the project Variables in the gitlab Admin Console itself.
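A sketch of the resulting two-remote setup on the EC2 controller (the gitlab path matches the backup-checker project used later; the github repo name and the main branch name are hypothetical here):

git remote add origin  git@gitlab.linode.cloudnetworktesting.com:dmastrop/backup_checker_python.git
git remote add origin2 https://github.com/dmastrop/gitlab-projects-backup.git
git push origin  main     # active pipeline development -> gitlab (this is what triggers the runner)
git push origin2 main     # backup copy -> github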


### Detailed packet flow for the website application test

The packet flow for the deployment of the website application test via the gitlab .gitlab-ci.yml script is below.

This is more involved than the packet flow for gitlab admin indicated above because there is an additional deployed docker container for the application itself.


NOTE: for details of the iptables rules, specific ip addresses, etc see notes.txt
The network diagrams are in the Word doc.


NOTE: no cloud services are required for deployment. The VPS is powerful enough to handle the deployment of the app (in this case a simple caddy website instance) to a docker container on the VPS itself.

A current docker container snapshot is below after the website application is deployed.
[root@vps scripts]# docker ps
CONTAINER ID   IMAGE                         COMMAND                  CREATED          STATUS                    PORTS                                                                         NAMES
d1d3d4c3cbc9   checkmk/check-mk-raw:latest   "/docker-entrypoint.…"   45 minutes ago   Up 45 minutes (healthy)   5000/tcp, 6557/tcp                                                            checkmk.linode.cloudnetworktesting.com
2ea4192c683a   namshi/smtp                   "/bin/entrypoint.sh …"   45 minutes ago   Up 45 minutes             25/tcp                                                                        mailer
eb9b06bd0033   nextcloud:28.0.9-apache       "/entrypoint.sh apac…"   45 minutes ago   Up 45 minutes             80/tcp                                                                        nextcloud.linode.cloudnetworktesting.com
b76750c470d5   pihole/pihole:latest          "/s6-init"               45 minutes ago   Up 45 minutes (healthy)   10.36.7.11:53->53/udp, 10.36.7.11:53->53/tcp, 10.36.7.11:67->67/udp, 80/tcp   pi-hole.linode.cloudnetworktesting.com
42f888c4a819   traefik:latest                "/entrypoint.sh --lo…"   46 minutes ago   Up 46 minutes             0.0.0.0:80->80/tcp, :::80->80/tcp, 0.0.0.0:443->443/tcp, :::443->443/tcp      traefik.linode.cloudnetworktesting.com
0fcdc6e454d4   caddy:latest                  "caddy run --config …"   18 hours ago     Up 18 hours               80/tcp, 443/tcp, 2019/tcp, 443/udp                                            linode.cloudnetworktesting.com
900d8851ca24   gitlab/gitlab-ce:latest       "/assets/wrapper"        23 hours ago     Up 23 hours (healthy)     80/tcp, 443/tcp, 0.0.0.0:44822->22/tcp, [::]:44822->22/tcp                    gitlab.linode.cloudnetworktesting.com



First, the traefik whitelist for the gitlab container (not the traefik container) must contain the VPS public ip. This is because the gitlab runner sources packets from the VPS, and during registration these need to be allowed into the gitlab container; otherwise, registration of the runner with the gitlab docker container server will fail (see the previous section above).

More specifically these two lines in the docker-compose.yml file for the gitlab container in the ansible playbook:
      - "traefik.http.routers.${service_gitlab}.middlewares=${service_gitlab}-allowlist"
      - "traefik.http.middlewares.${service_gitlab}-allowlist.ipallowlist.sourcerange=${ip_allowlist}"

The ip_allowlist is given in the .env file for the gitlab container ansible playbook role.
This list has to include the public ip of the VPS.

Once the basics are done, when the .gitlab-ci.yml file is added to the website project gitlab repo and pushed from the EC2 controller to the gitlab repo, this instigates the running of the script on the gitlab runner.

For this very simple starter gitlab script, gitlab clones the project repo to the runner once it gets the .gitlab-ci.yml file. The runner then runs the docker-compose commands in the source code.

The following files are present for this simple project
-rw-rw-r-- 1 ubuntu ubuntu  274 Oct 10 00:43 .gitlab-ci.yml
-rw-rw-r-- 1 ubuntu ubuntu   52 Oct  9 23:18 Caddyfile
-rw-rw-r-- 1 ubuntu ubuntu  760 Oct 10 01:44 docker-compose.yml
-rw-rw-r-- 1 ubuntu ubuntu 1832 Oct  9 23:27 index.html

The index.html has the content to serve on the website 

The runner runs docker-compose, which starts a new docker container on the VPS itself with the config files and content above; this is the deployment.


Note that the docker-compose file has all of the info (labels) to program traefik as well, so the new router and service were successfully added to traefik.


Finally, when a request to https://linode.********.com is made in the browser, it hits the VPS public interface.
The iptables have rules that docker added when the website container was created, so the packet will be forwarded to traefik.
First the iptables NAT PREROUTING chain is hit, which forwards to the DOCKER chain.
Here the port 80 traffic is NAT'ed to the traefik private ip address on port 80.

Once traefik has the packet, it inspects the Host header and, based on its service forwarding rule, forwards the traffic to the docker container ip of the website container on port 80. This is because a traefik HTTP router and service have been added for the hostname linode.*****.com, the hostname of the website.

Looking at the traefik Web Admin console, the forwarding ip address in the service for this HTTP router is the ip address of the website's docker container.

The hostname of the website is in the .env file for this project, but as noted above this is not pushed to the gitlab repo for security reasons; the ENV vars are all added (manually) via the gitlab Web Admin console.
These are the important variables that define the website, the container name, etc:

container_name="linode.**********.com"
hostname="linodel.**************.com"
docker_image="caddy"
docker_image_tag="latest"
service="cloudnetworktesting"

These are used in the traefik labels in the docker-compose.yml file noted above that creates this docker container website.

The website application traefik labels are below (this is what creates the HTTP router and service noted above, which routes the traffic to the application docker container):
    labels:
      - "traefik.enable=true"
      - "traefik.docker.network=web"
      - "traefik.http.routers.${service}.rule=Host(`${hostname}`)"
      - "traefik.http.routers.${service}.tls.certresolver=letsencrypt"
      - "traefik.http.routers.${service}.entrypoints=https"
      - "traefik.http.services.${service}.loadbalancer.server.port=80"













## GITLAB PROJECT2: Python backup-checker application (source code) deployment with gitlab:

### Introduction:

This is a deployment of a basic python script that checks the status of the backups on the VPS. The python script is deployed in a python:3.10 docker container. The gitlab pipeline is a basic multi-stage .gitlab-ci.yml configuration that builds the image, pushes it to the gitlab private container registry, deploys it (a docker run of the image on the gitlab runner), and then cleans up the images.

The images need to be cleaned up because the gitlab-runner is installed on the VPS as a service (registered to the gitlab docker container), so the runner is effectively running the docker container on the VPS, and container instances and images are run and stored locally on the VPS. The container instances exit after running the python script, but the images will persist (docker image ls) on the VPS if not cleaned up. Note the cleanup is for docker images on the VPS, not for the gitlab private container registry; the registry images are left in the registry.

There needs to be an allow_failure: true in the deploy stage in case the deploy stage detects a diff between the current backup state and the expected_backups state (this counts as a "failure", depending on the time of day that the pipeline is run). This lets the pipeline proceed to the cleanup stage regardless of the state of the deploy stage, so that images do not build up on the VPS.




### mariadb connector vs. mysql-connector

There was a change in the mysql connector after 8.0.29, and it does not work well with the latest mariadb that we are using. One can downgrade mysql-connector-python to 8.0.29, but that is not a good permanent fix.

current requirements.txt:

mysql-connector-python==8.0.31 (leave this as is)



The changes required in main.py to work with the later version of mariadb that we are using are the following:

1. Change mysql.connector.connect to mariadb.connect. (function change from mysql.connector to mariadb)

2. add the import mariadb at the top

3. change the mysql.connector.Error to mariadb.Error class

4. Add mariadb-1.1.10 to the requirements.txt

These changes are all incorporated into the latest version of main.py for this project.


Re-run the pip install -r requirements.txt to install the new mariadb dependency if testing this out manually.


NOTE: the above works fine in the docker container running the python:3.10 image, so the gitlab pipeline will run fine.

If you experiment with running this python code on other systems like Windows or Ubuntu: Windows runs fine with python 3.9.13 installed (the default Windows VSCode python), while on Ubuntu 22 you will have to install 2 additional libs for the mariadb connector dependency to install (and with Ubuntu set your venv to python 3.9.13 or 3.10.15, which the docker container is running, or some version close to these two). I am not sure about Mac.



Run the following on Ubuntu to install the two libs (per the mariadb documentation):

sudo apt install libmariadb3 libmariadb-dev



NOTE: I did check a manual instance of the docker python:3.10 image that is used in the gitlab pipeline and verified that those 2 libs above are already present. So the Dockerfile is good as is.

NOTE: this line: today = datetime.datetime.now().date().isoformat()

"today" may be different if you are testing the code across multiple OS like i was and the time zone may be different between the two. On my windows (PST) it was 1 day behind the VPS/docker date (UTC) so the VPS/docker date was showing the backups that were not complete for the day and thus a diff on the comparison, whereas windows showed no diff. This is working as designed and good.





### High level overview and further manual testing of the python main.py script:


Push the .gitlab-ci.yml to the gitlab repo. This notifies the runner to run the script.
It will build the docker image, push it to the private registry on gitlab, and then deploy the docker image from the private registry, i.e. the python docker container which runs the backup script main.py. It is running the python 3.10 image (the docker python container), and based on the manual testing from the EC2 controller running python 3.9.13 and the Windows VPN client running python 3.9.13, it should be ok. There were some lib dependency issues with the EC2 ubuntu controller, but that was resolved by adding the 2 libraries below. I also tested a docker python:3.10 container, did a docker exec -it into it, and found that the lib files that were missing in the EC2 testing were already present in the python:3.10 image. So the python:3.10 docker container will be able to install all of the python requirements.txt dependencies (see Dockerfile) and run the script.

sudo apt install libmariadb3 libmariadb-dev (for ubuntu EC2 to run the requirements.txt and then run the python script)


Dockerfile:

FROM python:3.10
WORKDIR /backup-checker
COPY ./app /backup-checker
RUN pip install --no-cache-dir -r requirements.txt
CMD ["python", "main.py"] <<< it runs the main.py when it comes up (after deployed via the .gitlab-ci.yml)

My directory name is backup_checker_python, but only the app folder is going to be copied over not the complete directory.


From the Dockerfile comments:

##NOTES:
#copy over the entire app directory to backup-checker directory in the docker container.  I am not pushing over .env because
#that will be created on the fly in the .gitlab-ci.yml file
#The python:3.10 docker image has been tested to see that it contains the 2 libs below. These are required now that the 
#main.py python backup-checker is using the mariadb connector instead of mysql.connector function.
#To verify the docker python:3.10 image do the following:
#docker run -it --rm python:3.10 bash (step 1). This starts a terminal session in the container itself.
#dpkg -l | grep libmariadb3
#dpkg -l | grep libmariadb-dev
#both of these show as installed in the docker python:3.10 docker container.
#With these libs, the pip install of dependencies requirements.txt which now has mariadb-1.1.10, should install without issues


Need to run the main.py from a docker container running python 3.10


I also tested the python script from Windows VSCode. This requires that the wireguard VPN tunnel is up, because the script needs to contact mariadb on the VPS via the VPS loopback interface as well as contact zulip via the public VPS ip. This works very well; pyenv is set up on the Windows client, and I was able to test the script and requirements with several different versions of python. Of note, the requirements.txt will not install with python 3.12.5 (a latest version). 3.9.13 and the 3.10.x versions worked well. I also tried 3.8.19 and 3.11.9 and they worked as well. 3.7.8 does not work with the mariadb-1.1.10 dependency, which requires newer versions of python (apparently 3.8 and above). I don't know exactly what is causing the problem with the requirements.txt on the newer python version 3.12.5.

Finally, to simulate a gitlab-runner, I also manually downloaded one of the docker images from gitlab-registry.linode.****.com (to do this you must log in with docker login gitlab-registry.linode.****.com and the gitlab root password). Once the image is downloaded you can run the docker run command with a stripped down .env file (remove the quotes with a sed command):
docker run --env-file=.env-docker gitlab-registry.linode.*********.com/dmastrop/backup_checker_python:18-84bc99b6
This runs the complete script locally along with the output.
This was done on a Windows client, so VSCode had to be installed, along with docker desktop.

Additional notes on the local test above: running the code locally can be done. You must authenticate to gitlab-registry.linode.c****************.com with the gitlab password, docker pull the image from the registry, strip the quotes out of the .env file (use sed), and finally run the docker image. This must be done from a wireguard VPN client because the python container will connect to the mariadb on the VPS loopback as well as push a notification to zulip. So it needs to be able to reach the loopback ip on the VPS (for the db_host mariadb connection) as well as the public VPS ip (for the push to zulip).
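The quote stripping itself is just a one-liner of the following kind (filenames as described above):

sed 's/"//g' .env > .env-docker    # docker --env-file takes values literally, so the surrounding quotes must go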


### .gitlab-ci.yml stages:

A Dockerfile is needed to create the docker image. The .gitlab-ci.yml has a BUILD stage that instructs the runner to create the docker image as a first stage; the runner needs the Dockerfile to do this.
NOTE that the runner gets the entire build directory with the .gitlab-ci.yml, the Dockerfile, and the app directory.
The runner copies the app directory into the docker image when it builds it as part of the BUILD stage of .gitlab-ci.yml.

The app directory has the main.py, the requirements.txt for dependencies, and the expected-backups file for the compare. The .env will be created on the fly by the .gitlab-ci.yml script, with the mariadb backup password and the zulip_bot_api_key being stored in the gitlab project itself for security. So it is ok that the .env is not being pushed to the gitlab and github repos.

The push stage pushes the builds (2 of them, one with the latest tag and one with the COMMIT_SHORT_SHA tag) to the gitlab private container registry on the gitlab container. This registry is then used by the gitlab-runner on the VPS to run the docker image on the VPS (the gitlab-runner runs on the VPS). The .env file is created on the fly in the deploy stage, as indicated above, and the runner then does docker run --env-file .env $CI_REGISTRY_IMAGE:latest on the latest image. This leaves a copy of the image on the VPS (docker image ls) that should be removed by the cleanup stage regardless of whether the python main.py reveals a diff (failure) or no diff.
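Roughly, the four stages boil down to the runner executing the following docker commands (a sketch using gitlab's predefined CI variables; the real .gitlab-ci.yml also writes the .env on the fly as described above):

docker build -t "$CI_REGISTRY_IMAGE:latest" -t "$CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA" .   # BUILD
docker login -u "$CI_REGISTRY_USER" -p "$CI_JOB_TOKEN" "$CI_REGISTRY"                        # PUSH
docker push "$CI_REGISTRY_IMAGE:latest"
docker push "$CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA"
docker run --env-file .env "$CI_REGISTRY_IMAGE:latest"                                       # DEPLOY (runs main.py)
docker rmi "$CI_REGISTRY_IMAGE:latest" "$CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA"             # CLEANUP (local images on the VPS)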


### Detailed packet flows (and the changes required to accommodate them) from the python app docker container to the zulip lxc container, and the mariadb query on the VPS through the VPS loopback interface:

There are several issues that have to be addressed with this python application being deployed to the VPS in a docker container.  The python script communicates with the mariadb to query backups and also needs to message the backup status to zulip, the lxc container on the VPS running the zulip messaging service.

Because the python app contacts zulip from a container, all of the docker subnets had to be added to the Traefik whitelist (ipallowlist) for the zulip container (ansible traefik role):
ubuntu@ip-172-31-21-52:~/course11_devops_startup/ansible/traefik/traefik/configs$ cat zulip.yml


Next, add iptables -nvL rules for docker-to-public-VPS communication (the zulip traffic/messaging). This is needed because the python script runs in a docker container and has to connect to the mariadb on the VPS as well as to zulip.linode.****************.com to send a message to the zulip stream (ansible security role). What actually happens is that the iptables nat table (iptables -t nat -nvL) is hit first: the PREROUTING chain sends the packet to the DOCKER chain, where docker-to-public-VPS traffic is considered docker bridge (br) traffic and hits RETURN. The flow then goes back up to the PREROUTING chain, where there is no match for the traffic, so the packet is processed by the iptables -nvL INPUT chain. Rules from the docker subnets to the VPS public ip for ports 80 and 443 need to be added there.
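The rules added are along these lines (illustrative only; 203.0.113.10 is a stand-in for the VPS public ip, and the exact rules are in notes.txt):

iptables -I INPUT -s 172.16.0.0/12 -d 203.0.113.10 -p tcp --dport 443 -j ACCEPT   # docker nets -> VPS public ip, https
iptables -I INPUT -s 172.16.0.0/12 -d 203.0.113.10 -p tcp --dport 80  -j ACCEPT   # docker nets -> VPS public ip, http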

The backup_script mariadb user needs the docker subnets (the python app) added to its permissions, because the python app in the docker container queries mariadb as this user for the backup list (ansible mariadb role: mariadb/tasks/users.yml).
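The grant added by that task is of roughly this shape (the user, database, and password names here are hypothetical placeholders for illustration only):

mysql -e "CREATE USER IF NOT EXISTS 'backup_script'@'172.%' IDENTIFIED BY 'changeme';"
mysql -e "GRANT SELECT ON backups.* TO 'backup_script'@'172.%';"
mysql -e "FLUSH PRIVILEGES;"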


### more configuration:

New stream in zulip called “vps”
New bot cloudnetworktesting-bot with api key
Expected_backups file for comparison
Dockerfile to create the docker image that the python will run in (note: python:3.10 has all of the libs required to install the requirements.txt dependencies. This was not true for the EC2 ubuntu testing image, where I had to add 2 more libs to get the new mariadb connector to install as a dependency. Windows as a testing image was ok.)

In gitlab, added the new project and the variables (2 of them) that will not be in the .env, and created the .gitlab-ci.yml.
Note that the .env is created on the fly in cleartext by the .gitlab-ci.yml, except for the 2 variables that are stored in the gitlab project itself (the zulip bot api key and the mariadb backup user password). So the .env file does not need to be copied into the container. The container is run with the --env-file switch in the .gitlab-ci.yml after the pip install -r requirements.txt is run in the container.

The pipeline has 4 stages
BUILD, PUSH, DEPLOY and CLEANUP


Since the gitlab-runner is running on the VPS itself, and the gitlab runner runs the .gitlab-ci.yml, both the VPS and the gitlab container image registry end up with the image, and the images would build up on the VPS server. The cleanup stage removes the image from the VPS. One can see the image appear during the gitlab pipeline runs, and with the cleanup stage it is removed.

The docker image is run on the runner, but the runner is on the VPS, so all of these images build up on it; likewise you will see the exited docker python containers in docker ps -a on the VPS as well.

[root@vps ~]# gitlab-runner status
Runtime platform                                    arch=amd64 os=linux pid=1478118 revision=b92ee590 version=17.4.0
gitlab-runner: Service is running
[root@vps ~]# systemctl status gitlab-runner
● gitlab-runner.service - GitLab Runner
     Loaded: loaded (/usr/lib/systemd/system/gitlab-runner.service; enabled; preset: disabled)
     Active: active (running) since Wed 2024-10-09 20:34:55 UTC; 1 week 1 day ago
 Invocation: 2add5938a2794d168f41c4959e33d11f
   Main PID: 566171 (gitlab-runner)
      Tasks: 12 (limit: 19174)
     Memory: 56.3M (peak: 127.6M swap: 6.5M swap peak: 6.6M zswap: 1.7M)
        CPU: 8min 8.134s
     CGroup: /system.slice/gitlab-runner.service
             └─566171 /usr/bin/gitlab-runner run --working-directory /var/lib/gitlab-runner --config /etc/gitlab-ru>


For the cleanup stage to run even if the deploy stage "fails", i.e. if there is a mismatch/diff in the compare of the current backup state and the expected_backups state, insert this line at the end of the deploy stage:

 allow_failure: true




Finally, the gitlab pipeline scheduler runs this script at a predefined time each day so that we do not have to instigate a push to the gitlab repo to run it. The time should be set after the day's backup scripts are complete, so that a "success" on the compare (no diff) is achieved when it is run. With PST compared to the UTC on the VPS, this is run at around 11pm PST, which is an hour or so after the next day's UTC backup scripts have run for all 7 services (including borg/borgmatic).

The update below is sent to a zulip channel (stream) called vps, folder backups.
An email is also sent to zulip@linode.*******.com if the zulip stream is not being monitored; there is a 2 minute default delay before this email is sent (I reduced this to 1 minute). The .gitlab-ci.yml file uses a zulip bot with a zulip_bot_api_key, etc. to do this.

everything is a-ok
+-----+------------+------------------------------------------+------------+----------+---------+
| id  |    date    |                  target                  | start_time | end_time | status  |
+-----+------------+------------------------------------------+------------+----------+---------+
| 139 | 2024-10-18 |   mail.linode.cloudnetworktesting.com    |  1:01:00   | 1:02:32  | success |
| 140 | 2024-10-18 |   zulip.linode.cloudnetworktesting.com   |  1:11:00   | 1:13:19  | success |
| 141 | 2024-10-18 |  pi-hole.linode.cloudnetworktesting.com  |  2:01:00   | 2:01:05  | success |
| 142 | 2024-10-18 | nextcloud.linode.cloudnetworktesting.com |  2:06:00   | 2:07:44  | success |
| 143 | 2024-10-18 |  checkmk.linode.cloudnetworktesting.com  |  2:11:00   | 2:11:07  | success |
| 144 | 2024-10-18 |  gitlab.linode.cloudnetworktesting.com   |  2:16:00   | 2:17:08  | success |
| 145 | 2024-10-18 |    vps.linode.cloudnetworktesting.com    |  5:01:01   | 5:01:47  | success |
+-----+------------+------------------------------------------+------------+----------+---------+




















## UPGRADE SCRIPTS (AUTO) for docker containers, iRedMail, Zulip and VPS host OS:

NOTE: Do not upgrade the nextcloud docker container. The nextcloud docker container was downgraded because the latest version has a lot of issues.


### docker container upgrade script modification:

For the docker containers, the backup shell scripts are modified to incorporate an upgrade. The image tag is :latest, but that alone will not instigate an upgrade to the latest version on docker hub, because the new version carries the same :latest tag. The way to get an upgrade is to pull the latest image from docker hub and then do a docker-compose down and docker-compose up so that a new container is created from the freshly pulled image. This can be done daily with the backup script.

Note we want to keep nextcloud at the version below due to problems with the newer versions of nextcloud.  28.0.9 seems to be fairly stable.


$ grep --include=.env docker_image_tag * -R

checkmk/checkmk/.env:docker_image_tag="latest"
gitlab/gitlab/.env:docker_image_tag="latest"
nextcloud/nextcloud/.env:docker_image_tag="28.0.9-apache"
pi-hole/pi-hole/.env:docker_image_tag="latest"
traefik/traefik/.env:docker_image_tag="latest"



Example shell script code to do this for nextcloud, pi-hole and checkmk (partial excerpt):
This is for checkmk:


#add the docker pull for the upgrade of the container
docker pull ${docker_image}:${docker_image_tag}
cd "${script_dir}" && \
docker-compose down && \
tar -czvf "${service_backup_directory}"/$(date "+%Y-%m-%dT%H-%M-%S").tar.gz "${service_directory}" && \
docker-compose up -d

end_time=$(date +"%T")




Example shell script code to do this for gitlab (partial excerpt):

docker pull ${docker_image}:${docker_image_tag}
docker exec -t "${container_name}" gitlab-backup create && \
docker exec -t "${container_name}" gitlab-ctl backup-etc --backup-path /secret/gitlab/backups/ && cd "{script_dir}" && docker-compose down && docker-compose up -d 
end_time=$(date +"%T")



Once the script backup.sh is modified with the above, rename it to backup-and-upgrade.sh.

Then edit the tasks/main.yml to point to backup-and-upgrade.sh instead of backup.sh

For example,

  #edit the backup.sh to backup-and-upgrade.sh so that the docker container is upgraded     
- name: add backup checkmk script to crontab
  ansible.builtin.cron:
    name: backup checkmk
    minute: "11"
    hour: "2"
    user: root
    #job: "/root/services/checkmk/backup.sh > /dev/null 2>&1"
    job: "/root/services/checkmk/backup-and-upgrade.sh > /dev/null 2>&1" 


Run the ansible-playbook for the task so that the new file is pushed to the VPS

The above in main.yml configures the cronjob.
crontab -l will show the addition of the new cronjob.
The old cronjob needs to be manually deleted on the VPS with a "crontab -e" edit.

[root@vps ~]# crontab -l
#Ansible: mariadb dump all databases
01 3 * * * /root/scripts/mariadb-dump-all-databases.sh > /dev/null 2>&1
#Ansible: lxc mail.linode.cloudnetworktesting.com backup
01 1 * * * /root/scripts/lxc-backup-mail-linode-cloudnetworktesting-com.sh > /dev/null 2>&1
#Ansible: lxc zulip.linode.cloudnetworktesting.com backup
11 1 * * * /root/scripts/lxc-backup-zulip-linode-cloudnetworktesting-com.sh > /dev/null 2>&1
#Ansible: backup nextcloud
06 2 * * * /root/services/nextcloud/backup.sh > /dev/null 2>&1
#Ansible: nextcloud background jobs
*/5 * * * * docker exec --user www-data nextcloud.linode.cloudnetworktesting.com php cron.php
#Ansible: borgmatic backup
01 5 * * * /usr/bin/borgmatic --syslog-verbosity 1 > /dev/null 2>&1
#Ansible: backup-and-upgrade pi-hole edit
01 2 * * * /root/services/pi-hole/backup-and-upgrade.sh > /dev/null 2>&1
#Ansible: backup and upgrade checkmk
11 2 * * * /root/services/checkmk/backup-and-upgrade.sh > /dev/null 2>&1
#Ansible: backup-and-upgrade gitlab
16 2 * * * /root/services/gitlab/backup-and-upgrade.sh > /dev/null 2>&1








Traefik should be upgraded manually because it is a single point of failure.
To upgrade traefik manually, go into the /root/services/traefik directory:

[root@vps services]# cd traefik/
[root@vps traefik]# ls
acme.json  configs  docker-compose.yml
[root@vps traefik]# pwd
/root/services/traefik


In /root/services/traefik do 
docker pull traefik:latest


then docker-compose down && docker-compose up -d


It will take a day or so for the older backup files in /mnt/storage to flush out; until then there may be 2 listings of each of the docker container backups in the main.py backup script output sent to zulip and email.





### LXC containers:

The iRedMail and Zulip containers are upgraded manually by ssh'ing into the LXC container. See the Word doc for the links to the upgrade docs; the docs should be followed to do this.


After the upgrade steps are done the LXC container should be rebooted with systemctl reboot


### Archlinux host OS VPS:

Use pacman to do this.

pacman -Syu


Do it relatively often about 1x/week.

It is always a good idea to do a systemctl reboot especially if the kernel is upgraded




## Docker prune

The python backup-checker creates an exited docker container each time it is run. Since it is now being run on a scheduler several times a day, these exited containers build up rapidly.
For example,
[root@vps ~]# docker ps -a | grep  main.py
cd4a66bf5112   ca738f9ff232                  "python main.py"         14 hours ago   Exited (0) 14 hours ago                                                                                 compassionate_dirac
9dc94c87ce1d   e25344175cc7                  "python main.py"         17 hours ago   Exited (1) 17 hours ago                                                                                 vigorous_spence
772cef92cd00   1325bd573d56                  "python main.py"         24 hours ago   Exited (0) 24 hours ago                                                                                 magical_visvesvaraya
7b46f2a94149   bdb06a166c3c                  "python main.py"         38 hours ago   Exited (0) 38 hours ago                                                                                 trusting_lederberg
f0679fa6c43f   515ccbf7619f                  "python main.py"         41 hours ago   Exited (1) 41 hours ago         


### First clean up the existing setup

docker ps -a | grep main.py | awk '{print $1}' | xargs docker rm


This greps main.py out of the docker ps -a output and feeds the container ids into the docker rm command to remove all of the exited instances.


### pipeline cleanup:

Next, modify the backup-checker .gitlab-ci.yml file with the change below, adding the --rm flag to the docker run command in the deploy stage:

#add the --rm flag to the docker run command in the script section below so that the old docker ps -a 
#exited containers do not build up on the VPS 
deploy:
  stage: deploy
  before_script:
.........

  script:
    - docker run --rm --env-file .env $CI_REGISTRY_IMAGE:latest


This will remove the exited containers immediately after the pipeline runs.

### Next in docker/tasks/main.yml

#Integrate a docker system prune into the cronjob (crontab -l) so that it is run every 24 hours.
#The first cleanup will be very large
- include_tasks: prune.yml


### prune.yml file:
This will be added to the crontab


- name: add docker prune to crontab
  ansible.builtin.cron:
    name: docker prune
    minute: "30"
    hour: "5"
    user: root
    job: "docker system prune --all --force > /dev/null 2>&1"
#this will be run after all of the other cronjobs

Move this to an hour after the last backup is performed.

This is a complete docker system prune.

crontab is now added. See below at the bottom

[root@vps ~]# crontab -l
#Ansible: mariadb dump all databases
01 3 * * * /root/scripts/mariadb-dump-all-databases.sh > /dev/null 2>&1
#Ansible: lxc mail.linode.cloudnetworktesting.com backup
01 1 * * * /root/scripts/lxc-backup-mail-linode-cloudnetworktesting-com.sh > /dev/null 2>&1
#Ansible: lxc zulip.linode.cloudnetworktesting.com backup
11 1 * * * /root/scripts/lxc-backup-zulip-linode-cloudnetworktesting-com.sh > /dev/null 2>&1
#Ansible: backup nextcloud
06 2 * * * /root/services/nextcloud/backup.sh > /dev/null 2>&1
#Ansible: nextcloud background jobs
*/5 * * * * docker exec --user www-data nextcloud.linode.cloudnetworktesting.com php cron.php
#Ansible: borgmatic backup
01 5 * * * /usr/bin/borgmatic --syslog-verbosity 1 > /dev/null 2>&1
#Ansible: backup-and-upgrade pi-hole edit
01 2 * * * /root/services/pi-hole/backup-and-upgrade.sh > /dev/null 2>&1
#Ansible: backup and upgrade checkmk
11 2 * * * /root/services/checkmk/backup-and-upgrade.sh > /dev/null 2>&1
#Ansible: backup-and-upgrade gitlab
16 2 * * * /root/services/gitlab/backup-and-upgrade.sh > /dev/null 2>&1
#Ansible: docker prune
30 5 * * * docker system prune --all --force > /dev/null 2>&1











## GITLAB PROJECT3: extending this gitlab VPS setup to deploy a weather app (source code) to AWS K8s EKS cluster using helm:

The VPS gitlab environment will run the pipeline against an EKS cluster in AWS using helm.
This will be an extension to a more complete CI/CD pipeline from a software development lifecycle perspective, involving multiple stages like BUILD, PROMOTE, DEPLOY, etc......










## NOTES on certbot TLS certs and letsencrypt certs on the Traefik reverse proxy:

Traefik HTTPS letsencrypt certs:
Note that for native traffic (mail protocols), certbot generates the TLS cert.
Traefik is only used for HTTP/HTTPS (Web Admin traffic).
The certs are all in the /root/services/traefik/************ file.
These certs are public and private key pairs:

-Private key for letsencrypt itself
-Traefik keys
-Zulip keys
-iRedMail keys (only for HTTPS not for mail traffic, that uses certbot)
-pi-hole keys
-nextcloud keys
-etc.....












