Update page about setting up a private Stratum 1 #157

Merged
merged 26 commits into from
Jun 5, 2024
0ecd2bb
update page for setting up a private stratum 1
bedroge Feb 15, 2024
d2cbf84
fix typo in bandwidth
bedroge Feb 15, 2024
12900ff
add sentence about SSH keys and sudo
bedroge Feb 15, 2024
6bbc5a0
make site-specific vars file optional
bedroge Feb 16, 2024
0ff336a
add warning about IPS
bedroge Feb 16, 2024
f830adb
added sentence about downside of https
bedroge Feb 16, 2024
b018fb4
change headers of subsections
bedroge Feb 16, 2024
a4851b0
add recommendation for having squid proxies
bedroge Apr 12, 2024
d555b4b
fix typo in mechanisms
bedroge Apr 12, 2024
2b32fa9
reword sentence about replicating from stratum 0 a bit
bedroge Apr 12, 2024
ba51497
discourage https
bedroge Apr 12, 2024
40668b1
extend paragraph about geo api, instructions for disabling it on the …
bedroge Apr 12, 2024
7c6742c
remove note about Squid proxy on Stratum 1, as it's now disabled by d…
bedroge Apr 12, 2024
0fac17c
remove cache hit example
bedroge Apr 12, 2024
93b1c34
use eessi.io instead of eessi-hpc.org
bedroge Apr 12, 2024
46a2f2f
fix typo in however
bedroge Apr 12, 2024
59e2f8d
remove -p ./roles in ansible-galaxy command
bedroge Jun 4, 2024
9825341
add link to stratum 1 page
bedroge Jun 4, 2024
c6adda4
add section about proxy configuration
bedroge Jun 4, 2024
5ac0a48
add section about configuring an additional stratum 1
bedroge Jun 4, 2024
52bbfd7
move client config part to native installation page
bedroge Jun 4, 2024
d8bb91e
fix link
bedroge Jun 4, 2024
47e3824
fix link
bedroge Jun 4, 2024
8c8ce18
correct paragraph about /srv
bedroge Jun 4, 2024
020139c
remove instructions for mounting an additional file system
bedroge Jun 4, 2024
9e97d4f
remove note, rearrange the sections, add section for larger systems
bedroge Jun 4, 2024
181 changes: 70 additions & 111 deletions docs/filesystem_layer/stratum1.md
@@ -1,44 +1,66 @@
# Setting up a Stratum 1

Setting up a Stratum 1 involves the following steps:

- set up the Stratum 1, preferably by running the Ansible playbook that we provide;
- request a Stratum 0 firewall exception for your Stratum 1 server;
- request a `<your site>.stratum1.cvmfs.eessi-infra.org` DNS entry;
- open a pull request to include the URL to your Stratum 1 in the EESSI configuration.

The last two steps can be skipped if you want to host a "private" Stratum 1 for your site.

The EESSI project provides a number of geographically distributed public Stratum 1 servers that you can use to make EESSI available on your machine(s).
It is always recommended to have a local caching layer consisting of a few Squid proxies.
If you want to be even better protected against network outages and increase the bandwidth between your cluster nodes and the Stratum 1 servers,
you could also consider setting up a local (private) Stratum 1 server that replicates the EESSI CVMFS repository.
This guarantees that you always have a full and up-to-date copy of the entire stack available in your local network.

## Requirements for a Stratum 1

The main requirements for a Stratum 1 server are a good network connection to the clients it is going to serve,
and sufficient disk space. For the EESSI repository, a few hundred gigabytes should suffice, but for production
environments at least 1 TB would be recommended.
and sufficient disk space. As the EESSI repository is constantly growing, make sure that the disk space can easily be extended if necessary.
Currently, we recommend having at least 1 TB available.

In terms of cores and memory, a machine with just a few (~4) cores and 4-8 GB of memory should suffice.

Various Linux distributions are supported, but we recommend one based on RHEL 7 or 8.
Various Linux distributions are supported, but we recommend one based on RHEL 8 or 9.

Finally, make sure that ports 80 (for the Apache web server) and 8000 are open.
Finally, make sure that ports 80 and 8000 are open to clients.


## Step 1: set up the Stratum 1
## Configure the Stratum 1

The recommended way for setting up an EESSI Stratum 1 is by running the Ansible playbook `stratum1.yml`
from the [filesystem-layer repository on GitHub](https://github.com/EESSI/filesystem-layer).
Stratum 1 servers have to synchronize the contents of their CVMFS repositories regularly, and usually they replicate from a CVMFS Stratum 0 server.
In order to ensure the stability and security of the EESSI Stratum 0 server, it has a strict firewall, and only the EESSI-maintained public Stratum 1 servers are allowed to replicate from it.
However, EESSI provides a synchronisation server that can be used for setting up private Stratum 1 replica servers, and this is available at `http://aws-eu-west-s1-sync.eessi.science`.

!!! warn Potential issues with intrusion prevention systems
In the past we have seen a few occurrences of data transfer issues when files were being pulled in by or from a Stratum 1 server.
In such cases the `cvmfs_server snapshot` command, used for synchronizing the Stratum 1, may break with errors like `failed to download <URL to file>`.
Trying to manually download the mentioned file with `curl` will also not work, and result in errors like:
```
curl: (56) Recv failure: Connection reset by peer
```
In all cases this was due to an intrusion prevention system scanning the associated network, and hence scanning all files going in or out of the Stratum 1.
Though it was a false positive in all cases, this breaks the synchronization procedure of your Stratum 1.
If this is the case, you can try switching to HTTPS by using `https://aws-eu-west-s1-sync.eessi.science` for synchronizing your Stratum 1.
Even though there is no advantage for CVMFS itself in using HTTPS (it has built-in mechanisms for ensuring the integrity of the data),
this will prevent the described issues, as the intrusion prevention system will not be able to inspect the encrypted data.
However, not only does HTTPS introduce some overhead due to the encryption/decryption, it also makes caching in forward proxies impossible.
Therefore, using HTTPS by default is strongly discouraged.

Installing a Stratum 1 requires a GEO API license key, which will be used to find the (geographically) closest Stratum 1 server for your client and proxies.
More information on how to (freely) obtain this key is available in the CVMFS documentation: https://cvmfs.readthedocs.io/en/stable/cpt-replica.html#geo-api-setup.
### Manual configuration

You can put your license key in the local configuration file `inventory/local_site_specific_vars.yml`.
In order to set up a Stratum 1 manually, you can make use of the instructions in the [Private Stratum 1 replica server](https://multixscale.github.io/cvmfs-tutorial-hpc-best-practices/access/stratum1/)
section of the MultiXscale tutorial ["Best Practices for CernVM-FS in HPC"](https://multixscale.github.io/cvmfs-tutorial-hpc-best-practices/).

Furthermore, the Stratum 1 runs a Squid server. The template configuration file can be found at `templates/eessi_stratum1_squid.conf.j2`.
If you want to customize it, for instance for limiting the access to the Stratum 1, you can make your own version of this template file
and point to it by setting `local_stratum1_cvmfs_squid_conf_src` in `inventory/local_site_specific_vars.yml`.
See the comments in the example file for more details.
### Configuration using Ansible

Start by installing Ansible:
The recommended way for setting up an EESSI Stratum 1 is by running the Ansible playbook `stratum1.yml`
from the [filesystem-layer repository on GitHub](https://github.com/EESSI/filesystem-layer).
For the commands in this section, we are assuming that you cloned this repository, and your working directory is `filesystem-layer`.

!!! note GEO API
Installing a Stratum 1 usually requires a GEO API license key, which will be used to find the (geographically) closest Stratum 1 server for your client and proxies.
However, for a private Stratum 1 this can be skipped, and you can disable the use of the GEO API in the configuration of your clients by setting `CVMFS_USE_GEOAPI=no`.
In this case, they will just connect to your local Stratum 1 by default.

If you do want to set up the GEO API, you can find more information on how to (freely) obtain this key in the CVMFS documentation: https://cvmfs.readthedocs.io/en/stable/cpt-replica.html#geo-api-setup.

You can put your license key in the local configuration file `inventory/local_site_specific_vars.yml`.
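
As an illustration of the client-side setting mentioned in this note, the following minimal sketch appends `CVMFS_USE_GEOAPI=no` to a client configuration file. The helper name `disable_geoapi` is hypothetical, and the choice of configuration file is an assumption: adjust it to wherever you keep your local CernVM-FS client settings.

```bash
# Hypothetical helper: append CVMFS_USE_GEOAPI=no to the given CernVM-FS
# client configuration file, unless the file already sets CVMFS_USE_GEOAPI.
disable_geoapi() {
    conf_file="$1"
    touch "$conf_file"
    if ! grep -q '^CVMFS_USE_GEOAPI=' "$conf_file"; then
        echo 'CVMFS_USE_GEOAPI=no' >> "$conf_file"
    fi
}

# Example, to be run as root on a client (the path is an assumption):
# disable_geoapi /etc/cvmfs/domain.d/eessi.io.local
```

Running the helper twice leaves a single entry in the file. The client configuration still has to be reloaded afterwards for the change to take effect.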

Start by installing Ansible, e.g.:

```bash
sudo yum install -y ansible
@@ -47,128 +69,65 @@ sudo yum install -y ansible
Then install Ansible roles for EESSI:

```bash
ansible-galaxy role install -r requirements.yml -p ./roles --force
ansible-galaxy role install -r ./requirements.yml --force
```

Make sure you have enough space in `/srv` (on the Stratum 1) since the snapshot of the Stratum 0
will end up there by default. To alter the directory where the snapshot gets copied to you can add
this variable in `inventory/host_vars/<url-or-ip-to-your-stratum1>`:

Make sure you have enough space in `/srv` on the Stratum 1, since the snapshots of the repositories
will end up there by default. To alter the directory where the snapshots get stored you can manually
create a symlink before running the playbook:
```bash
cvmfs_srv_mount: /srv
sudo ln -s /lots/of/space/cvmfs /srv/cvmfs
```

Make sure that you have added the hostname or IP address of your server to the
`inventory/hosts` file. Finally, install the Stratum 1 using one of the two following options.
Also make sure that you have added the hostname or IP address of your server to the
`inventory/hosts` file, that you are able to log in to the server from the machine that is going to run the playbook
(preferably using an SSH key), and that you can use `sudo`.

Option 1:
Finally, install the Stratum 1 using:

``` bash
# -b to run as root, optionally use -K if a sudo password is required
ansible-playbook -b [-K] -e @inventory/local_site_specific_vars.yml stratum1.yml
# -b to run as root, optionally use -K if a sudo password is required, and optionally include your site-specific variables
ansible-playbook -b [-K] [-e @inventory/local_site_specific_vars.yml] stratum1.yml
```

Option2:

Create a ssh key pair and make sure the `ansible-host-keys.pub` is in the
`$HOME/.ssh/authorized_keys` file on your Stratum 1 server.

```bash
ssh-keygen -b 2048 -t rsa -f ~/.ssh/ansible-host-keys -q -N ""
```

Then run the playbook:

```bash
ansible-playbook -b --private-key ~/.ssh/ansible-host-keys -e @inventory/local_site_specific_vars.yml stratum1.yml
```

Running the playbook will automatically make replicas of all the repositories defined in `group_vars/all.yml`.


## Step 2: request a firewall exception

(This step is not implemented yet and can be skipped)

You can request a firewall exception rule to be added for your Stratum 1 server by
[opening an issue on the GitHub page of the filesystem layer repository](https://github.com/EESSI/filesystem-layer/issues/new).
### Verification of the Stratum 1 using `curl`

Make sure to include the IP address of your server.

## Step 3: Verification of the Stratum 1

When the playbook has finished your Stratum 1 should be ready. In order to test your Stratum 1, even
without a client installed, you can use `curl`.
When the playbook has finished, your Stratum 1 should be ready. In order to test your Stratum 1,
even without a client installed, you can use `curl`:

```bash
curl --head http://<url-or-ip-to-your-stratum1>/cvmfs/software.eessi.io/.cvmfspublished
```
This should return:
This should return something like:

```bash
HTTP/1.1 200 OK
...
X-Cache: MISS from <url-or-ip-to-your-stratum1>
```

The second time you run it, you should get a cache hit:

```bash
X-Cache: HIT from <url-or-ip-to-your-stratum1>

Content-Type: application/x-cvmfs
```

Example with the Norwegian Stratum 1:
Example with the EESSI Stratum 1 running in AWS:

```bash
curl --head http://bgo-no.stratum1.cvmfs.eessi-infra.org/cvmfs/software.eessi.io/.cvmfspublished
curl --head http://aws-eu-central-s1.eessi.science/cvmfs/software.eessi.io/.cvmfspublished
```
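
If you want to script this check, a small wrapper around the `curl` output could look like the sketch below. The function name `check_cvmfs_headers` is hypothetical, and the check only looks at the status code and the `Content-Type` header shown above, so it is a rough sanity test and not a full health check.

```bash
# Hypothetical helper: read HTTP response headers on stdin and succeed only
# if the status code is 200 and the content type is application/x-cvmfs.
check_cvmfs_headers() {
    # Strip carriage returns, since HTTP header lines end in CRLF.
    headers="$(tr -d '\r')"
    printf '%s\n' "$headers" | grep -Eq '^HTTP/[0-9.]+ 200' &&
        printf '%s\n' "$headers" | grep -iq '^content-type: application/x-cvmfs'
}

# Example (replace the placeholder with your own server):
# curl -s --head "http://<url-or-ip-to-your-stratum1>/cvmfs/software.eessi.io/.cvmfspublished" \
#     | check_cvmfs_headers && echo "Stratum 1 looks OK"
```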

You can also test access to your Stratum 1 from a client, for which you will have to install the CVMFS
[client](https://github.com/EESSI/filesystem-layer#clients).

Then run the following command to add your newly created Stratum 1 to the existing list of EESSI Stratum 1 servers by creating a local CVMFS configuration file:
### Verification of the Stratum 1 using a CVMFS client

```bash
echo 'CVMFS_SERVER_URL="http://<url-or-ip-to-your-stratum1>/cvmfs/@fqrn@;$CVMFS_SERVER_URL"' | sudo tee -a /etc/cvmfs/domain.d/eessi-hpc.org.local
```
You can, of course, also test access to your Stratum 1 from a client.
This requires you to install a CernVM-FS client and add the Stratum 1 to the client configuration;
this is explained in more detail on the [native installation page](../getting_access/native_installation.md).

If this is the first time you set up the client you now run:

```bash
sudo cvmfs_config setup
```

If you already had configured the client before, you can simply reload the config:

```bash
sudo cvmfs_config reload -c software.eessi.io
```

Finally, verify that the client connects to your new Stratum 1 by running:
Then verify that the client connects to your new Stratum 1 by running:

```bash
cvmfs_config stat -v software.eessi.io
```

Assuming that your new Stratum 1 is the geographically closest one to your client, this should return:
Assuming that your new Stratum 1 is working properly, this should return something like:

```bash
Connection: http://<url-or-ip-to-your-stratum1>/cvmfs/software.eessi.io through proxy DIRECT (online)
```


## Step 4: request an EESSI DNS name

In order to keep the configuration clean and easy, all the EESSI Stratum 1 servers have a DNS name
`<your site>.stratum1.cvmfs.eessi-infra.org`, where `<your site>` is often a short name or
abbreviation followed by the country code (e.g. `rug-nl` or `bgo-no`). You can request this for
your Stratum 1 by mentioning this in the issue that you created in Step 2, or by opening another
issue.

## Step 5: include your Stratum 1 in the EESSI configuration

If you want to include your Stratum 1 in the EESSI configuration, i.e. allow any (nearby) client to be able to use it,
you can open a pull request with updated configuration files. You will only have to add the URL to your Stratum 1 to the
`urls` list of the `eessi_cvmfs_server_urls` variable in the
[`all.yml` file](https://github.com/EESSI/filesystem-layer/blob/main/inventory/group_vars/all.yml).
58 changes: 52 additions & 6 deletions docs/getting_access/native_installation.md
@@ -1,5 +1,7 @@
# Native installation

## Installation for single clients

Setting up native access to EESSI, that is a system-wide deployment that does not require workarounds like
[using a container](eessi_container.md), requires the installation and configuration of [CernVM-FS](https://cernvm.cern.ch/fs).

@@ -62,14 +64,58 @@ The good news is that all of this only requires a handful commands :astonished:
sudo cvmfs_config setup
```

## Installation for larger systems (e.g. clusters)

When using CernVM-FS on a larger number of local clients, e.g. on an HPC cluster or set of workstations,
it is very strongly recommended to at least set up some Squid proxies close to your clients.
These Squid proxies will be used to cache content that was recently accessed by your clients,
which reduces the load on the Stratum 1 servers and reduces the latency for your clients.
As a rule of thumb, you should use about one proxy per 500 clients, and have a minimum of two.
Instructions for setting up a Squid proxy can be found in the [CernVM-FS documentation](https://cvmfs.readthedocs.io/en/stable/cpt-squid.html) and
in the [CernVM-FS tutorial](https://cvmfs-contrib.github.io/cvmfs-tutorial-2021/03_stratum1_proxies/#32-setting-up-a-proxy).
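
The rule of thumb above can be written as a small calculation. This is only a sketch: the function name `proxies_needed` is hypothetical, and rounding up to the next whole proxy is an assumption about how to read "about one proxy per 500 clients".

```bash
# Hypothetical helper: estimate the number of Squid proxies for a given
# number of clients, using one proxy per 500 clients (rounded up) and
# never fewer than two.
proxies_needed() {
    clients="$1"
    count=$(( (clients + 499) / 500 ))
    if [ "$count" -lt 2 ]; then
        count=2
    fi
    echo "$count"
}

proxies_needed 300     # prints 2
proxies_needed 1200    # prints 3
```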

Additionally, setting up a private Stratum 1, which will make a full copy of the repository,
can be beneficial to improve the latency and bandwidth even further, and to be better protected against network outages.
Instructions for setting up your own EESSI Stratum 1 can be found in [setting up your own CernVM-FS Stratum 1 mirror server](../filesystem_layer/stratum1.md).

### Configuring your client to use a Squid proxy

If you have set up one or more Squid proxies, you will have to add them to your CernVM-FS client configuration.
This can be done by removing `CVMFS_CLIENT_PROFILE="single"` from `/etc/cvmfs/default.local`, and adding the following line:

```
CVMFS_HTTP_PROXY="http://ip-of-your-1st-proxy:port|http://ip-of-your-2nd-proxy:port"
```

In this case, both proxies are equally preferable.
More advanced use cases can be found in [the CernVM-FS documentation](https://cvmfs.readthedocs.io/en/stable/cpt-configure.html#proxy-list-examples).
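
For sites with more than two proxies, the configuration line can be generated instead of typed by hand. The sketch below joins all given proxies with `|`, which puts them in a single load-balanced group, as in the example above. The function name `proxy_config_line` and the proxy addresses are hypothetical.

```bash
# Hypothetical helper: join all arguments with "|" so that they form one
# load-balanced proxy group, and print the resulting CVMFS_HTTP_PROXY line.
proxy_config_line() {
    group=""
    for proxy in "$@"; do
        group="${group:+$group|}$proxy"
    done
    printf 'CVMFS_HTTP_PROXY="%s"\n' "$group"
}

# Made-up proxy addresses; 3128 is the default Squid port.
proxy_config_line http://10.0.0.1:3128 http://10.0.0.2:3128
```

In CernVM-FS proxy lists, `|` separates proxies within one load-balanced group, while `;` separates groups that are tried in order as fallbacks.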

### Configuring your client to use a private Stratum 1 mirror server

If you have set up your own Stratum 1 mirror server that replicates the EESSI CernVM-FS repositories,
you can instruct your CernVM-FS client(s) to use it by creating a local CVMFS configuration file for the EESSI domain that prepends your newly created Stratum 1 to the existing list of EESSI Stratum 1 servers:

```bash
echo 'CVMFS_SERVER_URL="http://<url-or-ip-to-your-stratum1>/cvmfs/@fqrn@;$CVMFS_SERVER_URL"' | sudo tee -a /etc/cvmfs/domain.d/eessi.io.local
```

!!! note
By prepending your new Stratum 1 to the list of existing Stratum 1 servers, your clients should by default use the private Stratum 1.
In case of downtime of your private Stratum 1, they will also still be able to make use of the public EESSI Stratum 1 servers.
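
The `@fqrn@` placeholder in the URL stands for the fully qualified repository name, which the client fills in for each repository. The following sketch only imitates that substitution to show the resulting URL. The function name `expand_fqrn` and the host name `stratum1.example.org` are made up for illustration.

```bash
# Hypothetical helper: substitute the @fqrn@ placeholder in a server URL
# with a repository name, to show which URL a client would end up using.
expand_fqrn() {
    printf '%s\n' "$1" | sed "s/@fqrn@/$2/g"
}

expand_fqrn 'http://stratum1.example.org/cvmfs/@fqrn@' software.eessi.io
# prints http://stratum1.example.org/cvmfs/software.eessi.io
```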


### Applying changes in the CernVM-FS client configuration files

After you have made any changes to the CernVM-FS client configuration, you will have to apply them.
If this is the first time you set up the client, you can simply run:

:point_up: The commands above only cover the basic installation of EESSI.
```bash
sudo cvmfs_config setup
```

This is good enough for an individual client, or for testing purposes,
but for a production-quality setup you should also set up a Squid proxy cache.
If you had already configured the client before, you can reload the configuration for the EESSI repository (or, similarly, for any other repository) using:

For large-scale systems, like an HPC cluster, you should also consider setting up your own CernVM-FS Stratum-1 mirror server.
```bash
sudo cvmfs_config reload -c software.eessi.io
```

For more details on this, please refer to the
[*Stratum 1 and proxies section* of the CernVM-FS tutorial](https://cvmfs-contrib.github.io/cvmfs-tutorial-2021/03_stratum1_proxies/).