technical documentation

R. P. Taylor edited this page Aug 30, 2023 · 9 revisions

Introduction

This page describes useful commands and technical information about CVMFS.

CVMFS Client

Debugging

  • See the debugging guide.
  • To capture debug output, set CVMFS_DEBUGLOG=/tmp/cvmfs.log, but only for a short time because it produces large log volumes.
    • Run cvmfs_config reload to enable or disable this setting.
    • Important: make sure the CVMFS user can write to this file, wherever it is, and to ${CVMFS_DEBUGLOG}.cachemgr.
  • cvmfs_config bugreport # capture a tarball report to send to CVMFS developers for debugging.

General utilities

This is an incomplete list of useful commands. For more details see the documentation.

cvmfs_config

Some of these commands are informational and can be run without root privileges.

  • cvmfs_config fsck [-j4] # check data files and catalogs in the cache (using 4 threads)
  • cvmfs_config showconfig # show configuration
  • cvmfs_config chksetup # validate the configuration. Checks for syntax errors, tests network connections for all proxy/server permutations, etc.
  • cvmfs_config probe # check and ensure that repos are mounted. Basically like doing ls in each repo
  • cvmfs_config status # show mounted repos and pids
  • cvmfs_config stat -v # show usage information (error counters, etc.)
  • cvmfs_config wipecache # clear the CVMFS cache. Useful for testing to ensure that new content is being retrieved from the server instead of from cache.
  • cvmfs_config umount # try to unmount repositories (if there are open file descriptors it will fail and show them)
  • cvmfs_config reload [-c] # apply configuration changes

cvmfs_talk

These commands provide lower-level control and require root privileges. Some of this functionality is provided in a more convenient, higher-level way by the cvmfs_config utility. The cvmfs_talk utility interacts with running CVMFS processes (i.e. active repositories) individually, so you may first need to run cvmfs_config probe to start the processes. You can interact with all repositories, or with a particular one specified with -i.

  • cvmfs_talk # show help info
  • cvmfs_talk hotpatch history # show the history of loaded CVMFS FUSE modules
  • cvmfs_talk parameters # show currently loaded parameters.
  • cvmfs_talk internal affairs # show performance counters, etc.
  • cvmfs_talk cache list # show what items are in the cache
  • cvmfs_talk cache size # show cache usage
  • cvmfs_talk cache instance # show which cache manager is used
  • cvmfs_talk pid cachemgr # show pid of the cache manager
  • cvmfs_talk cleanup 10000 # removes the oldest files until the used space is below 10000 MB.
  • cvmfs_talk [host|proxy] info # show active host or proxy. Should work even if CVMFS is stuck
  • cvmfs_talk proxy set "http://squid1.example.org:3128;http://squid2.example.org:3128" # transiently set proxy list
  • cvmfs_talk proxy group switch # switch to the next proxy server in the list.
  • cvmfs_talk host switch # switch to using the next stratum server in the list.
  • cvmfs_talk host probe [geo] # manually order stratum 1 servers using RTT (or the GeoAPI, with geo)

Extended attributes

Mounted repositories have extended attributes which can be queried to show some of the same information provided by the cvmfs_config and cvmfs_talk utilities, as well as other information such as the public keys loaded, total size of the repository in several metrics, repository meta-information, etc.

  • getfattr -d /cvmfs/soft.computecanada.ca/ # show all extended attributes
    • This requires setting CVMFS_MAGIC_XATTRS_VISIBILITY="always"
  • getfattr -n user.pubkeys /cvmfs/soft.computecanada.ca/ # get all public keys loaded by client (not repo-specific)
  • getfattr -n user.revision /cvmfs/soft.computecanada.ca/ # get current catalog revision number for repo
  • getfattr -n user.inode_max /cvmfs/soft.computecanada.ca # show the number of inodes used by a repository
  • getfattr -n user.catalog_counters /cvmfs/soft.computecanada.ca/new_repository # show various details
    • catalog_hash provides the hash of the catalog that this file belongs to
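
When scripting against these attributes, the value has to be pulled out of getfattr's name="value" output. The following is a minimal sketch, assuming getfattr's default text layout; the helper name xattr_value is hypothetical and not part of CVMFS or the attr tools.

```shell
# Hypothetical helper: print the value of one attribute from getfattr output on stdin.
# Assumes getfattr's default text layout: name="value", one attribute per line.
xattr_value() {
    awk -v name="$1" '
        index($0, name "=\"") == 1 {
            value = substr($0, length(name) + 3)   # skip the name, "=" and the opening quote
            sub(/"$/, "", value)                   # drop the closing quote
            print value
        }
    '
}

# Usage (on a client with the repository mounted):
#   getfattr -n user.revision /cvmfs/soft.computecanada.ca/ 2>/dev/null | xattr_value user.revision
```

For a single attribute, getfattr --only-values prints the bare value directly; a helper like this is mainly useful when filtering a full -d dump.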

Using static mounts instead of autofs

Instead of automounting repositories with autofs, it is possible to statically mount CVMFS repositories. However, this configuration is less frequently used and may not be as resilient when it comes to automatically recovering from issues. It is not possible to mount the same repository statically and with autofs at the same time.

  • yum erase cvmfs-auto-setup
  • Remove the CVMFS configuration from /etc/auto.master .
  • Stop and disable autofs.
  • Ensure that all CVMFS repositories are unmounted (cvmfs_config umount)
  • See https://cvmfs.readthedocs.io/en/stable/cpt-configure.html#mounting for examples to configure /etc/fstab and how to ensure a config repo is mounted first.

Note: the above method can also be used to mount a repository in a non-standard location, for example to mount soft-dev.computecanada.ca as soft.computecanada.ca for testing purposes, but this can be problematic. A cleaner alternative is to use proot:

  proot -b /cvmfs/soft-dev.computecanada.ca:/cvmfs/soft.computecanada.ca bash

CVMFS Stratum 0 client

A stratum 0 server has a specially configured CVMFS client. You can perform operations on it via the /var/spool/cvmfs/<repo_name>/cvmfs_io socket, for example:

  cvmfs_talk -p /var/spool/cvmfs/<repo_name>/cvmfs_io cleanup 0 # clean up cache

CVMFS Server

Verify chunk hashes

To check if a data chunk on a stratum server has been corrupted, you can compare the hash of the content of the compressed data chunk with the name of the chunk to see if they match. For example, for SHAKE128:

$ openssl dgst -shake128 /srv/cvmfs/soft-dev.computecanada.ca/data/c2/0145206b838b4ee1db9d75124b8b26147686c7-shake128
SHAKE128(/srv/cvmfs/soft-dev.computecanada.ca/data/c2/0145206b838b4ee1db9d75124b8b26147686c7-shake128)= c20145206b838b4ee1db9d75124b8b26
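
Note that the SHAKE128 digest requires a recent version of openssl. To see the list of available digest (hash) functions, run openssl list --digest-commands. In the example above, openssl prints only 32 hex characters, which should match the first 32 characters of the chunk's hash (the two-character directory name followed by the file name). On even newer openssl versions, the -xoflen option (given in bytes, so -xoflen 20) can be used to produce the full 40-character hash used by CVMFS.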
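
The comparison can be scripted. The following is a minimal sketch, not an official CVMFS tool: the helper names chunk_hash_from_path and digest_matches are hypothetical, and the prefix comparison is an assumption made so that a truncated 32-character digest is still accepted.

```shell
# Hypothetical helpers for comparing a chunk's name with its computed digest.

# Rebuild the hash from a chunk path: parent directory (2 hex characters)
# followed by the leading hex part of the file name.
chunk_hash_from_path() {
    dir=$(basename "$(dirname "$1")")
    file=$(basename "$1" | sed 's/^\([0-9a-f]*\).*$/\1/')   # strip suffixes such as "-shake128"
    printf '%s%s\n' "$dir" "$file"
}

# Succeed if the computed digest ($2) is a non-empty prefix of the expected hash ($1),
# so that a truncated 32-character SHAKE128 digest still matches.
digest_matches() {
    [ -n "$2" ] || return 1
    case "$1" in
        "$2"*) return 0 ;;
        *)     return 1 ;;
    esac
}

# Usage on a stratum server (openssl's -r option prints "digest *filename"):
#   f=/srv/cvmfs/soft-dev.computecanada.ca/data/c2/0145206b838b4ee1db9d75124b8b26147686c7-shake128
#   digest_matches "$(chunk_hash_from_path "$f")" \
#       "$(openssl dgst -shake128 -r "$f" | cut -d' ' -f1)" && echo OK || echo CORRUPT
```

A prefix match is weaker than a full comparison; where -xoflen is available, comparing all 40 characters is preferable.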

File-Object Mapping

File to Object

For files in the repository which are small enough to be encoded by a single cache chunk, you can simply use getfattr on a client to find the hash of that chunk:

$ getfattr -n user.hash /cvmfs/soft.computecanada.ca/new_repository
getfattr: Removing leading '/' from absolute path names
# file: cvmfs/soft.computecanada.ca/new_repository
user.hash="af7468491736a43246605c49c86fb6a5181e6cab-shake128"

Having the hash, you can look for the corresponding chunk in the cache. Note that the first two characters of the hash represent a subdirectory.

$ sudo cat /var/lib/cvmfs/shared/af/7468491736a43246605c49c86fb6a5181e6cab-shake128
New CernVM-FS repository for soft.computecanada.ca
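
The hash-to-path rule described above can be sketched as a small helper. The function name cache_path_for_hash is hypothetical, and the default cache directory is simply the shared cache location used in the example; adjust it if your client uses a different cache base.

```shell
# Hypothetical helper: map an object hash to its file in the client cache.
# $1: hash as reported by user.hash
# $2: cache directory (assumed default: the shared cache used in the example above)
cache_path_for_hash() {
    hash=$1
    cache_dir=${2:-/var/lib/cvmfs/shared}
    prefix=$(printf '%s' "$hash" | cut -c1-2)   # first two characters name the subdirectory
    rest=$(printf '%s' "$hash" | cut -c3-)      # the remainder is the file name
    printf '%s/%s/%s\n' "$cache_dir" "$prefix" "$rest"
}

# Usage:
#   sudo cat "$(cache_path_for_hash af7468491736a43246605c49c86fb6a5181e6cab-shake128)"
```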

For larger files that are chunked, you can find the list of chunks:

$ getfattr -n user.chunk_list /cvmfs/soft.computecanada.ca/gentoo/2020/stage3.log
getfattr: Removing leading '/' from absolute path names
# file: cvmfs/soft.computecanada.ca/gentoo/2020/stage3.log
user.chunk_list="hash,offset,size\012b6f8cf70ee5c1bf7432e94f6b651ade06ae0933a-shake128,0,16777216\0125aa3648656fbf7429d82469a5d437e576289c1e3-shake128,16777216,4573513\0126946387eb8a24d33f98dc7c76fd32eb377c6b440-shake128,21350729,4884937\012b5304bbb4f5016a04335c20ca86c18c19c3fee32-shake128,26235666,4618696\012c08a7bfd0b68706c3766dba5cefa942fe04faa67-shake128,30854362,4727054\012c49e27c8cebbd791687cc2076849ddb9802bf73b-shake128,35581416,4392511\0124b5edba9940b381e540610bea234b126a36f5da4-shake128,39973927,4199210\012cf8f16d10e4c25669438b6ebde2108fdd4ce3168-shake128,44173137,5143967\0128caf9b31d56a8f8a735ce3ba5b45bc4952abfcf5-shake128,49317104,7461382\012be6e096e146909837995c01023d268eb65324517-shake128,56778486,2927198\012"
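
In the default getfattr output the rows are separated by \012 escapes (newlines); getfattr --only-values prints the raw value with real newlines, which is easier to process. The following is a minimal sketch of a sanity check over such a list, assuming one "hash,offset,size" row per line; the helper name summarize_chunks is hypothetical.

```shell
# Hypothetical helper: summarize a chunk list read from stdin
# (one "hash,offset,size" row per line, as printed by getfattr --only-values).
summarize_chunks() {
    awk -F, '
        BEGIN { n = 0; expected = 0; gaps = 0 }
        $1 == "hash" { next }                  # skip the header row
        NF != 3      { next }                  # skip blank or malformed lines
        {
            if ($2 + 0 != expected) gaps++     # each chunk should start where the previous one ended
            expected = $2 + $3
            n++
        }
        END { printf "chunks=%d bytes=%d gaps=%d\n", n, expected, gaps }
    '
}

# Usage:
#   getfattr --only-values -n user.chunk_list \
#       /cvmfs/soft.computecanada.ca/gentoo/2020/stage3.log 2>/dev/null | summarize_chunks
```

For the stage3.log listing above, the offsets are contiguous and the ten chunks add up to 59705684 bytes, so this sketch would report chunks=10 bytes=59705684 gaps=0.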

Object to File

Because CVMFS uses content-addressable storage (CAS), a reverse mapping is subject to certain limitations (also note CVM-1628). For example:

  • The lookup will not work if the object is not present in the client cache, and there is no way to cache it if you don't know the corresponding file.
  • The client may not know the mapping if it had to rebuild the cache database from disk (e.g. after a crash or sudden shutdown).
  • There is no distinction between different repositories when a shared cache is used.

Find the cache database file, usually located at /var/lib/cvmfs/shared/cachedb, and copy it elsewhere so that your lookup and CVMFS client operations do not interfere with each other. Then do:

$ sqlite3 cachedb "select * from cache_catalog where sha1='af7468491736a43246605c49c86fb6a5181e6cab-shake128';"
af7468491736a43246605c49c86fb6a5181e6cab-shake128|51|358326496|/new_repository|0|0

Catalog management

This documentation provides recommendations on managing nested catalogs. It is the responsibility of the repository publishers to manage catalog placement according to their knowledge of the contents and expected usage of the repository. Whether catalog placement is managed automatically or by hand, all catalogs should be structured so that each catalog file is a reasonable size. A rule of thumb is that each catalog should contain between 1,000 and 200,000 entries.

Catalogs that exceed the upper limit (corresponding to a catalog file size of approximately 40 MB) should be avoided, as they make lookups in the associated directories sluggish and can cause congestion and performance problems on stratum and proxy servers. This is particularly important for the root catalog, which should preferably be about 1 MB to 10 MB in size. On the other hand, catalogs that are significantly smaller than the recommended minimum of 1,000 entries are also undesirable, as they constitute unnecessary overhead.

  • On a CVMFS publisher or stratum server, use cvmfs_server list-catalogs -s -h -e to show the size, hash, and number of entries for all catalogs in a repository.
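
The rule of thumb can be expressed as a simple check on an entry count. This is only a sketch: the helper name catalog_size_class is hypothetical, and extracting the entry counts from the list-catalogs output is left out because that output format is not shown on this page.

```shell
# Hypothetical helper: classify a catalog by its number of entries,
# using the 1,000 to 200,000 rule of thumb described above.
catalog_size_class() {
    entries=$1
    if [ "$entries" -lt 1000 ]; then
        echo too-small
    elif [ "$entries" -gt 200000 ]; then
        echo too-large
    else
        echo ok
    fi
}

# Usage:
#   catalog_size_class 250000
```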

Check repository root catalog size

Here is a useful script for remotely scanning stratum 1 servers and checking the size of the root catalog (but not any of the nested catalogs) on all hosted repositories: cvmfs-rootcatalog-mon.sh.

Inspect catalog file

To inspect a catalog file, first decompress it, then you can open it with sqlite3. For example, to find where in the repository file tree a catalog lies:

$ sqlite3 0178fef63c12d89bf856918f2748ac30ecb879C-extracted "select * from properties where key='root_prefix';"
root_prefix|/nix/store/w5lq9as00v9nsmhqdir2p14l4m2s2g3x-sourcecodepro-2.6

Geolocation

Stratum Server GeoAPI

The CVMFS client queries the GeoAPI service on stratum servers to determine the location of each Stratum 1 and order them by proximity. To test this functionality:

curl http://${stratumA}/cvmfs/${repositoryname}/api/v1.0/geo/proxy.example.org/${stratumA},${stratumB},${stratumC},${stratumD}
2,3,1,4

This sample output shows that the 2nd server in the list (B) is closest, followed by the 3rd (C), the 1st (A), and finally the 4th (D). The distance is measured from the client (nominally through the proxy proxy.example.org, if the http_proxy environment variable is set accordingly) to each of the servers A, B, C, D, using the GeoIP database and functionality of the server being queried (stratum A in this example).

The proxy name in the URL can be any arbitrary string and is only used for caching the GeoAPI result, so that all clients using the same proxy string share the same cached result. To ensure that the curl query actually goes through the proxy, one could do

  export http_proxy=http://proxy.example.org

although using a proxy is not strictly necessary to test GeoAPI functionality.
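
To make the index-based reply easier to read, it can be mapped back to server names. The following is a minimal sketch; the helper name geo_sorted_servers is hypothetical, and it assumes the reply is a comma-separated list of 1-based positions as in the sample output.

```shell
# Hypothetical helper: translate a GeoAPI reply into an ordered server list.
# $1: comma-separated server list, in the order it was sent in the URL
# $2: GeoAPI reply, e.g. "2,3,1,4" (1-based positions, closest first)
geo_sorted_servers() {
    awk -v servers="$1" -v order="$2" 'BEGIN {
        n = split(servers, s, ",")
        m = split(order, o, ",")
        for (i = 1; i <= m; i++) {
            idx = o[i] + 0
            if (idx < 1 || idx > n) exit 1     # reply refers to a server that was not in the list
            printf "%s%s", (i > 1 ? "," : ""), s[idx]
        }
        printf "\n"
    }'
}

# Usage:
#   geo_sorted_servers "${stratumA},${stratumB},${stratumC},${stratumD}" "2,3,1,4"
```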

Maxmind DB lookup

The CVMFS server GeoAPI is based on the Maxmind GeoIP DB. To directly query the Maxmind DB on a stratum server you can use the mmdblookup tool:

 sudo yum install libmaxminddb-devel # available from EPEL
 mmdblookup --file /var/lib/cvmfs-server/geo/GeoLite2-City.mmdb --ip 206.12.59.21 location

Decompression

Files retrieved from a stratum server are compressed with zlib and carry no gzip header, so tools such as gunzip cannot read them directly. To decompress such files:

  • openssl zlib -d -in filename
  • Alternatively, on a CVMFS server pipe from stdin to cvmfs_swissknife zpipe -d

Secondary public key

To retrieve the secondary public key, encoded as an X.509 certificate:

$ curl -s http://cvmfs-stratum-one.cern.ch:8000/cvmfs/cms.cern.ch/.cvmfspublished|cat -v|grep ^X
X97ed1a293d396c7525193a9c303ca0d1d384d78a
$ curl -s http://cvmfs-stratum-one.cern.ch:8000/cvmfs/cms.cern.ch/data/97/ed1a293d396c7525193a9c303ca0d1d384d78aX | cvmfs_swissknife zpipe -d | head -n 1
-----BEGIN CERTIFICATE-----

The secondary key is the .crt file on a stratum server; it is used to sign the repository manifest (the .cvmfspublished file). The fingerprint of the secondary pubkey is listed in the .cvmfswhitelist file, which is in turn signed by the repository master key.
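
The two curl steps above can be chained by deriving the object path from the X line of the manifest. The following is a minimal sketch; the helper name cert_object_path is hypothetical, and it assumes the X line carries a plain hex hash as in the example (lines with any other form produce no output).

```shell
# Hypothetical helper: read a .cvmfspublished manifest on stdin and print the
# relative path of the certificate object (data/<2 hex chars>/<rest>X).
# Assumes the X line holds a plain hex hash, as in the example above.
cert_object_path() {
    LC_ALL=C sed -n 's|^X\([0-9a-f][0-9a-f]\)\([0-9a-f]\{1,\}\)$|data/\1/\2X|p' | head -n 1
}

# Usage:
#   base=http://cvmfs-stratum-one.cern.ch:8000/cvmfs/cms.cern.ch
#   path=$(curl -s "$base/.cvmfspublished" | cert_object_path)
#   curl -s "$base/$path" | cvmfs_swissknife zpipe -d | head -n 1
```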

Stratum server HTTP headers

To see the HTTP response headers from a stratum server:

$ curl -sD - -o /dev/null http://cvmfs-s1-arbutus.computecanada.ca:8000/cvmfs/soft.computecanada.ca/.cvmfspublished
HTTP/1.1 200 OK
Date: Mon, 08 Jul 2019 20:51:19 GMT
Server: Apache/2.2.15 (CentOS)
Accept-Ranges: bytes
Content-Length: 611
Cache-Control: max-age=61
Expires: Mon, 08 Jul 2019 20:52:20 GMT
Content-Type: application/x-cvmfs

This shows that the server uses the Cache-Control header to ensure that the manifest file is never cached for more than 61 seconds. If you check the same thing on any CVMFS data object, you will see that the max age is much longer (because these are immutable content-addressed objects).
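
To compare cache lifetimes across several objects or servers, the max-age value can be extracted from the headers. The following is a minimal sketch; the helper name cache_max_age is hypothetical.

```shell
# Hypothetical helper: print the max-age value (in seconds) found in the
# Cache-Control header of an HTTP response read from stdin.
cache_max_age() {
    tr -d '\r' | awk '
        tolower($0) ~ /^cache-control:/ {
            line = tolower($0)
            if (match(line, /max-age=[0-9]+/)) {
                print substr(line, RSTART + 8, RLENGTH - 8)   # skip the 8 characters of "max-age="
                exit
            }
        }
    '
}

# Usage:
#   curl -sD - -o /dev/null \
#       http://cvmfs-s1-arbutus.computecanada.ca:8000/cvmfs/soft.computecanada.ca/.cvmfspublished | cache_max_age
```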

Gateway API

To query the gateway API:

  • curl -s http://cvmfs-gateway:4929/api/v1/ | jq # check which API resources can be queried (repos, leases, etc.)
  • curl -s http://cvmfs-gateway:4929/api/v1/repos | jq # check which repositories are available, and corresponding paths and key identifiers