Description
By default CVMFS compresses files. For data repositories, however, different considerations apply: some datasets consist of incompressible binary data (or use a format that is already natively compressed), while others are highly compressible.
If files that would compress well are stored uncompressed, storage space and bandwidth are wasted. If already-compressed files are compressed again, CPU time is wasted on the publisher and, more importantly, on clients, slowing down data access; in some cases double compression can even increase file size.
The gateway interface lets us address this, since compression can be controlled on each transaction:
cvmfs_server publish -Z none     # don't compress
cvmfs_server publish -Z default  # compress
Now I have set the data repo back to compression by default. If -Z none is exposed to analysts as a publishing option in the scripts, they can use it to avoid double compression when publishing incompressible binary data.
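A minimal sketch of how such a wrapper script might expose the option, assuming a hypothetical publish-data.sh that analysts already invoke (the script name and its --no-compress flag are illustrative, not part of CVMFS):

#!/bin/bash
# publish-data.sh -- hypothetical wrapper around cvmfs_server publish.
# Analysts pass --no-compress when the data is incompressible or already
# compressed; otherwise the repository default (compression) applies.
compression=default
if [ "$1" = "--no-compress" ]; then
    compression=none
    shift
fi
repo="$1"
cvmfs_server transaction "$repo"
# ... copy the dataset into /cvmfs/$repo here ...
cvmfs_server publish -Z "$compression" "$repo"

An analyst would then run, for example, ./publish-data.sh --no-compress data.example.org when publishing pre-compressed files, and omit the flag otherwise.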