compression usage for data repos #107

Open
@rptaylor

Description

By default, CVMFS compresses files. For data repos, however, different considerations apply: some datasets consist of incompressible binary data (or a data format that is already natively compressed), while others may be highly compressible.

Of course, if compressible files are left uncompressed, storage space and bandwidth are wasted, while if files are double compressed, CPU time is wasted processing them on the publisher and, more importantly, on clients, slowing down data access. In some cases double compression could actually increase file size.
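
A quick way to see that last point (an illustration, not from the issue; gzip is a rough stand-in for CVMFS's default zlib compression): compressing incompressible bytes adds container overhead without removing any redundancy, so the output grows:

```sh
# 1 MiB of random (i.e. incompressible) data
head -c 1048576 /dev/urandom > sample.bin

gzip -c sample.bin | wc -c            # slightly larger than 1048576
gzip -c sample.bin | gzip -c | wc -c  # compressing again grows it further
```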

Using the gateway interface we can address this, because we have the option of controlling compression on each transaction:

```
cvmfs_server publish -Z none     # don't compress
cvmfs_server publish -Z default  # compress
```

I have now set the data repo back to compression by default. If `-Z none` is exposed to analysts as a publishing option in the scripts, they can use it to avoid double compression when publishing incompressible binary data.
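
A minimal sketch of what such a publishing option might look like (the wrapper name and its `--no-compress` flag are hypothetical, not part of CVMFS; only `cvmfs_server publish -Z` comes from the issue):

```sh
#!/bin/sh
# publish-data.sh -- hypothetical wrapper exposed to analysts.
# Usage: publish-data.sh [--no-compress] <repo>

COMPRESSION=default
if [ "$1" = "--no-compress" ]; then
    # Analyst declares the data is already compressed/incompressible,
    # so skip CVMFS compression for this transaction.
    COMPRESSION=none
    shift
fi
REPO="$1"

cvmfs_server publish -Z "$COMPRESSION" "$REPO"
```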
