compression usage for data repos #107

Open
@rptaylor

Description

By default, CVMFS compresses files. For data repos, however, different considerations apply: some datasets consist of incompressible binary data (or a data format that is already natively compressed), while others may be highly compressible.

Of course, if compressible files are left uncompressed, storage space and bandwidth are wasted, while if files are double compressed, CPU time is wasted processing them on the publisher and, more importantly, on clients, slowing down data access. In some cases double compression could actually increase file size.
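
A quick way to see that last point (an illustration, not from the issue; gzip is a rough stand-in for CVMFS's default zlib compression): compressing incompressible bytes adds container overhead without removing any redundancy, so the output grows:

```sh
# 1 MiB of random (i.e. incompressible) data
head -c 1048576 /dev/urandom > sample.bin

gzip -c sample.bin | wc -c            # slightly larger than 1048576
gzip -c sample.bin | gzip -c | wc -c  # compressing again grows it further
```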

Using the gateway interface we can address this, because we have the option of controlling compression on each transaction:

```
cvmfs_server publish -Z none     # don't compress
cvmfs_server publish -Z default  # compress
```

I have now set the data repo back to compression by default. If `-Z none` is exposed to analysts as a publishing option in the scripts, they can use it to avoid double compression when publishing incompressible binary data.
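
A minimal sketch of what such a publishing option might look like (the wrapper name and its `--no-compress` flag are hypothetical, not part of CVMFS; only `cvmfs_server publish -Z` comes from the issue):

```sh
#!/bin/sh
# publish-data.sh -- hypothetical wrapper exposed to analysts.
# Usage: publish-data.sh [--no-compress] <repo>

COMPRESSION=default
if [ "$1" = "--no-compress" ]; then
    # Analyst declares the data is already compressed/incompressible,
    # so skip CVMFS compression for this transaction.
    COMPRESSION=none
    shift
fi
REPO="$1"

cvmfs_server publish -Z "$COMPRESSION" "$REPO"
```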
