Skip to content

Conversation

@bbockelm
Copy link
Contributor

This PR adds support for the LZ4 compression algorithm (branch has been significantly cleaned up).

In testing with the sample Event object found in the test subdirectory, we found that the resulting files were about 13% larger than the default compression settings. Deserialization speed was 95% of the uncompressed speed.

As discussed with @pcanal

bbockelm added 2 commits May 25, 2017 15:18
Add lz4 source tarball for built-in version.

Tweak Event example to make it easier to compare compression algorithms.

Tweak build files for Event example to include TreePlayer (for TTreePerfStats).

Until we're sure we need it, no special path for Win32.

Add -fPIC flag to compile lz4
@phsft-bot
Copy link

Can one of the admins verify this patch?

@vgvassilev
Copy link
Member

@phsft-bot build!

@phsft-bot
Copy link

Starting build on gcc49/centos7, native/mac1012, gcc49/slc6, gcc62/slc6, native/ubuntu14 with CMake flags -Dvc=OFF -Dimt=ON -Dccache=ON

Copy link
Member

@vgvassilev vgvassilev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this!



#---Check for LZ4--------------------------------------------------------------------
if(NOT builtin_lz4)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we transform the two similar if-stmts into one if-else?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not easily: the true-branch of the first if-statement can affect the outcome of the second conditional.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see. That part is tricky to read. Maybe we could add a clarification comment for the casual reader.

// Original Author: Brian Bockelman

/*************************************************************************
* Copyright (C) 1995-2015, Rene Brun and Fons Rademakers. *
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we update the copyright year?

#include <stdint.h>

#if (__GNUC__ >= 3) || defined(__INTEL_COMPILER)
#if !defined(R__unlikely)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't we rely on RConfig that would do this for us?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup - cleaning that up as part of the response to clang-tidy's other suggestions.


TFile *hfile;
TTree *tree;
TTreePerfStats *ioperf = NULL;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should use a nullptr.

@phsft-bot
Copy link

@phsft-bot
Copy link

Build failed on mac1012/native.
See console output.

Warnings:

  • include/TMVA/DNN/Architectures/Cpu/CpuMatrix.h:62:50: warning: instantiation of variable 'TMVA::DNN::TCpuMatrix<double>::fOnes' required here, but no definition is available [-Wundefined-var-template]
  • include/TMVA/DNN/Architectures/Cpu/CpuMatrix.h:144:4: warning: instantiation of variable 'TMVA::DNN::TCpuMatrix<double>::fPool' required here, but no definition is available [-Wundefined-var-template]
  • include/TMVA/DNN/Architectures/Cpu/CpuMatrix.h:62:50: warning: instantiation of variable 'TMVA::DNN::TCpuMatrix<float>::fOnes' required here, but no definition is available [-Wundefined-var-template]
  • include/TMVA/DNN/Architectures/Cpu/CpuMatrix.h:144:4: warning: instantiation of variable 'TMVA::DNN::TCpuMatrix<float>::fPool' required here, but no definition is available [-Wundefined-var-template]

Failing tests:

return;
R__zipLZMA(cxlevel, srcsize, src, tgtsize, tgt, irep);
return;
} else if (compressionAlgorithm == 4) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add and use a constant (like kLZMA) ...

nevent = TMath::Min(nevent,nentries);
if (read == 1) { //read sequential
//by setting the read cache to -1 we set it to the AutoFlush value when writing
ioperf = new TTreePerfStats("Perf Stats", tree);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make the use of TTreePerfStats optional?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why? I don't understand; seems to be harmless enough.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The data collection takes 'some' time and thus if one wants to externally merge 'pure' performance one would want to run without.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also the comment on line 189 belongs next to line 191

test/Makefile Outdated
EVENTO = Event.$(ObjSuf) EventDict.$(ObjSuf)
EVENTS = Event.$(SrcSuf) EventDict.$(SrcSuf)
EVENTSO = libEvent.$(DllSuf)
EVENTLIBS = -lTreePlayer
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An equivalent is needed in the windows section.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not following: this is in a platform-neutral part of the Makefile, no?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes but -l is not platform-neutral :) [So the -l indeed need to be avoided on windows.]

fprintf(stderr, "Error R__unzip_header: error in header\n");
return 1;
!(src[0] == 'C' && src[1] == 'S' && src[2] == Z_DEFLATED) && !(src[0] == 'X' && src[1] == 'Z' && src[2] == 0) &&
!(src[0] == 'L' && src[1] == '4')) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is becoming unwindly, can we create a named repository of those pairs to use here and the 2 or 3 other places.

@bbockelm
Copy link
Contributor Author

@pcanal - any idea how to debug the ctest failure? Or at least get it to cough up the failure message?

When I run make cleantest directly it succeeds; when run via ctest, it fails.

@phsft-bot
Copy link

@phsft-bot
Copy link

@pcanal
Copy link
Member

pcanal commented May 25, 2017

the root-io-geo-make error is due to a missing commit that is (new) on the master.

for the other error try:

VERBOSE=true ctest  --output-on-failure --no-compress-output -V -R roottest.root.io.compression

@bbockelm
Copy link
Contributor Author

The ctest failure should be fixed by root-project/roottest#45

@phsft-bot
Copy link

@pcanal
Copy link
Member

pcanal commented May 25, 2017

@bbockelm please, try to be explicit in the git log so that later (much later) we can avoid looking at commit (to find a problem) ... "addressing review comment" does not give (our later selves) any clue of what is the nature of the change. [The only exception to that rule are 'white space' only change that have no functional side-effect]. Thanks.

return src[0] == 'C' && src[1] == 'S' && src[2] == Z_DEFLATED;
}

static int is_valid_header_xz(uch *src) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it 'xz' or LZMA?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is technically xz, I believe. We use lzma elsewhere. Will update.


/* New zlib format */
if (src[0] == 'Z' && src[1] == 'L') {
if (is_valid_header_zlib(src)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is as the side effect of adding one comparison ... which actually lead me to ponder what the semantic of the (src[2] == Z_DEFLATED) test is ... and why/whether there isn't an equivalent for LZ4.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's simply the header, as defined by core/zip/src/Bits.h. The header appears to be custom to ROOT. For LZMA and LZ4, we have just used a two-byte header.

Int_t nentries = (Int_t)tree->GetEntries();
nevent = TMath::Min(nevent,nentries);
if (read == 1) { //read sequential
ioperf = getenv("ENABLE_TTREEPERFSTATS") ? new TTreePerfStats("Perf Stats", tree) : nullptr;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we document this in the header?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is the last change need for merging. [Adding support for configure/make so that we can backport to older version can/should be another merge request]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's already taken care of, no?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My bad, I had missed it ... but now I also took another look at the commit messages ... for the sake of our future selves can you update the commit message that just say things like "Small fixups from review." into a short description of the functional changes :). Thanks.

ifeq ($(PLATFORM),win32)
EVENTLIB = libEvent.lib
else
EVENTLIBS = -lTreePlayer
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to add

EVENTLIBS = $(ROOTSYS)/lib/libTreePlayer.lib

on the windows side.

}

tgt[0] = 'L';
tgt[1] = '4';
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would have been 'even' better if those had been abstracted out in the same place as the reading.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would be difficult - the generic header code and the compression-format specific code is interspersed with the actual implementation of the "old-style" compression algorithm.

I would prefer to defer this for follow-up work.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough. For the record, I would use a (potential) new header defining a set of strings/arrays ....

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, that's what I was thinking - new header to define the constants and rationalize the layout of the files (particularly with an eye towards separating the old C code).

Int_t nentries = (Int_t)tree->GetEntries();
nevent = TMath::Min(nevent,nentries);
if (read == 1) { //read sequential
ioperf = getenv("ENABLE_TTREEPERFSTATS") ? new TTreePerfStats("Perf Stats", tree) : nullptr;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is the last change need for merging. [Adding support for configure/make so that we can backport to older version can/should be another merge request]

bbockelm added 2 commits May 27, 2017 21:36
Update copyright date on files new to this branch.

Add comment explaining logic in CMakeFile.txt

Mostly whitespaces; of note, the re-definition of R__likely/unlikely
has been moved back into RConfig.h.
Add LZ4 support to core library.
@bbockelm
Copy link
Contributor Author

@pcanal - the GitHub PR page right now is a bit messy, but the rewording of the commit messages you requested is complete. Ready to go?

@pcanal
Copy link
Member

pcanal commented May 29, 2017

Yes, thanks!

@pcanal pcanal merged commit 8c50188 into root-project:master May 29, 2017
@pcanal
Copy link
Member

pcanal commented May 29, 2017

Could you open a PR to update the configure/make :) ? Thanks.

@dpiparo
Copy link
Member

dpiparo commented May 30, 2017

Hi,

the PR broke Arm and fedora builds: could you please check?
http://cdash.cern.ch/index.php?project=ROOT

Cheers,
D

@bbockelm
Copy link
Contributor Author

@dpiparo - is it possible to pull up the logs? Unfortunately, the error message says simply the error message is in a different log file.

@dpiparo
Copy link
Member

dpiparo commented May 30, 2017

Hi,

I fixed that since the bug affected my development platform. Now the tarball is fetched from the eos based website from where all other tarballs are downloaded. The reason for the bug was the fact that curl could not use https.

@bbockelm
Copy link
Contributor Author

Thanks Danilo!

Yes, that was a TODO item to stage the tarball to the official repo -- you just saved us some work.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants