Skip to content

Conversation

@hageboeck
Copy link
Member

This is a fix for ROOT-7121.

If a cache is updated in RooVectorDataStore and the cache has more than 1000 elements to be updated, an array on the stack will overrun and smash the stack. roofit will therefore crash.

Solution: RooVectorDataStore uses a std::vector instead of an array[1000] to hold the pointers to the cache elements.

Comments on the speed of the fix:
Using a std::vector placed on the stack (mimicking the original implementation), the fits would get slower. Therefore I added the vector as a member of RooVectorDataStore. This saves the time of constantly having to reallocate the vector.

I tested with my (private) workspace: The crash is fixed. Unfortunately, I cannot provide this workspace.
To give a more meaningful test for you guys, I ran all the roofit/roostats tutorials and diffed the logs to check if roofit gives the same results. The diffs are attached. Apart from out-of-order execution and time measurements, there is no difference.
From the time measurements you can also see that the fixed version is not slower.

tutorials_roofit.diff.txt
tutorials_roostats.diff.txt

Copy link
Member

@vgvassilev vgvassilev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this.

I'd be great if you can add a test case for this PR.

if (!_cache) return ;

pRealVector tv[1000] ;
if (static_cast<Int_t>(_cacheUpdateList.size()) < _cache->_nReal) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we change _nReal's type to size_t instead?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is of course a good idea.

if (!_cache) return ;

pRealVector tv[1000] ;
if (static_cast<Int_t>(_cacheUpdateList.size()) < _cache->_nReal) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd check for the capacity of the vector not the size and call reserve instead of resize.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way the code worked in the beginning was to directly access the elements of the array. I fixed the array overflow by introducing as few changes as possible: As it works now, reserve won't do the job, it would need more changes further below.

mutable Double_t _curWgtErr ; // Weight of current event

RooVectorDataStore* _cache ; //! Optimization cache
std::vector<RooVectorDataStore::RealVector*> _cacheUpdateList ; //! Pointers to the optimization cache
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We want to preallocate with reserve() at least 1024 elements, this would reduce the heap reallocations.

@vgvassilev
Copy link
Member

ping...

@phsft-bot
Copy link

Can one of the admins verify this patch?

@hageboeck
Copy link
Member Author

Hi,

I reviewed your suggestions. They are good suggestions, but they are somewhat orthogonal to the fix for the array overflow: Clearly, the current code could use some reworking like getting the types right, but that would require some more work than just the fix of the stack overflow. I could maybe tend to that when I'm back from travelling in May.

@vgvassilev
Copy link
Member

Ok, lets check the if the bots are happy with what we have so far.

@vgvassilev vgvassilev closed this Mar 31, 2017
@vgvassilev vgvassilev reopened this Mar 31, 2017
@phsft-bot
Copy link

Can one of the admins verify this patch?

@vgvassilev
Copy link
Member

@phsft-bot build!

@phsft-bot
Copy link

Starting build on centos7/gcc49, mac1011/native, slc6/gcc49, slc6/gcc62, ubuntu14/native

@vgvassilev
Copy link
Member

vgvassilev commented Mar 31, 2017

Clang-format seems unhappy. The other failures are independent on that.

@lmoneta
Copy link
Member

lmoneta commented Apr 11, 2017

The code looks fine to me. I don't think using vector::reserve is needed in this case.

It would be nice to test the performance in a complex fix which takes some time and uses heavily the cache, The tutorial are maybe too simple and the fitting is too fast.
If you can maybe simplify your workspace and make a standalone running example would be great.
Also, did you observe a substantial performance penalty when creating the std::vector in recalculateCache instead of having as a data member of the class ?

Lorenzo

@phsft-bot
Copy link

Can one of the admins verify this patch?

@phsft-bot
Copy link

Starting build on gcc49/centos7, native/mac1012, gcc49/slc6, gcc62/slc6, native/ubuntu14 with CMake flags -Dvc=OFF -Dimt=OFF -Dccache=ON

@hageboeck
Copy link
Member Author

Following up on Lorenzo's comment: Can a local vector be used?
Short answer: Yes

Long answer:
When I first fixed the bug, I did exactly that. In the tutorials I saw a slight slowdown, probably because of reallocations or some overhead for managing the vector instead of having an array[1000] which appears on the stack instantly.

Now, I did timing tests with a heavy workspace, gcc 5.4 -O2. Results are as follows:
There is almost no difference in speed. If differences are more than statistical, the implementation with a (local or instance-persistent) vector is even faster than array[1000]. See attached file.

I therefore committed a version with a local vector and vec.reserve(), the most easy, most clean fix.
timingResults.txt

@vgvassilev
Copy link
Member

@phsft-bot build!

@vgvassilev
Copy link
Member

@hageboeck could you fix the formatting issues reported by clang-format (see Details of Travis CI)

@hageboeck
Copy link
Member Author

Done. I just patched in the diff from the clang-format.

@vgvassilev
Copy link
Member

@phsft-bot build!

@hageboeck, thanks!

@phsft-bot
Copy link

Starting build on gcc49/centos7, native/mac1012, gcc49/slc6, gcc62/slc6, native/ubuntu14 with CMake flags -Dvc=OFF -Dimt=ON -Dccache=ON

@vgvassilev vgvassilev merged commit 72a1898 into root-project:master May 31, 2017
lmoneta pushed a commit that referenced this pull request Jun 1, 2017
Fix stack corruption in RooVectorDataStore (ROOT-7121).
@hageboeck hageboeck deleted the RooVectorDataStoreFix branch November 8, 2018 12:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants