Skip to content

Conversation

@guitargeek
Copy link
Contributor

@guitargeek guitargeek commented Jan 19, 2023

This fixes a memory leak related to the usage of TRef in TFoamCell
that can be reproduced with the following code:

class MyTFoamBinding : public TFoamIntegrand {
public:
   double Density(Int_t ndim, double *) override { return 1.0; }
};

void reproTFoam()
{
   ProcInfo_t procinfo;
   const float toMB = 1.f / 1024.f;

   std::size_t nToys = 5000;

   TRandom3 rand{};

   for (std::size_t index_toy = 1; index_toy <= nToys; index_toy++) {

      if (index_toy % 500 == 0) {
         gSystem->GetProcInfo(&procinfo);
         std::cout << index_toy << ": " << procinfo.fMemResident * toMB
                   << "   " << procinfo.fMemVirtual * toMB << std::endl;
      }

      MyTFoamBinding binding{};
      TFoam tfoam{"TFOAM"};
      tfoam.SetkDim(1);
      tfoam.SetRho(&binding);
      tfoam.SetPseRan(&rand);

      tfoam.SetnCells(30);

      tfoam.SetnSampl(200);
      tfoam.SetPseRan(&rand);
      tfoam.SetChat(0);
      tfoam.Initialize();
   }
}

Before this commit, you will see a steady memory increase that is gone
after this commit.

This fixes the fundamental memory increase in RooFit toy studies,
reported for example in #8984.

@phsft-bot
Copy link

Starting build on ROOT-debian10-i386/soversion, ROOT-performance-centos8-multicore/cxx17, ROOT-ubuntu18.04/nortcxxmod, ROOT-ubuntu2004/python3, mac12/noimt, mac11/cxx14, windows10/cxx14
How to customize builds

@phsft-bot
Copy link

Starting build on ROOT-debian10-i386/soversion, ROOT-performance-centos8-multicore/cxx17, ROOT-ubuntu18.04/nortcxxmod, ROOT-ubuntu2004/python3, mac12/noimt, mac11/cxx14, windows10/cxx14
How to customize builds

@phsft-bot
Copy link

Build failed on mac12/noimt.
Running on macphsft17.dyndns.cern.ch:/Users/sftnight/build/jenkins/workspace/root-pullrequests-build
See console output.

Failing tests:

@phsft-bot
Copy link

Build failed on ROOT-ubuntu2004/python3.
Running on root-ubuntu-2004-3.cern.ch:/home/sftnight/build/workspace/root-pullrequests-build
See console output.

Failing tests:

@phsft-bot
Copy link

Build failed on ROOT-performance-centos8-multicore/cxx17.
Running on olbdw-01.cern.ch:/data/sftnight/workspace/root-pullrequests-build
See console output.

Failing tests:

@phsft-bot
Copy link

Build failed on ROOT-ubuntu18.04/nortcxxmod.
Running on sft-ubuntu-1804-2.cern.ch:/build/workspace/root-pullrequests-build
See console output.

Failing tests:

@phsft-bot
Copy link

Build failed on ROOT-debian10-i386/soversion.
Running on pcepsft10.dyndns.cern.ch:/build/workspace/root-pullrequests-build
See console output.

Failing tests:

@phsft-bot
Copy link

Build failed on mac11/cxx14.
Running on macphsft23.dyndns.cern.ch:/Users/sftnight/build/workspace/root-pullrequests-build
See console output.

Failing tests:

@phsft-bot
Copy link

Starting build on ROOT-debian10-i386/soversion, ROOT-performance-centos8-multicore/cxx17, ROOT-ubuntu18.04/nortcxxmod, ROOT-ubuntu2004/python3, mac12/noimt, mac11/cxx14, windows10/cxx14
How to customize builds

@guitargeek guitargeek changed the title [math] Avoid memory increase related to using TFoam [math] Replace TRef with array indices in TFoamCell Jan 19, 2023
@phsft-bot
Copy link

Starting build on ROOT-debian10-i386/soversion, ROOT-performance-centos8-multicore/cxx17, ROOT-ubuntu18.04/nortcxxmod, ROOT-ubuntu2004/python3, mac12/noimt, mac11/cxx14, windows10/cxx14
How to customize builds

@phsft-bot
Copy link

Build failed on ROOT-performance-centos8-multicore/cxx17.
Running on olbdw-01.cern.ch:/data/sftnight/workspace/root-pullrequests-build
See console output.

Failing tests:

Copy link
Member

@lmoneta lmoneta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!
Thank you Jonas for the fix and still maintaining the I/O capabilities of TFoam to keep backward compatibility.

@phsft-bot
Copy link

Starting build on ROOT-debian10-i386/soversion, ROOT-performance-centos8-multicore/cxx17, ROOT-ubuntu18.04/nortcxxmod, ROOT-ubuntu2004/python3, mac12/noimt, mac11/cxx14, windows10/cxx14
How to customize builds

@phsft-bot
Copy link

Build failed on ROOT-performance-centos8-multicore/cxx17.
See console output.

@phsft-bot
Copy link

Starting build on ROOT-debian10-i386/soversion, ROOT-performance-centos8-multicore/cxx17, ROOT-ubuntu18.04/nortcxxmod, ROOT-ubuntu2004/python3, mac12/noimt, mac11/cxx14, windows10/cxx14
How to customize builds

@phsft-bot
Copy link

Starting build on ROOT-debian10-i386/soversion, ROOT-performance-centos8-multicore/cxx17, ROOT-ubuntu18.04/nortcxxmod, ROOT-ubuntu2004/python3, mac12/noimt, mac11/cxx14, windows10/cxx14
How to customize builds

@phsft-bot
Copy link

Build failed on mac12/noimt.
Running on macphsft17.dyndns.cern.ch:/Users/sftnight/build/jenkins/workspace/root-pullrequests-build
See console output.

Errors:

  • [2023-01-19T17:31:35.499Z] FAILED: builtins/davix/DAVIX-prefix/src/DAVIX-stamp/DAVIX-download /Users/sftnight/build/jenkins/workspace/root-pullrequests-build/build/builtins/davix/DAVIX-prefix/src/DAVIX-stamp/DAVIX-download
  • [2023-01-19T17:31:35.499Z] CMake Error at DAVIX-stamp/DAVIX-download-Release.cmake:49 (message):

@phsft-bot
Copy link

Starting build on ROOT-debian10-i386/soversion, ROOT-performance-centos8-multicore/cxx17, ROOT-ubuntu18.04/nortcxxmod, ROOT-ubuntu2004/python3, mac12/noimt, mac11/cxx14, windows10/cxx14
How to customize builds

@phsft-bot
Copy link

Build failed on ROOT-ubuntu18.04/nortcxxmod.
Running on sft-ubuntu-1804-2.cern.ch:/build/workspace/root-pullrequests-build
See console output.

Errors:

  • [2023-01-19T17:38:48.312Z] FAILED: VECCORE-prefix/src/VECCORE-stamp/VECCORE-download /mnt/build/workspace/root-pullrequests-build/build/VECCORE-prefix/src/VECCORE-stamp/VECCORE-download
  • [2023-01-19T17:38:48.312Z] CMake Error at VECCORE-stamp/VECCORE-download-Release.cmake:49 (message):

@phsft-bot
Copy link

Build failed on mac11/cxx14.
Running on macphsft23.dyndns.cern.ch:/Users/sftnight/build/workspace/root-pullrequests-build
See console output.

Errors:

  • [2023-01-19T17:49:54.932Z] FAILED: builtins/xrootd/XROOTD-prefix/src/XROOTD-stamp/XROOTD-download /Users/sftnight/build/workspace/root-pullrequests-build/build/builtins/xrootd/XROOTD-prefix/src/XROOTD-stamp/XROOTD-download
  • [2023-01-19T17:49:54.932Z] CMake Error at XROOTD-stamp/XROOTD-download-Release.cmake:49 (message):
  • [2023-01-19T17:49:55.196Z] FAILED: builtins/davix/DAVIX-prefix/src/DAVIX-stamp/DAVIX-download /Users/sftnight/build/workspace/root-pullrequests-build/build/builtins/davix/DAVIX-prefix/src/DAVIX-stamp/DAVIX-download
  • [2023-01-19T17:49:55.196Z] CMake Error at DAVIX-stamp/DAVIX-download-Release.cmake:49 (message):

@pcanal
Copy link
Member

pcanal commented Jan 19, 2023

Could add a few words to the commit log explaining why the use the TRef was leading to a memory leak?

@guitargeek
Copy link
Contributor Author

guitargeek commented Jan 19, 2023

Good question, but unfortunately I have to admit I don't really know why. I didn't have that much time to investigate but needed a fix for 6.28 for which I'm already late. The TFoam class is quite fundamental to RooFit, and the memory increase has affected many users.

All I had was this hint by valgrind when checked the reproducer in the commit message:

Conditional jump or move depends on uninitialised value(s)
   at 0x402E09: TStorage::UpdateIsOnHeap(unsigned int const volatile&, unsigned int volatile&) (TStorage.h:132)
   by 0x501B33D: TDirectory::TDirectory() (in /usr/lib64/root/libCore.so.6.26.10)
   by 0x5816D81: TDirectoryFile::TDirectoryFile() (in /usr/lib64/root/libRIO.so.6.26.10)
   by 0x5833AF8: TFile::TFile(char const*, char const*, char const*, int) (in /usr/lib64/root/libRIO.so.6.26.10)
   by 0xE2D86A6: TCling::LoadPCM(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) (in /usr/lib64/root/libCling.so.6.26.10)
   by 0xE2D9A17: TCling::RegisterModule(char const*, char const**, char const**, char const*, char const*, void (*)(), std::vector<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >,
allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int> > > const&, char const**, bool, bool) (in /usr/lib64/root/libCling.so.6.26.10)
   by 0x4FF098E: TROOT::InitInterpreter() (in /usr/lib64/root/libCore.so.6.26.10)
   by 0x4FF0C9E: ROOT::Internal::GetROOT2() (in /usr/lib64/root/libCore.so.6.26.10)
   by 0x510209C: ROOT::TGenericClassInfo::GetClass() (in /usr/lib64/root/libCore.so.6.26.10)
   by 0x90B67CA: TFoamCell::Class() (in /usr/lib64/root/libFoam.so.6.26.10)
   by 0x504BE7A: TRef::operator=(TObject*) (in /usr/lib64/root/libCore.so.6.26.10)
   by 0x90B126F: TFoamCell::Fill(int, TFoamCell*, TFoamCell*, TFoamCell*) (in /usr/lib64/root/libFoam.so.6.26.10)
 Uninitialised value was created by a stack allocation
   at 0xE2D8106: TCling::LoadPCM(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) (in /usr/lib64/root/libCling.so.6.26.10)

This made me think that maybe TRef is the problem, and indeed replacind the TRef with simple indices solved the problem.

Is there something that you can make of that valgrind output maybe?

@pcanal
Copy link
Member

pcanal commented Jan 19, 2023

Is there something that you can make of that valgrind output maybe?

yes. It is expected, does not indicate a leak and thus is unrelated to your issue :( (Were you using --suppressions=$ROOTSYS/etc/valgrind-root.supp when executing valgrind to remove/hide known false positive.)

@guitargeek
Copy link
Contributor Author

Oh yes you are right, I keep forgetting to use that file! I don't use valgrind that often

Now I was re-running the reproducer with the suppression file, and actually there are still lines in the log related to the TRef:

8 bytes in 1 blocks are still reachable in loss record 194 of 8,450
   at 0x4C388C3: operator new(unsigned long) (vg_replace_malloc.c:422)
   by 0xE2DE8BD: void std::vector<void*, std::allocator<void*> >::_M_realloc_insert<void* const&>(__gnu_cxx::__normal_iterator<void**, std::vector<void*, std::allocator<void*> > >, void* const&) (in /usr/lib64/root/libClin
0)
   by 0x10B852E7: ??? (in /usr/lib64/root/libCling.so.6.26.10)
   by 0xE292E1B: CreateInterpreter (in /usr/lib64/root/libCling.so.6.26.10)
   by 0x4FF08EF: TROOT::InitInterpreter() (in /usr/lib64/root/libCore.so.6.26.10)
   by 0x4FF0C9E: ROOT::Internal::GetROOT2() (in /usr/lib64/root/libCore.so.6.26.10)
   by 0x510209C: ROOT::TGenericClassInfo::GetClass() (in /usr/lib64/root/libCore.so.6.26.10)
   by 0x90B67CA: TFoamCell::Class() (in /usr/lib64/root/libFoam.so.6.26.10)
   by 0x504BE7A: TRef::operator=(TObject*) (in /usr/lib64/root/libCore.so.6.26.10)
   by 0x90B126F: TFoamCell::Fill(int, TFoamCell*, TFoamCell*, TFoamCell*) (in /usr/lib64/root/libFoam.so.6.26.10)
   by 0x90A86B6: TFoam::CellFill(int, TFoamCell*) (in /usr/lib64/root/libFoam.so.6.26.10)
   by 0x90A8FA4: TFoam::Divide(TFoamCell*) (in /usr/lib64/root/libFoam.so.6.26.10)

Are these expected too? I mean there must be a problem somewhere, otherwise this PR would not fix the problem with the memory increase when instantiating lots of TFoams

@pcanal
Copy link
Member

pcanal commented Jan 19, 2023

Are these expected too?

No. Those appears (likely) to be within a function that is inlined directly in the script.

@pcanal
Copy link
Member

pcanal commented Jan 19, 2023

After compiling the code as an executable and using massif, I manage to clarify that this is not a memory leak but rather memory use to keep track of the unique ids associated with the TRef:

 n0: 10812793 in 2730 places, all below massif's threshold (1.00%)
 n1: 6569856 0x4BF2279: TStorage::ReAlloc(void*, unsigned long, unsigned long) (TStorage.cxx:240)
  n1: 6569856 0x4C6A2E9: TObjArray::Expand(int) (TObjArray.cxx:405)
   n2: 6569856 0x4C6979C: TObjArray::AddAtAndExpand(TObject*, int) (TObjArray.cxx:244)
    n1: 6553600 0x4BE15AB: TProcessID::PutObjectWithID(TObject*, unsigned int) (TProcessID.cxx:390)
     n1: 6553600 0x4BE0AE6: TProcessID::AssignID(TObject*) (TProcessID.cxx:185)
      n2: 6553600 0x4BEE1B7: TRef::operator=(TObject*) (TRef.cxx:294)
       n1: 6553600 0x48801E2: TFoamCell::SetDau0(TFoamCell*) (TFoamCell.h:63)
        n1: 6553600 0x487BCA1: TFoam::Divide(TFoamCell*) (TFoam.cxx:952)
         n1: 6553600 0x487B9C8: TFoam::Grow() (TFoam.cxx:888)
          n1: 6553600 0x48792A2: TFoam::Initialize() (TFoam.cxx:412)
           n1: 6553600 0x10B67A: reproTFoam() (in /home/pcanal/root_working/build/master-cpp17-debug/a.out)
            n0: 6553600 0x10B76F: main (in /home/pcanal/root_working/build/master-cpp17-debug/a.out)

vs

 n1: 19677056 0x4BF2279: TStorage::ReAlloc(void*, unsigned long, unsigned long) (TStorage.cxx:240)
  n1: 19677056 0x4C6A2E9: TObjArray::Expand(int) (TObjArray.cxx:405)
   n2: 19677056 0x4C6979C: TObjArray::AddAtAndExpand(TObject*, int) (TObjArray.cxx:244)
    n1: 19660800 0x4BE15AB: TProcessID::PutObjectWithID(TObject*, unsigned int) (TProcessID.cxx:390)
     n1: 19660800 0x4BE0AE6: TProcessID::AssignID(TObject*) (TProcessID.cxx:185)
      n2: 19660800 0x4BEE1B7: TRef::operator=(TObject*) (TRef.cxx:294)
       n1: 19660800 0x48801E2: TFoamCell::SetDau0(TFoamCell*) (TFoamCell.h:63)
        n1: 19660800 0x487BCA1: TFoam::Divide(TFoamCell*) (TFoam.cxx:952)
         n1: 19660800 0x487B9C8: TFoam::Grow() (TFoam.cxx:888)
          n1: 19660800 0x48792A2: TFoam::Initialize() (TFoam.cxx:412)
           n1: 19660800 0x10B67A: reproTFoam() (in /home/pcanal/root_working/build/master-cpp17-debug/a.out)
            n0: 19660800 0x10B76F: main (in /home/pcanal/root_working/build/master-cpp17-debug/a.out)

Copy link
Member

@pcanal pcanal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So indeed, TRef was an overkill to implement those references (since they are solely within the object and the object is not expected to be split). The memory use is due to the fact that the TRef counter is never reset and thus the collection of references keeps growing.

There ought to be a way to write a schema evolution rule to update the index from the TRef (previous attempts may have been missing some 'sources'). However the current solution is adequate (albeit wasting a bit of memory and disk space), so the fix can wait for another PR.

This fixes a memory leak related to the usage of `TRef` in `TFoamCell`
that can be reproduced with the following code:

```C++
class MyTFoamBinding : public TFoamIntegrand {
public:
   double Density(Int_t ndim, double *) override { return 1.0; }
};

void reproTFoam()
{
   ProcInfo_t procinfo;
   const float toMB = 1.f / 1024.f;

   std::size_t nToys = 5000;

   TRandom3 rand{};

   for (std::size_t index_toy = 1; index_toy <= nToys; index_toy++) {

      if (index_toy % 500 == 0) {
         gSystem->GetProcInfo(&procinfo);
         std::cout << index_toy << ": " << procinfo.fMemResident * toMB
                   << "   " << procinfo.fMemVirtual * toMB << std::endl;
      }

      MyTFoamBinding binding{};
      TFoam tfoam{"TFOAM"};
      tfoam.SetkDim(1);
      tfoam.SetRho(&binding);
      tfoam.SetPseRan(&rand);

      tfoam.SetnCells(30);

      tfoam.SetnSampl(200);
      tfoam.SetPseRan(&rand);
      tfoam.SetChat(0);
      tfoam.Initialize();
   }
}
```

Before this commit, you will see a steady memory increase that is gone
after this commit.

This fixes the fundamental memory increase in RooFit toy studies,
reported for example in root-project#8984.
The docs said explicitly to not use the copy constructor and the
copy assignment, so these operators should be deleted, as well as
moving.
@phsft-bot
Copy link

Starting build on ROOT-debian10-i386/soversion, ROOT-performance-centos8-multicore/cxx17, ROOT-ubuntu18.04/nortcxxmod, ROOT-ubuntu2004/python3, mac12/noimt, mac11/cxx14, windows10/cxx14
How to customize builds

@guitargeek
Copy link
Contributor Author

Thank you for your assessment, @pcanal!

I have updated the PR a last time to squash some commits, and I'll merge it then.

@phsft-bot
Copy link

Build failed on mac12/noimt.
Running on macphsft17.dyndns.cern.ch:/Users/sftnight/build/jenkins/workspace/root-pullrequests-build
See console output.

Warnings:

  • [2023-01-20T12:02:32.107Z] /Users/sftnight/build/jenkins/workspace/root-pullrequests-build/root/core/base/src/TDirectory.cxx:1245:7: warning: 'sprintf' is deprecated: This function is provided for compatibility reasons only. Due to security concerns inherent in the design of sprintf(3), it is highly recommended that you use snprintf(3) instead. [-Wdeprecated-declarations]

Failing tests:

And 58 more

@guitargeek guitargeek merged commit 529f919 into root-project:master Jan 20, 2023
@guitargeek guitargeek deleted the tfoam branch January 20, 2023 16:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[RF] Problem of memory leak (?) with RooDataSet

4 participants