Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TBB memory bug #77

Open
WhateverLiu opened this issue Aug 5, 2024 · 1 comment
Open

TBB memory bug #77

WhateverLiu opened this issue Aug 5, 2024 · 1 comment

Comments

@WhateverLiu
Copy link

WhateverLiu commented Aug 5, 2024

Hi parlay maintainers, there seems to be an evasive bug in the integration of parlay and TBB that took me a painful amount of time to catch. The following is a minimal reproducible example:

#include <iostream>
#include "parlaylib-master/include/parlay/primitives.h"
#define vec parlay::sequence

struct MstPrune {
  vec<int> parent;
  vec<vec<int>> mst;
  void operator() (const int eventNregion) {
    int neg1 = -1;
    parent.assign(eventNregion, neg1);
    for (int i = 0, iend = parent.size(); i < iend; ++i) parent[i] = i;
    mst.resize(eventNregion);
    for (int i = 0; i < eventNregion; ++i) mst[i].resize(0);
    for (int i = 0; i < eventNregion; ++i) mst[parent[i]].emplace_back(i);
  } 
};  

int main(int argc, char* argv[]) {
  vec<int> sizes(argc);
  for (int i = 0; i < argc; ++i) sizes[i] = std::atoi(argv[i]);
  vec<MstPrune> mst(parlay::num_workers());
  parlay::parallel_for(0, argc, [&](auto i)->void {
    auto t = parlay::worker_id();
    mst[t](sizes[i]);
  });
  int S = 0;
  for (auto& m: mst) {
    for (auto& x: m.mst) { for (auto& y: x) S += y; }
  }
  if (S != 77) std::cout << "---------- S = " << S << " ----------\n";
  else std::cout << "S = " << S << ", ";
}

#undef vec

After compiling the code with the following command
g++ -std=c++20 -Ofast -DPARLAY_TBB -ltbb bug.cpp -o bug
and run
for i in {1..1000}; do ./bug 1 1 1 1 12 1 1 2 2 4 3; done
You should be able to see nondeterministic result varying around the true answer "77".

If we change vec<int> parent; to std::vector<int> parent;, everything works fine.

If we switch to parlay's default thread scheduler, i.e. g++ -std=c++20 -Ofast -pthread bug.cpp -o bug , everything also works fine.

I tested both the latest oneAPI/tbb and an older version. The problem remains the same.

When the code is compiled as a shared library and linked to runtimes, segmentation fault will pop.

Could you help check if the bug is reproducible and provide some assistance?

@WhateverLiu
Copy link
Author

OK, I closed the issue earlier because the previous example did not illustrate my point. The memory error only occurs when the code is compiled as a shared library and linked to runtimes. As developers who extensively write and link C++ libraries to R and Python, we face an additional layer of complexity. I am now more certain that the problem lies with TBB. I have cleaned and reduced the example to the following:

  1. In the current directory, create bug.cpp with the code below:
// We create a vector of `m` functors (`Struct`), where `m` represents the number of threads.
// Each functor contains a sequence of integers (`x`) and a sequence of sequences (`y`).
// When running in parallel using TBB, allocating memory for `y` can unexpectedly affect 
// the memory allocation of `x`, which should not occur.
// Changing the sequence type from `parlay::sequence` to `std::vector` can resolve this issue.
// Alternatively, modifying `y` from a sequence of sequences to a simple sequence of integers 
// also eliminates the error.
// The issue likely stems from compatibility problems between TBB and 
// Parlay's memory allocator for small objects.

#include <iostream>
#include "parlaylib-master/include/parlay/primitives.h"
#define vec parlay::sequence

struct Struct {
  vec<int> x;
  vec<vec<int>> y; // Oddly, if y is vec<int> instead vec<vec<int>>, everything would work.
  bool operator() ( const int n ) { 
    x.resize(n);
    std::iota(x.begin(), x.end(), 0);
    size_t addr = size_t(x.data()); // Record the mem address as an integer.
    int xsize = x.size();
    y.resize(n); // Just allocate y;
    
    // If the allocation of `parent` is changed or its size is altered, 
    //   print message.
    if (size_t(x.data()) != addr or int(x.size()) != xsize) {
      auto s = "Wrong! Why is `x` changed ?\n"
      "The initial address 'x.data()' is " + std::to_string(addr) + 
        ", but the current address is " + std::to_string(size_t(x.data())) +
        ".\nThe initial size is " + std::to_string(xsize) + 
        ", but the current size is " + std::to_string(x.size()) +
        ".\nThe initial elements are [0, ..., " + std::to_string(xsize) +
        "], but the current elements are [";
      for (int i = 0, iend = x.size(); i < iend; ++i) 
        s += std::to_string(x[i]) + ", ";
      s += "].\n\n\n";
      std::cerr << s;
      return false;
    }
    return true;
  }  
}; 

void bug() {
  std::vector<int> ns {1, 1, 1, 1, 12, 1, 1, 2, 2, 4};
  vec<Struct> mstPv(parlay::num_workers());
  parlay::parallel_for(0, ns.size(), [&](auto i)->void {
    auto t = parlay::worker_id();
    mstPv[t](ns[i]);  
  }); 
}
#undef vec
  1. Create main.cpp with the following code:
void bug(void);
int main() {
  for (int i = 0; i < 10000; ++i) bug(); // Run the buggy code 10000 times since the error occurs at random. 
}
  1. Run the following 3 sets of commands: use parlay's native thread scheduler, use openmp, and finally use tbb. Everything works fine except for tbb. It will print a large number of messages.
g++ -std=c++20 -shared -pthread -Ofast -fpic bug.cpp -o libbug.so
g++ -std=c++20 -Ofast -L. -lbug  main.cpp  -o  main
export LD_LIBRARY_PATH=.:$LD_LIBRARY_PATH
./main

g++ -std=c++20 -shared -DPARLAY_OPENMP -fopenmp -Ofast -fpic bug.cpp -o libbug.so
g++ -std=c++20 -Ofast -L. -lbug  main.cpp  -o  main
export LD_LIBRARY_PATH=.:$LD_LIBRARY_PATH
./main

g++ -std=c++20 -shared -Ofast -DPARLAY_TBB -ltbb -fpic bug.cpp -o libbug.so
g++ -std=c++20 -Ofast -L. -lbug  main.cpp  -o  main
export LD_LIBRARY_PATH=.:$LD_LIBRARY_PATH
./main

I hope the authors can confirm and resolve this issue:) Our entire system is built on parlay and TBB. We prefer not to abandon TBB, as it serves as the backend for the C++ parallel STL, which includes parallelized algorithms not available in parlay

Thank you!

@WhateverLiu WhateverLiu reopened this Aug 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant