ThreadSanitizerAboutRaces

Introduction

Most programmers know that races are harmful.
For some races it could be quite easy to predict what may go wrong.
For example, the variable var in the following code may not be equal to 2 at the end of the program execution (for more examples refer to ThreadSanitizerPopularDataRaces)

int var;

void Thread1() {  // Runs in one thread.

  var++;

}

void Thread2() {  // Runs in another thread.

  var++;

}

However, there are other, much more subtle races which are much less obviously harmful. This article is about such races. Most of the content is specific to C/C++.

TODO(timurrrr) Reference the C++11 standard mentioning data races lead to UB?

TODO(kcc) This article is not finished. Expect more content later.

Racy publication

The race which is most frequently (and mistakenly) perceived as harmless is unsafe publication:

struct Foo {

  int a;

  Foo() { a = 42; }

};



static Foo *foo = NULL;



void Thread1() {  // Create foo.

  foo = new Foo();

}



void Thread2() {  // Consume foo.

  if (foo) {

     assert(foo->a == 42);

  }

}

ThreadSanitizer and most other race detectors will report this as a race on foo.

Objection 1

But hey, why this race is harmful? Follow my fingers:

Thread1 calls new Foo(), which allocates memory and calls Foo::Foo()
Thread1 assigns the new value to foo and foo becomes non-NULL.
Thread2 checks if foo != NULL and only then reads foo->a.
The code is safe!

Clarification 1

On machines with weak memory model, such as IBM Power or ARM,
foo != NULL does not guarantee that the contents of *foo are ready for reading in Thread2.
For example, the following code does not work on machines with weak memory model.

int A = 0, B = 0;



void Thread1() {

  A = 1;

  B = 1;

}



void Thread2() {

  if (B == 1) {

     assert(A == 1);

  }

}

Objection 2

Ok, ok. But I don't care about weak memory model machines. x86 has a strong memory model!

Clarification 2

If you are writing in assembly, yes, this is safe.
But with C/C++ you can not rely on the machine's strong memory model because the compiler may (and will!) rearrange the code for you.
For example, it may easily replace A=1;B=1; with B=1;A=1;.

Objection 3

Ok, I believe that compilers may do simple reordering, but that doesn't apply to my code!
For example, Foo::Foo() is defined in one .cc file and foo = new Foo() resides in another.
And even if these were in a single file, what kind of reordering could harm me?

Clarification 3

First, it's a bad idea to rely on the compiler's inability to do some legal transformation.
Second, the modern compilers are much more sophisticated than most programmers think.
For example, many compilers can do cross-file inlining or even inline virtual functions (!).
So, in the example above a compiler may (and often will) change the code to

  foo = (Foo*)malloc(sizeof(Foo));

  new (foo) Foo();

Objection 4

Ok. So I will do something to avoid any bad compiler effect in the thread which creates foo.
But obviously, the consumer thread's code is safe. What can possibly go wrong with this code on x86?

void Thread2() {  // Consume foo.

  if (foo) {  

     // foo is properly published by Thread1().

     assert(foo->a == 42);

  }

}

Clarification 4

Again, you are underestimating the cleverness of modern compilers. How about this (it really happens!)?

void Thread2() {  // Consume foo.

  Foo *t1 = foo;  // reads NULL

  Foo *t2 = foo;  // reads the new value

  if (t2) {

     assert(t1->a == 42);

  }

}

Double-Checked Locking

Double-Checked Locking is broken in C++ because of all the reasons discussed above.

Racy lazy initialization

One more frequent bug:

// lazy init for Pi. 

float *GetPi() {

  static float pi;

  static bool have_pi = false;

  if (!have_pi) {

     pi = ComputePi(); // atomic assignment, no cache coherency issues.                    

     have_pi = true;

  }

  return pi;

}

Even experienced programmers may assume that this code is correct on a CPU with strong memory model.
But remember that a compiler may rearrange have_pi = true; and pi = ComputePi(); and one of the threads will return uninitialized value of pi.

Volatile

DON'T use volatile for synchronization in C/C++.
The C and C++ standards don't say anything about volatile and threads -- so don't assume anything.

Also, the C++11, chapter intro.multithread, paragraph 21 says:
"The execution of a program contains a data race if it contains two conflicting actions in different threads, at least one of which is not atomic, and neither happens before the other. Any such data race results in undefined behavior."

You don't believe us? Ok, here is a simplest proof that C++ compilers don't treat volatile as synchronization.

% cat volatile.cc 

typedef long long T;

volatile T vlt64;

void volatile_store_64(T x) {

        vlt64 = x;

}

% g++ --version 

g++ (Ubuntu 4.4.3-4ubuntu5) 4.4.3

...

% g++ -O2 -S volatile.cc -m32 && cat volatile.s 

...

_Z17volatile_store_64x:

.LFB0:

        pushl   %ebp

        movl    %esp, %ebp

        movl    8(%ebp), %eax

        movl    12(%ebp), %edx

        popl    %ebp

        movl    %eax, vlt64

        movl    %edx, vlt64+4

        ret

...

You can clearly see that a 64-bit volatile variable is written in two pieces w/o any synchronization.
Do you still use C++ volatile as synchronization?

You may object that we are cheating (with 64-bit ints on a 32 arch).
Ok, we indeed cheated, but just a bit. How about this case?

% cat volatile2.cc 

volatile bool volatile_bool_done;

double regular_double;

void volatile_example_2() {

        regular_double = 1.23;

        volatile_bool_done = true;

}

% g++ -O2 volatile2.cc -S

% cat volatile2.s 

...

_Z18volatile_example_2v:

.LFB0:

..

        movabsq $4608218246714312622, %rax

        movb    $1, volatile_bool_done(%rip)

        movq    %rax, regular_double(%rip)

        ret

...

Here you can see that the compiler reordered a non-volatile store and a volatile store.

Are you convinced now? If no, please let us know why!

How to fix

So, how do I make my code correct?

If you are not a black belt in synchronization primitives, please use something crafted by those who are (e.g. pthread_mutex_lock, pthread_once, etc).

If you are indeed a black belt (are you?), you probably know about CAS and other atomics, memory barriers, compiler barriers and other magic.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly