
Make Ref assignment atomic #46302

Closed
wants to merge 1 commit into from

Conversation

ellenhp
Contributor

@ellenhp ellenhp commented Feb 21, 2021

This PR ensures that references are counted correctly even when two threads assign to the same Ref<> variable at the same time. There's a nonzero performance cost to using atomics here, but we're already using atomics in SafeRefCount, so I think it's probably fine. I need correct behavior in Ref<> for the implementation of godotengine/godot-proposals#2299. See godotengine/godot-proposals#2330 for the justification for this change.

I can't seem to get ASAN to work on macOS so I'd really appreciate some more thorough testing. I'll update this if I find a way to test for leaks and memory errors.

Please review this as if I'm just some random person on the internet who barely knows C++ modifying one of the most fundamental data structures in the entire engine, because that is 100% accurate. 🤣

Fixes: godotengine/godot-proposals#2330

@fire fire added the bug label Feb 21, 2021
@fire fire requested review from a team and RandomShaper February 21, 2021 23:16
@ellenhp ellenhp marked this pull request as ready for review February 21, 2021 23:27
@YuriSizov YuriSizov added this to the 4.0 milestone Feb 21, 2021
@ellenhp
Contributor Author

ellenhp commented Feb 22, 2021

Question for @RandomShaper since I think you were working on some of the other atomics in the engine: Should we use a different memory order than the default? I don't understand the instructions that this code gets compiled down to, but it seems like the more relaxed memory orders might be faster.

https://www.cplusplus.com/reference/atomic/memory_order/

Obviously I haven't profiled it, but I don't personally care in what order assignments happen. If two threads write to the same Ref without a lock, the ordering should be undefined. I don't see any benefit to a stricter guarantee than promising that we'll reference count correctly and that the value of reference will match one of the two values assigned to it. Strict guarantees on ordering are what mutexes are for, IMO. 🤷‍♀️

@ellenhp
Contributor Author

ellenhp commented Feb 22, 2021

Related: #35046

@RandomShaper
Member

I think you don't need to make the local pointers atomic. Just by having an atomic member pointer, you can safely get its value into (and set it from) plain local pointers.

Regarding memory order, I'd need more time to reason about all the possibilities, but I'm quite confident that relaxed is fine, given the kind of guarantee you want to add with this PR, which is not full thread safety (that, as you said, would likely require some locking mechanism, or maybe one super clever lock-less algorithm).

Instead of adding multiple review comments, I've applied my changes locally (on top of yours) and created this patch file:
patch.txt

@ellenhp
Contributor Author

ellenhp commented Feb 23, 2021

Patch looks good. CI is breaking but I think it's ready for another look :)

Member

@RandomShaper RandomShaper left a comment


Looks good to me. However, before merging maybe it'd be good to add a test that exercises this. Or maybe it can be done later (I may be able to add the test myself, but not any sooner than the weekend).

@ellenhp
Contributor Author

ellenhp commented Feb 24, 2021

I don't have much experience with tests, and I think they're failing on master right now so I'm not sure how to work around that.

@RandomShaper
Member

I'll try to add a test myself today. CI seems fine. I guess this can be merged and, in case issues arise or the test fails, you can just make a fixup PR. Working on master has advantages... 😀

@akien-mga akien-mga requested a review from reduz February 24, 2021 09:46
@reduz
Member

reduz commented Feb 24, 2021

Making the pointer atomic forces the CPU to do a barrier every time it's accessed (which happens far more often than it is copied), which is going to have a serious performance implication. Barriers mean cache invalidations across CPUs, so this is not cheap. I don't think solving the case mentioned in this PR is worth this extra cost.

@RandomShaper mentions it is possible to ensure that barrier won't happen at compile time, so that might work. I was not aware of this C++11 feature.

@RandomShaper
Member

@ellenhp, after discussing this with @reduz on IRC, we've agreed that, in order to catch the case where some future CPU or C++ library needs to lock to achieve atomicity (so we know we must use a different approach if that happens), adding this inside the body of the class is desirable:
static_assert(std::atomic<T *>::is_always_lock_free);

That guarantee, together with the relaxed ordering, gives us enough peace of mind.

@ellenhp
Contributor Author

ellenhp commented Feb 24, 2021

I'll make this change later tonight and confirm it's working by going through the generated assembly for some toy examples. I do think that a read-modify-write operation like exchange is going to either invalidate some caches, stall the other CPUs or flush some pipelines, which is why I mentioned this would have a performance impact in the PR description. Like I said though, we're already doing this for the reference counts themselves and I don't think anyone has noticed a performance impact.

@ellenhp
Contributor Author

ellenhp commented Feb 25, 2021

It looks like there is a memory barrier on x86_64 (thank you @fire for dumping this assembly for me 👍)

.seh_proc _ZNSt3__121__cxx_atomic_exchangeIyEET_PNS_22__cxx_atomic_base_implIS1_EES1_NS_12memory_orderE
# %bb.0:
	subq	$56, %rsp
	.seh_stackalloc 56
	.seh_endprologue
	movq	%rcx, 48(%rsp)
	movq	%rdx, 40(%rsp)
	movl	%r8d, 36(%rsp)
	movq	48(%rsp), %rax
	movl	36(%rsp), %r8d
	movq	40(%rsp), %rcx
	movq	%rcx, 24(%rsp)
	addl	$-1, %r8d
	movl	%r8d, %ecx
	subl	$4, %r8d
	movq	%rax, 8(%rsp)                   # 8-byte Spill
	movq	%rcx, (%rsp)                    # 8-byte Spill
	ja	.LBB18_1
# %bb.7:
	leaq	.LJTI18_0(%rip), %rax
	movq	(%rsp), %rcx                    # 8-byte Reload
	movslq	(%rax,%rcx,4), %rdx
	addq	%rax, %rdx
	jmpq	*%rdx
.LBB18_1:
	movq	24(%rsp), %rax
	movq	8(%rsp), %rcx                   # 8-byte Reload
	xchgq	%rax, (%rcx)
	movq	%rax, 16(%rsp)
	jmp	.LBB18_6
.LBB18_2:
	movq	24(%rsp), %rax
	movq	8(%rsp), %rcx                   # 8-byte Reload
	xchgq	%rax, (%rcx)
	movq	%rax, 16(%rsp)
	jmp	.LBB18_6
.LBB18_3:
	movq	24(%rsp), %rax
	movq	8(%rsp), %rcx                   # 8-byte Reload
	xchgq	%rax, (%rcx)
	movq	%rax, 16(%rsp)
	jmp	.LBB18_6
.LBB18_4:
	movq	24(%rsp), %rax
	movq	8(%rsp), %rcx                   # 8-byte Reload
	xchgq	%rax, (%rcx)
	movq	%rax, 16(%rsp)
	jmp	.LBB18_6
.LBB18_5:
	movq	24(%rsp), %rax
	movq	8(%rsp), %rcx                   # 8-byte Reload
	xchgq	%rax, (%rcx)
	movq	%rax, 16(%rsp)
.LBB18_6:
	movq	16(%rsp), %rax
	addq	$56, %rsp
	retq
	.p2align	2, 0x90

I'm not the best at looking at assembly, but I think xchgq %rax, (%rcx) exchanges rax with the memory at pointer location rcx. I read through some docs here: https://www.felixcloutier.com/x86/xchg

The operands can be two general-purpose registers or a register and a memory location. If a memory operand is referenced, the processor’s locking protocol is automatically implemented for the duration of the exchange operation, regardless of the presence or absence of the LOCK prefix or of the value of the IOPL. (See the LOCK prefix description in this chapter for more information on the locking protocol.)

So, bad news. But again, we're already doing this for reference counts and I'm not aware of any performance issues with that. I can profile this too if needed :) Just let me know.

@RandomShaper
Member

RandomShaper commented Feb 25, 2021

When I said "no barriers," I meant no compiler barriers and no explicitly requested CPU barriers. That's only the minimum the programmer asks for, though; the CPU still has to do whatever it must to meet those expectations, falling back to the closest equal-or-safer thing it supports (for instance, if you request a kind of barrier not provided by some CPU instruction set, it may have to use a full barrier to at least provide the expected guarantees).

In our case, yes, XCHG in x86 has that implicit LOCK prefix, which keeps others' hands out of the data until the full operation is complete, and it turns out to be in practice a full memory barrier (there are some subtleties, but they are beyond my current knowledge).

In conclusion, the barrier is a must on x86(-64) and I guess we can live with it. I don't think it will hinder performance so much that it becomes a problem, and anyway, this is in harmony with our current leaning towards a thread-safer Godot.

P. S.: At least we can ensure the operation is lock-free; having to lock would be worse than the memory barrier.

@ellenhp
Contributor Author

ellenhp commented Feb 25, 2021

Agreed. I just wanted to bring this up because it theoretically could impact performance. I don't think it will in a serious way but it would be dishonest to not mention it. :)

@reduz
Member

reduz commented Mar 15, 2021

We should most likely measure this somehow before merging. IMO this protects against the same reference being accessed from different threads, which is not exactly what the refcount was meant for, and not a common case in the engine. Maybe this could be a special case where it is an actual problem.

Successfully merging this pull request may close these issues.

Make Ref<> assignments atomic
6 participants