Skip to content

Conversation

@araujoms
Copy link

No description provided.

Comment on lines +98 to +99
q.inv = q⁻¹
q⁻¹.inv = q
Copy link

@vtjnash vtjnash Jun 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You did have a bad mistake here (of generally the error sort, though generally not the crash sort), since you assigned these in the wrong order, the :sequentially_consistent annotation is invalidated everywhere and invalid.

Suggested change
q.inv = q⁻¹
q⁻¹.inv = q
@atomic q⁻¹.inv = q
@atomic q.inv = q⁻¹

σ.inv = σ
else
σ⁻¹ = typeof(σ)(invperm.images); check = false)
# we don't want to end up with two copies of inverse σ floating around
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is not valid (doesn't describe the code / not physically possible). There is an atomic annotation for this (@atomiconce), but that doesn't actually fix the physics problem of not being able to do 2 writes simultaneously

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vtjnash thanks for the comments!

Yeah, now after you pointed this, it seems obvious that I should have done

@atomic σ⁻¹.inv = σ # fully sets up the "local inverse"
@atomic σ.inv = σ⁻¹ # makes the "local inverse" visible to the global world.

so that σ is updated (atomically) only once 🤦🏻

I guess reading of σ.inv below should be also @atomic?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It isn't super clear what you want or need here. I think you're just asking for a release store, since the reverse order has the opposite problem since they reference each other. Otherwise you'd have to use a global mutex (to actually implement the comment) and I'm unclear that you actually want that behavior? You also must use atomiconce macro if you want this only happen once, otherwise it will be atomic, but happen potentially as many times as there are threads

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine this happening multiple times, cause once the field is defined there will be only reads to it;

What I want to avoid is the situation where isdefined returns true, but the internal data structures are read in a not fully-initialized state (I have observed such bugs before). Here the other order could mean that another comes to p::Perm and sees that p.inv is defined so just takes it, but the thread computing p⁻¹ has not fully initialized it. Here it will only result in yet another computation of p⁻¹.inv. In other cases I got some random "Access to undefined reference" despite asking for isdefined.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cause once the field is defined there will be only reads to it;

That is not quite what you implemented with @atomic, since you've only implemented that per-thread, so once the field is defined on a given thread, there will only be reads to it on that thread, but there will still be writes to it on all the other threads since you didn't specify that it should only be done once

Copy link
Owner

@kalmarek kalmarek Jun 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine if two/many threads enter the body of if isdefined and then even overwrite the field, as long as the data in the field is consistent. I understand that I could do @atomiconce p.inv = p⁻¹ to specifically tell the "late" threads not to touch the field. Maybe that'd even better. But since all of them should compute exactly the same thing I don't think it makes a difference in this case.

To put it differently: Are these statements correct:

  1. Once the field of the object is defined any thread that has not yet entered the body of if isdefined(...) will not enter it (and hence will neither perform the computations nor overwrite the field).
  2. Any thread that has entered the body of if defined(...) before the initialization of the field will compute and override the field.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that sounds like what you get from marking it with sequential consistency (though it can be slower than release-acquire which is all you really need). Though even without any atomic annotations, reference fields get implemented with release-consume, so these atomics don't appear that they would make a difference to the observable behaviors

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the clarification!
What do you suggest then instead of sequential_consistent?

    if isdefined(p, :inv, :acquire)
        inv_imgs = copy(@atomic(p.inv).images)
        q⁻¹ = typeof(p)(inv_imgs; check = false)
        @atomic q⁻¹.inv = q # which ordering here?
        @atomiconce :release q.inv = q⁻¹
    end

Hmm, I'm not sure again if I understand how this could be achieved without atomics. e.g. I had situation where on lazy initializations I was getting "Access to undefined field" despite using if isdefined(...). It turned out then that some of the fields were read with nullptrs despite isdefined returning true. These errors disappeared either by using locking (in some cases), or by using @atomic here...

My understanding of the fine details of multithreading is lacking, so I appreciate all of the comments of yours. They help me to clarify things in my head, thanks!

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:release or :monotonic would be fine there, since that is not globally visible memory. You might get a torn read/write if it is a struct that can be inlined which can cause things to get into an inconsistent state. Using @atomic explicitly is definitely more correct and better, even if it happens compiles down to the same cpu instructions though. I just note it so you can be alert to whether there are other possible synchronization issues that are being hidden by the seq_cst just making it run slower and being more rare to observe.

kalmarek added a commit that referenced this pull request Jun 22, 2025
@kalmarek kalmarek mentioned this pull request Jun 22, 2025
kalmarek added a commit that referenced this pull request Jun 30, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants