-
-
Notifications
You must be signed in to change notification settings - Fork 5.7k
document some of the thread safety issues with BitArray #33754
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
I don't know if this exposes too much implementation and it is better to just document that mutation in general is not these safe. |
|
Any concurrent access with at least one write should be treated as racy |
|
Thanks, updated the wording. |
0a6188a to
0d21455
Compare
|
I still don't think you should talk about 64. It's not in the type name. |
|
It's already in the docstring though which is why I felt this didn't expose any extra implementation:
|
|
No that's very different. Mentioning implementation detail in the doc string is totally fine. In this case, it helps the user to understand what trade-off is being made and what performance characteristic to expect. It's not even something the user should be able to observe. The new document is completely different though. Now it's documented as something that the user can depend on rather than some implementation detail that's useful for the user to know. If you want to mention 64 in the same way as above, you can say that any index is racing but for the current implementation you can expect these to actually give bad result but there's no guarantee on this. Or if we exposes the back storage in any other way it'll be more OK to document this as dependable here as well. I believe I mentioned this on discourse 1-2 years ago and at least at that time I didn't find any such public interface. |
base/bitarray.jl
Outdated
| Note that extra care has to be taken with regards to thread safety when | ||
| mutating a `BitArray` due to its packed storage format. In particular, `A[i]` | ||
| and `A[i+n]` share storage if `n` is small (that is, if `n <= 64 - mod1(i, 64)`) so any |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You may want 0 <= n...
Also, I really don't find this way of expressing the condition easy to follow..... and apparently it's hard to get right either.....
I would just say (i - 1) % 64 == (j - 1) % 64. This way, it should be obvious that the condition is symmetric and has something to do with 64bit blocks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You want (i - 1) & 64 == (j - 1) & 64 if we're going to talk about indices i and j. I like the +n form just because it emphasizes the "closeness" of the indices and read a bit simpler to me — it's essentially indices within 64 of eachother that are dangerous.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(i - 1) & 64
& 63 or % 64.
it's essentially indices within 64 of eachother that are dangerous.
Saying exactly this as an conservative statement would be easy to understand.
But it seems that the current document is trying to be exact instead and in that case the distance isn't what matters, it's whether the indices falls into the same chunk. I strongly doubt if many people is going to get this picture from the current wording easily (I can't).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The fact that we can't get this right after many iterations suggests that it's neither easily readable nor writeable. It's & 0xFFFF_FFFF_FFFF_FFC0 — or simpler, ÷ 64 or >> 6. Mod isn't right — that identifies the bit within a chunk. We want the chunk index.
I think. Right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The fact that we can't get this right after many iterations suggests that it's neither easily readable nor writeable.
Well, not writable maybe, but readable. It's ÷ 64.
|
Alright, I see the difference. However, commiting to this implementation would make the type more useful in threaded contexts so this is not exposing implementation for nothing. I guess the question is what level of uncertainty we have in the current storage format of the type. If we know we won't change this, might as well expose it since it allows new uses of the type |
|
I find it quite strange to expose the internal detail here since it's not even the most useful use of it I think..... (allowing the user to manipulate it direction is a much more useful reason). I'm not sure how much the size matters although it is still smaller than the cacheline (edited) size everywhere and almost all the supported target support 128bit vectors so I won't be surprized if there's advantage to use larger chunk size for certain range update. And I was actually a little surprised that there isn't an offset to allow for efficient insertion to the front. This is another implementation detail newly exposed. The previous document actually allows for an offset for the first element whereas the new one forbid that. |
Are you thinking of I'd support simplifying this to remove the 64 bit considerations — just saying that concurrent accesses with at least one write is dangerous because multiple locations can get packed into the same storage seems sufficient. |
The same mechanism might be implemented for |
Right, this is exactly what I'm saying.
I didn't know how |
|
Bump. Close or update and merge? |
base/bitarray.jl
Outdated
| Note that extra care has to be taken with regards to thread safety when | ||
| mutating a `BitArray` due to its packed storage format. In particular, `A[i]` | ||
| and `A[i+n]` share storage if `n` is small (that is, if `n <= 64 - mod1(i, 64)`) so any |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| and `A[i+n]` share storage if `n` is small (that is, if `n <= 64 - mod1(i, 64)`) so any | |
| and `A[j]` share storage if the indices are close (that is, if `(i - 1) ÷ 64 == (j - 1) ÷ 64`) so any |
|
I changed this along the lines that @yuyichao wanted (just saying that any concurrent access with a write is not thread-safe). |
1eba02e to
7c5b236
Compare
|
Is this ready to merge? |
|
Ready from my p.o.v. |
|
Made it a |
Fixes #33750