Skip to content

Conversation

@KristofferC
Copy link
Member

@KristofferC KristofferC commented Nov 3, 2019

Fixes #33750

@KristofferC KristofferC added the docs This change adds or pertains to documentation label Nov 3, 2019
@KristofferC
Copy link
Member Author

I don't know if this exposes too much implementation and it is better to just document that mutation in general is not these safe.

@yuyichao
Copy link
Contributor

yuyichao commented Nov 3, 2019

Any concurrent access with at least one write should be treated as racy

@KristofferC
Copy link
Member Author

Thanks, updated the wording.

@KristofferC KristofferC added the parallelism Parallel or distributed computation label Nov 5, 2019
@mbauman mbauman requested a review from yuyichao November 5, 2019 17:21
@yuyichao
Copy link
Contributor

yuyichao commented Nov 5, 2019

I still don't think you should talk about 64. It's not in the type name.

@KristofferC
Copy link
Member Author

It's already in the docstring though which is why I felt this didn't expose any extra implementation:

BitArrays pack up to 64 values into every 8 bytes, resulting in an 8x space efficiency
over Array{Bool, N} and allowing some operations to work on 64 values at once.

@yuyichao
Copy link
Contributor

yuyichao commented Nov 5, 2019

No that's very different.

Mentioning implementation detail in the doc string is totally fine. In this case, it helps the user to understand what trade-off is being made and what performance characteristic to expect. It's not even something the user should be able to observe.

The new document is completely different though. Now it's documented as something that the user can depend on rather than some implementation detail that's useful for the user to know. If you want to mention 64 in the same way as above, you can say that any index is racing but for the current implementation you can expect these to actually give bad result but there's no guarantee on this.

Or if we exposes the back storage in any other way it'll be more OK to document this as dependable here as well. I believe I mentioned this on discourse 1-2 years ago and at least at that time I didn't find any such public interface.

base/bitarray.jl Outdated
Note that extra care has to be taken with regards to thread safety when
mutating a `BitArray` due to its packed storage format. In particular, `A[i]`
and `A[i+n]` share storage if `n` is small (that is, if `n <= 64 - mod1(i, 64)`) so any
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You may want 0 <= n...

Also, I really don't find this way of expressing the condition easy to follow..... and apparently it's hard to get right either.....

I would just say (i - 1) % 64 == (j - 1) % 64. This way, it should be obvious that the condition is symmetric and has something to do with 64bit blocks.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You want (i - 1) & 64 == (j - 1) & 64 if we're going to talk about indices i and j. I like the +n form just because it emphasizes the "closeness" of the indices and read a bit simpler to me — it's essentially indices within 64 of eachother that are dangerous.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(i - 1) & 64

& 63 or % 64.

it's essentially indices within 64 of eachother that are dangerous.

Saying exactly this as an conservative statement would be easy to understand.
But it seems that the current document is trying to be exact instead and in that case the distance isn't what matters, it's whether the indices falls into the same chunk. I strongly doubt if many people is going to get this picture from the current wording easily (I can't).

Copy link
Member

@mbauman mbauman Nov 5, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fact that we can't get this right after many iterations suggests that it's neither easily readable nor writeable. It's & 0xFFFF_FFFF_FFFF_FFC0 — or simpler, ÷ 64 or >> 6. Mod isn't right — that identifies the bit within a chunk. We want the chunk index.

I think. Right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fact that we can't get this right after many iterations suggests that it's neither easily readable nor writeable.

Well, not writable maybe, but readable. It's ÷ 64.

@KristofferC
Copy link
Member Author

Alright, I see the difference. However, commiting to this implementation would make the type more useful in threaded contexts so this is not exposing implementation for nothing.

I guess the question is what level of uncertainty we have in the current storage format of the type. If we know we won't change this, might as well expose it since it allows new uses of the type

@yuyichao
Copy link
Contributor

yuyichao commented Nov 5, 2019

I find it quite strange to expose the internal detail here since it's not even the most useful use of it I think..... (allowing the user to manipulate it direction is a much more useful reason).

I'm not sure how much the size matters although it is still smaller than the cacheline (edited) size everywhere and almost all the supported target support 128bit vectors so I won't be surprized if there's advantage to use larger chunk size for certain range update.

And I was actually a little surprised that there isn't an offset to allow for efficient insertion to the front. This is another implementation detail newly exposed. The previous document actually allows for an offset for the first element whereas the new one forbid that.

@mbauman
Copy link
Member

mbauman commented Nov 5, 2019

And I was actually a little surprised that there isn't an offset to allow for efficient insertion to the front. This is another implemental detail newly exposed. The previous document actually allows for an offset for the first element whereas the new one forbid that.

Are you thinking of BitSets?

I'd support simplifying this to remove the 64 bit considerations — just saying that concurrent accesses with at least one write is dangerous because multiple locations can get packed into the same storage seems sufficient.

@rfourquet
Copy link
Member

Are you thinking of BitSets?

The same mechanism might be implemented for BitArray, i.e. inserting loads of zeros at the beginning might be as simple as updating an offset.

@yuyichao
Copy link
Contributor

yuyichao commented Nov 5, 2019

I'd support simplifying this to remove the 64 bit considerations — just saying that concurrent accesses with at least one write is dangerous because multiple locations can get packed into the same storage seems sufficient.

Right, this is exactly what I'm saying.

i.e. inserting loads of zeros at the beginning might be as simple as updating an offset.

I didn't know how BitSet was implemented but yes this is exactly what I was saying.... Such offset existed in Array so I was expecting BitArray to have it too......

@ViralBShah
Copy link
Member

Bump. Close or update and merge?

base/bitarray.jl Outdated
Note that extra care has to be taken with regards to thread safety when
mutating a `BitArray` due to its packed storage format. In particular, `A[i]`
and `A[i+n]` share storage if `n` is small (that is, if `n <= 64 - mod1(i, 64)`) so any
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
and `A[i+n]` share storage if `n` is small (that is, if `n <= 64 - mod1(i, 64)`) so any
and `A[j]` share storage if the indices are close (that is, if `(i - 1) ÷ 64 == (j - 1) ÷ 64`) so any

@KristofferC
Copy link
Member Author

I changed this along the lines that @yuyichao wanted (just saying that any concurrent access with a write is not thread-safe).

@ViralBShah
Copy link
Member

Is this ready to merge?

@KristofferC
Copy link
Member Author

Ready from my p.o.v.

@ViralBShah
Copy link
Member

Made it a !!! note, and added some punctuation to make it readable. Let's merge.

@ViralBShah ViralBShah merged commit 3637c9e into master Apr 30, 2020
@ViralBShah ViralBShah deleted the kc/bitarray_ts branch April 30, 2020 17:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

docs This change adds or pertains to documentation parallelism Parallel or distributed computation

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Document that BitArray isn't threadsafe

7 participants