Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Static BitArray #412

Open
PythonNut opened this issue May 23, 2018 · 9 comments
Open

Static BitArray #412

PythonNut opened this issue May 23, 2018 · 9 comments
Labels
feature features and feature requests

Comments

@PythonNut
Copy link

Since this hasn't really been brought up before, I would like to suggest the possibility of Static BitArrays. In particular, since the performance of static arrays relative to regular arrays are so closely tied to their size, the space optimization afforded by BitArrays may allow much larger arrays to reap performance gains.

For example, a bitboard for Chess would require only 64 bits, so it could fit in a single register, discounting overhead.

@andyferris
Copy link
Member

Sure... this should be fine.

I'll also note that such a type could be developed outside of StaticArrays as a subtype of StaticArray (at least originally - I have nothing against inclusion here except to note that Julia makes working with smaller packages pretty painless).

@andyferris
Copy link
Member

Also, for 8x8 in particular I'd note a less generic approach that takes advantage of the fact that byes are 8 bits would probably be optimal (e.g. faster 2D indexing?)

@chethega
Copy link
Contributor

Regarding static BitVectors, I'd like to see them "effectively" big-endian, in the sense that

@inline function getbit(B::SVector{N,UInt64},i) where N
    i1, i2 = Base.get_chunks_id(i)
    u = 0x8000000000000000 >> i2
    r = (B[i1] & u) != 0
    return r
end

instead of shifting 1 to the left. The reason is that we really want to use the integer comparison operations in order to compare bitvectors, i.e. the logical fist bit needs to be the MSB of the first UInt64. Base BitVector uses a very weird mixed endian format that makes lexicographic comparisons on 64bit chunks bad, no need to replicate that.

@chethega
Copy link
Contributor

I recently needed this, and made an implementation. So, the question would be: Comments? Should I make a PR?

Design Facts:
Because the new fast iteration over set bits requires the Base layout, I stuck with that. In order to save compile time, I decided to encode only the number N of 64-bit chunks in the type (I don't know the length a-priori, but can pay for the occasional dynamic dispatch on N). This means that practical use will often need to carry a mask around.

Convenience container:
Modifying statics is inconvenient. For that reason, I made a BitMat{N} that is essentially a BitMatrix(64*N, k), where bitmat[:,k] returns the right N*64-bits. Now we can conveniently toggle single or multiple bits, and have temporaries live on the stack / register.

Open question:
Do we want to be <:AbstractSet{Int} or <:AbstractVector{Bool}? The first variant is implemented in as BitMatSet.jl, the second in BitMatVec.jl. Being a vector is somehow more honest, but both for iteration and for printing the set view fits more. Always calling for i in iterbits(mask & bv1 & ~bm[: ,k ]) gets old fast.

Unfixed problem:
I really need to go over the boundscheck avoidance. There still are a couple too many boundschecks. If there is interest in a PR then I'll work on that; otherwise I'll just use my current working code until I get unhappy with its performance. The emitted code for generating a BitVec{N}(k) of length N*64 with bit k set is pretty bad for large N; I'm not sure what to do about that (except not using that; you can work this by writing to a matrix, toggling the bit, and reading back).

@difranco
Copy link

difranco commented Mar 17, 2019

I recently needed this, and made an implementation. So, the question would be: Comments? Should I make a PR?

@chethega
Please do! I could use this right now. Might give me an order of magnitude performance and might make the difference in making the stuff I'm working on work in a systems context.

This would be most useful if there were a mutable version as there is for the general static arrays. (I noticed the gists you put up above didn't have setters when I tried to use them, and am having a brief look at adding those now, but will probably just fall back to regular bit arrays for now.)

@chethega
Copy link
Contributor

Please do! I could use this right now.

Sure, tell me what you need. My current working variant is completely divorced from StaticArrays (I duplicate the ntuple games), uses the bitarray endianness and is more like a static bitset. I do some shady stuff to overload setindex!(m::RefValue{SBitSet{N}}, val::Bool, idx::Integer) where N, because replicating the MArray machinery was too annoying (it is a cool machinery, but more complex than I am willing to maintain).

I currently don't think that the AbstractArray{Bool}, or any other Base interface is worthwhile for this type. You want broadcasting as a scalar, iteration over set bits, bitwise operators instead of set language (&, |, xor, ~ instead of unreadably verbose intersect), both array-like getindex/setindex! and set-like in. That plus lack of positive feedback is why my PRs on this are somewhat on hold: I don't see anything getting merged that reaches the convenience of what I already have.

But if you tell me what you need, then I can make you a small gist with the relevant parts of my current implementation.

@difranco
Copy link

@chethega
What I am doing now relies most heavily on sum being implemented via popcount, and mutability. The rest would just be a bonus. Being able to write vector / matrix A * X + B where the implied scalar multiplications are boolean ANDs and scalar additions are boolean ORs, and implemented with the appropriate machine vector ops, would also be nice.

@c42f c42f added the feature features and feature requests label Jul 31, 2019
@chethega
Copy link
Contributor

@difranco

Look at #647

The pop-count sum is implemented as length. Regarding matrix-vector multiply, I don't have a fast solution yet, nor do I have a type for SBitMatrix. Two competing ideas:

mul_Ab(op, mat, vec)
rs = zero(typeof(vec))
for i in vec
rs = op(rs, mat[i])
end
rs
end

This one basically scales in the number of entries in i, and adds up columns of mat. Second approach is something like

mul_Ab(op, mat, vec)
rs = MBitSet(vec)
for (i,row) in enumerate(mat)
rs[i] = reduce(op, row & vec)
end
rs[]
end

That code is of course terrible, and we would need a faster way of compressing into rs.

I suspect that the first approach will be faster, but has the disadvantage of data-dependent branching. This is very bad in some contexts (e.g. security side-channels).

I have no intention of implementing any bitpacked matmul in the near future; but feel free to propose some code.

@chelate
Copy link

chelate commented Jun 11, 2021

Sorry I'm thinking out loud. This is all obvious to everyone else.

It's possible to make static BitArrays by just by using StaticArray{size,Bool}, but the size of a UInt64 is 8 and SVector{64,Bool} is 64. A factor of 8 could be quite bad for cache efficiency.

You could implement something like a struct of Uints and then do bit manipulations on them directly using readbit and bit_length, but not exactly robust or safe if your manipulations get complicated.

I'm wondering, is it possible to simply replace Vector with SVector in the implemetation of Base.BitArray, to create SBitArray and then subtype everything to work? Here's how base defines BitArrays:

mutable struct BitArray{N} <: AbstractArray{Bool, N}    
   chunks::Vector{UInt64}   
   len::Int    
   dims::NTuple{N,Int}
....

Update well as everyone knows you can't subtype concrete types, okay I'm still learning here. sorry to use github to think outloud. :(

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature features and feature requests
Projects
None yet
Development

No branches or pull requests

6 participants