Skip to content

Revamp Bitv and BitvSet #15284

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
Jul 5, 2014
Merged

Revamp Bitv and BitvSet #15284

merged 10 commits into from
Jul 5, 2014

Conversation

apoelstra
Copy link
Contributor

The types Bitv and BitvSet are badly out of date. This PR:

  • cleans up the code (primarily, simplifies Bitv and implements BitvSet in terms of Bitv)
  • implements several new traits for Bitv
  • adds new functionality to Bitv and BitvSet
  • replaces internal iterators with external ones
  • updates documentation
  • minor bug fixes

This is a significantly souped-up version of PR #15139 and is the result of the discussion there.

let ret = self.iter.next().map(|&n| (self.offset, n));
self.offset += 1;
ret
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this yield uint and be paired with enumerate() and skip()?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could. It originally was, but was paired with enumerate 100% of the time. Pulling enumerate out would force us to use skip in the cases that a nonzero offset is needed, and this is potentially a bad perf hit since skipping is O(n) while just storing the offset is O(1).

Also note that this iterator is an abstraction leak, so it will remain private to the module.

@emberian
Copy link
Member

emberian commented Jul 2, 2014

At a first pass, this all looks ok, except apoelstra@962e1b6. Saying "if there is one, I didn't check" basically means "no", to me. Making changes with performance impliciations with no measurements is not something we should do. I'm fine with removing the separation if it's true that 1. there are no users depending on this, 2. the performance impact is actually small.

let ret = self.get(self.nbits - 1);
self.nbits -= 1;
ret
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Elsewhere care seems to be taken to zero-out any bits beyond the end, is that applicable here as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. This is a bug. I'll fix it in my next push.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I will remove the care to zero out bits beyond the end, since this occurs in many places, and is easy to forget but hard to test for. Masking bugs matter only to BitvSet, and in one place:
when you insert() a new element, nbits is increased so extraneous 1's may suddenly wrongly be considered to be members of the set.

Having said that, care still needs to be taken that any entirely-unused words are all-zeroes. So anywhere that nbits is dropped I will need to take a look. This might be the only location.

I'll this function instead so that it can handle extraneous 1's in the trailing word. Then masking is only required on reads, not writes, and I can move every occurrence of big_mask into Words::next() as @huonw suggests below.

@alexcrichton
Copy link
Member

A few high-level points:

  • Is there still any utility to having the distinction between Bitv and BitvSet?
  • Could the bitv constructor just be Bitv::new() and perhaps add a Bitv::with_capacity() ?
  • This does unfortunately mean that small bit vectors require a heap allocation. I'm sure there's a microbenchmark which will show that this is a perf hit, but that may not matter too too much. Just wanted to bring this up as a point.

@apoelstra
Copy link
Contributor Author

@cmr I'll do some benchmarks and see what happens. As for usage, I'm not aware of any performance-sensitive use of small Bitvs in the wild (there are none in rustc or in my code) and I'm not sure where I should look to find some.

@alexcrichton Agreed, I should change the constructors. As for the distinction between Bitv and BitvSet, I think it's worthwhile because BitvSet behaves as though it were unbounded (getting past the end will simply return "element not in set" while setting past the end will extend the underlying array). The two types are conceptually very different and I think we should keep the separation in code.

@emberian
Copy link
Member

emberian commented Jul 2, 2014

If rustc doesn't use it, I'm convinced. I still want to know what the
performance impact is.

On Tue, Jul 1, 2014 at 6:54 PM, Andrew Poelstra notifications@github.com
wrote:

@cmr https://github.com/cmr I'll do some benchmarks and see what
happens. As for usage, I'm not aware of any performance-sensitive use of
small Bitvs in the wild (there are none in rustc or in my code) and I'm
not sure where I should look to find some.

@alexcrichton https://github.com/alexcrichton Agreed, I should change
the constructors. As for the distinction between Bitv and BitvSet, I
think it's worthwhile because BitvSet behaves as though it were unbounded
(getting past the end will simply return "element not in set" while setting
past the end will extend the underlying array). The two types are
conceptually very different and I think we should keep the separation in
code.


Reply to this email directly or view it on GitHub
#15284 (comment).

http://octayn.net/

@@ -350,16 +173,19 @@ impl Bitv {
* changed
*/
#[inline]
pub fn assign(&mut self, v: &Bitv) -> bool { self.do_op(Assign, v) }
pub fn assign(&mut self, other: &Bitv) -> bool {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is similar to Clone.clone_from, although that would want to be implemented without the same length restriction.

@huonw
Copy link
Member

huonw commented Jul 2, 2014

big_mask is called a lot, especially in conjunction with self.storage.iter(). I wonder if there's some value in Words-like iterator that includes the mask, so e.g. all would be self.mask_words().all(|(mask, word)| word & mask == mask). The iterator could then be updating the mask incrementally, instead of recomputing it each time (or, it could understand that the only time there won't be a full !0 mask is for the last word, and so treat it specially).

apoelstra and others added 2 commits July 2, 2014 12:34
The old `Bitv` structure had two variations: one represented by a vector of
uints, and another represented by a single uint for bit vectors containing
fewer than uint::BITS bits.

The purpose of this is to avoid the indirection of using a Vec, but the
speedup is only available to users who

  (a) are storing less than uints::BITS bits
  (b) know this when they create the vector (since `Bitv`s cannot be resized)
  (c) don't know this at compile time (else they could use uint directly)

Giving such specific users a (questionable) speed benefit at the cost of
adding explicit checks to almost every single bit call, frequently writing
the same method twice and making iteration much much more difficult, does
not seem like a worthwhile tradeoff to me.

Also, rustc does not use Bitv anywhere, only through BitvSet, which does
not have this optimization.

For reference, here is some speed data from before and after this PR:

BEFORE:
test bitv::tests::bench_bitv_big        ... bench:     4 ns/iter (+/- 1)
test bitv::tests::bench_bitv_big_iter   ... bench:  4858 ns/iter (+/- 22)
test bitv::tests::bench_bitv_big_union  ... bench:   507 ns/iter (+/- 35)
test bitv::tests::bench_bitv_set_big    ... bench:     6 ns/iter (+/- 1)
test bitv::tests::bench_bitv_set_small  ... bench:     6 ns/iter (+/- 0)
test bitv::tests::bench_bitv_small      ... bench:     5 ns/iter (+/- 1)
test bitv::tests::bench_bitvset_iter    ... bench: 12930 ns/iter (+/- 662)
test bitv::tests::bench_btv_small_iter  ... bench:    39 ns/iter (+/- 1)
test bitv::tests::bench_uint_small      ... bench:     4 ns/iter (+/- 1)

AFTER:
test bitv::tests::bench_bitv_big        ... bench:     5 ns/iter (+/- 1)
test bitv::tests::bench_bitv_big_iter   ... bench:  5004 ns/iter (+/- 102)
test bitv::tests::bench_bitv_big_union  ... bench:   356 ns/iter (+/- 26)
test bitv::tests::bench_bitv_set_big    ... bench:     6 ns/iter (+/- 0)
test bitv::tests::bench_bitv_set_small  ... bench:     6 ns/iter (+/- 1)
test bitv::tests::bench_bitv_small      ... bench:     4 ns/iter (+/- 1)
test bitv::tests::bench_bitvset_iter    ... bench: 12918 ns/iter (+/- 621)
test bitv::tests::bench_btv_small_iter  ... bench:    50 ns/iter (+/- 5)
test bitv::tests::bench_uint_small      ... bench:     4 ns/iter (+/- 1)
@apoelstra
Copy link
Contributor Author

Thanks for the comments guys. I have submitted a new pull which takes all of your suggestions into account:

  • BitvSet::compactify is now BitvSet::shrink_to_fit() and is public. It is no longer called from BitvSet::remove; that is the user's responsibility.
  • BitvSet::grow is removed. We now have the new BitvSet::reserve and Bitv::reserve which are public.
  • The masking rules for Bitvs containing a number of bits not a multiple of uint::BITS are explained in a7b31a9e, are handled consistently, and unit tested. In particular Bitv::word_iter is now Bitv::mask_words and does masking itself.
  • MaskWords still does the job of enumerate and skip, because it can do so more efficiently.
  • Performance data is included in the commit message of 337a75f8. What it shows is that not much has changed except that Bitv::iter is slowed down for small Bitvs (as a consequence of Bitv::get needing to do an indirection) and Bitv::union is sped up (the old code had an indirection involving an Op enum to choose the right function, I guess to avoid code repetition in a world without macros).
  • Bitv::each_storage is removed since it had been reduced to a pointless indirection
  • Bitv::extend and from_iterator preallocate space.
  • BitvSet::is_subset and is_superset are substantially shortened and unit tested
  • 4-space tabs everywhere :)
  • Bitv::new is renamed to Bitv::with_capacity, and the new Bitv::new creates an empty Bitv. There is also a BitvSet::with_capacity which preallocates space.
  • Bitv::assign is removed since it is just a less-general version of Clone::clone_from. Clone is implemented explicitly to reuse space in clone_from if possible.
  • iter::all() is used in several places that explicit for/return loops had been.

apoelstra added 8 commits July 2, 2014 12:36
The internal masking behaviour for `Bitv` is now defined as:
  - Any entirely words in self.storage must be all zeroes.
  - Any partially used words may have anything at all in their
    unused bits.

This means:
  - When decreasing self.nbits, care must be taken that any
    no-longer-used words are zeroed out.

  - When increasing self.nbits, care must be taken that any
    newly-unmasked bits are set to their correct values.

  - When reading words, care should be taken that the values of
    unused bits are not used. (Preferably, use `Bitv::mask_words`
    which zeroes them out for you.)

The old behaviour was that every unused bit was always set to
zero. The problem with this is that unused bits are almost never
read, so forgetting to do this will result in very subtle and
hard-to-track down bugs. This way the responsibility for masking
falls on the places which might cause unused bits to be read: for
now, this is only `Bitv::mask_words` and `BitvSet::insert`.
The argument passed to Vec::grow is the number of elements to grow
the vector by, not the target number of elements. The old `Bitv`
code did the wrong thing, allocating more memory than it needed to.
Removes the following methods from `Bitv`:

  - `to_vec`: translates a `Bitv` into a bulky `Vec<uint>` of 0's and 1's
    replace with:  `bitv.iter().map(|b| if b {1} else {0}).collect()`

  - `to_bools`: translates a `Bitv` into a `Vec<bool>`
    replace with: `bitv.iter().collect()`

  - `ones`: internal iterator over all 1 bits in a `Bitv`
    replace with: `BitvSet::from_bitv(bitv).iter().advance(fn)`

These methods had specific functionality which can be replicated more
generally by the modern iterator system. (Also `to_vec` was not even
unit tested!)
Add documentation to methods on BitvSet that were missing them. Also
make sure #[inline] is on all methods that are (a) one-liners or (b)
private methods whose only purpose is code deduplication.
On Bitv:
   - Add .push() and .pop() which take and return bool, respectively
   - Add .truncate() which truncates a Bitv to a specific length
   - Add .grow() which grows a Bitv by a specific length
   - Add .reserve() which grows the underlying storage to be able to hold
     a specified number of bits without resizing
   - Implement FromIterator<Vec<bool>>
   - Implement Extendable<bool>
   - Implement Collection
   - Implement Mutable
   - Remove .from_bools() since FromIterator<Vec<bool>> now accomplishes this.
   - Remove .assign() since Clone::clone_from() accomplishes this.

On BitvSet:
   - Add .reserve() which grows the underlying storage to be able to hold
     a specified number of bits without resizing
   - Add .get_ref() and .get_mut_ref() to return references to the
     underlying Bitv
`Bitv::new` has been renamed `Bitv::with_capacity`. The new function
`Bitv::new` now creates a `Bitv` with no elements.

The new function `BitvSet::with_capacity` creates a `BitvSet` with
a specified capacity.
bors added a commit that referenced this pull request Jul 5, 2014
The types `Bitv` and `BitvSet` are badly out of date. This PR:
- cleans up the code (primarily, simplifies `Bitv` and implements `BitvSet` in terms of `Bitv`)
- implements several new traits for `Bitv`
- adds new functionality to `Bitv` and `BitvSet`
- replaces internal iterators with external ones
- updates documentation
- minor bug fixes

This is a significantly souped-up version of PR #15139 and is the result of the discussion there.
@bors bors closed this Jul 5, 2014
@bors bors merged commit 8ef0165 into rust-lang:master Jul 5, 2014
bors added a commit to rust-lang-ci/rust that referenced this pull request Jul 17, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants