Revamp Bitv and BitvSet #15284
Conversation
        let ret = self.iter.next().map(|&n| (self.offset, n));
        self.offset += 1;
        ret
    }
}
Could this yield `uint` and be paired with `enumerate()` and `skip()`?
It could. It originally was, but it was paired with `enumerate` 100% of the time. Pulling `enumerate` out would force us to use `skip` in the cases where a nonzero offset is needed, and this is potentially a bad perf hit, since `skip`ping is O(n) while just storing the offset is O(1).

Also note that this iterator is an abstraction leak, so it will remain private to the module.
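For illustration, here is a minimal sketch of the idea (modern Rust, with `usize` in place of `uint`; the `MaskWords` name and the surrounding code are made up, not the PR's exact internals): the word iterator carries its own offset, so starting at a nonzero word index is O(1), whereas `.enumerate().skip(k)` walks and discards k items.

```rust
// Illustrative-only word iterator: yields (word index, word) and carries its
// own starting offset instead of relying on enumerate()/skip().
struct MaskWords<'a> {
    iter: std::slice::Iter<'a, usize>,
    offset: usize,
}

impl<'a> Iterator for MaskWords<'a> {
    type Item = (usize, usize);

    fn next(&mut self) -> Option<(usize, usize)> {
        // Same shape as the snippet above: yield (offset, word), bump the offset.
        let ret = self.iter.next().map(|&w| (self.offset, w));
        self.offset += 1;
        ret
    }
}

fn main() {
    let storage = [0b1010usize, 0b0001, 0b1111];
    // Start at word 1 in O(1): slice the storage and pre-set the offset,
    // instead of paying for an O(n) skip.
    let words = MaskWords { iter: storage[1..].iter(), offset: 1 };
    for (i, w) in words {
        println!("word {}: {:#b}", i, w);
    }
}
```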
At a first pass, this all looks OK, except apoelstra@962e1b6. Saying "if there is one, I didn't check" basically means "no" to me. Making changes with performance implications with no measurements is not something we should do. I'm fine with removing the separation if it's true that 1. there are no users depending on this, and 2. the performance impact is actually small.
        let ret = self.get(self.nbits - 1);
        self.nbits -= 1;
        ret
    }
Elsewhere, care seems to be taken to zero out any bits beyond the end; is that applicable here as well?
Yes. This is a bug. I'll fix it in my next push.
Actually, I will remove the care to zero out bits beyond the end, since this occurs in many places, and is easy to forget but hard to test for. Masking bugs matter only to `BitvSet`, and in one place: when you `insert()` a new element, `nbits` is increased, so extraneous 1's may suddenly and wrongly be considered to be members of the set.

Having said that, care still needs to be taken that any entirely-unused words are all zeroes. So anywhere that `nbits` is dropped I will need to take a look. This might be the only location.

I'll change this function instead so that it can handle extraneous 1's in the trailing word. Then masking is only required on reads, not writes, and I can move every occurrence of `big_mask` into `Words::next()` as @huonw suggests below.
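For illustration, a minimal sketch of that read-side masking policy (modern Rust, `usize` in place of `uint`; `mask_word` is an illustrative stand-in, not the PR's exact `Bitv::mask_words` signature): the unused bits of the last partially-used word are allowed to hold garbage, and a read helper masks them off, so only readers need to care.

```rust
const BITS: usize = usize::BITS as usize;

// Return the word at `word_idx`, with any bits past `nbits` zeroed on read.
fn mask_word(storage: &[usize], word_idx: usize, nbits: usize) -> usize {
    let w = storage[word_idx];
    // Only the last, partially-used word needs masking.
    if word_idx == nbits / BITS && nbits % BITS != 0 {
        w & ((1 << (nbits % BITS)) - 1)
    } else {
        w
    }
}

fn main() {
    // A 5-bit vector stored in one word; the higher bits hold stale garbage.
    let storage = [0b1111_0110_1usize];
    assert_eq!(mask_word(&storage, 0, 5), 0b0_1101);
    println!("{:#b}", mask_word(&storage, 0, 5));
}
```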
A few high-level points:
@cmr I'll do some benchmarks and see what happens. As for usage, I'm not aware of any performance-sensitive use of small `Bitv`s.

@alexcrichton Agreed, I should change the constructors. As for the distinction between
If rustc doesn't use it, I'm convinced. I still want to know what the
@@ -350,16 +173,19 @@ impl Bitv {
      * changed
      */
     #[inline]
-    pub fn assign(&mut self, v: &Bitv) -> bool { self.do_op(Assign, v) }
+    pub fn assign(&mut self, other: &Bitv) -> bool {
This is similar to `Clone::clone_from`, although that would want to be implemented without the same length restriction.
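For comparison, a small runnable illustration of `Clone::clone_from`, using `Vec<bool>` as a stand-in (since `Bitv` is not in today's standard library); note the lack of a same-length restriction.

```rust
fn main() {
    let mut dst: Vec<bool> = vec![false; 8];
    let src: Vec<bool> = vec![true, false, true]; // a different length is fine
    dst.clone_from(&src); // copy src into dst in place, reusing dst's buffer where possible
    assert_eq!(dst, src);
}
```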
The old `Bitv` structure had two variations: one represented by a vector of uints, and another represented by a single uint for bit vectors containing fewer than uint::BITS bits. The purpose of this is to avoid the indirection of using a Vec, but the speedup is only available to users who (a) are storing fewer than uint::BITS bits, (b) know this when they create the vector (since `Bitv`s cannot be resized), and (c) don't know this at compile time (else they could use uint directly).

Giving such specific users a (questionable) speed benefit at the cost of adding explicit checks to almost every single bit call, frequently writing the same method twice, and making iteration much, much more difficult does not seem like a worthwhile tradeoff to me. Also, rustc does not use Bitv anywhere, only through BitvSet, which does not have this optimization.

For reference, here is some speed data from before and after this PR:

BEFORE:
test bitv::tests::bench_bitv_big        ... bench: 4 ns/iter (+/- 1)
test bitv::tests::bench_bitv_big_iter   ... bench: 4858 ns/iter (+/- 22)
test bitv::tests::bench_bitv_big_union  ... bench: 507 ns/iter (+/- 35)
test bitv::tests::bench_bitv_set_big    ... bench: 6 ns/iter (+/- 1)
test bitv::tests::bench_bitv_set_small  ... bench: 6 ns/iter (+/- 0)
test bitv::tests::bench_bitv_small      ... bench: 5 ns/iter (+/- 1)
test bitv::tests::bench_bitvset_iter    ... bench: 12930 ns/iter (+/- 662)
test bitv::tests::bench_btv_small_iter  ... bench: 39 ns/iter (+/- 1)
test bitv::tests::bench_uint_small      ... bench: 4 ns/iter (+/- 1)

AFTER:
test bitv::tests::bench_bitv_big        ... bench: 5 ns/iter (+/- 1)
test bitv::tests::bench_bitv_big_iter   ... bench: 5004 ns/iter (+/- 102)
test bitv::tests::bench_bitv_big_union  ... bench: 356 ns/iter (+/- 26)
test bitv::tests::bench_bitv_set_big    ... bench: 6 ns/iter (+/- 0)
test bitv::tests::bench_bitv_set_small  ... bench: 6 ns/iter (+/- 1)
test bitv::tests::bench_bitv_small      ... bench: 4 ns/iter (+/- 1)
test bitv::tests::bench_bitvset_iter    ... bench: 12918 ns/iter (+/- 621)
test bitv::tests::bench_btv_small_iter  ... bench: 50 ns/iter (+/- 5)
test bitv::tests::bench_uint_small      ... bench: 4 ns/iter (+/- 1)
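For readers unfamiliar with the removed design, a hedged sketch of what such a two-representation scheme looks like (illustrative only, not the original code; modern Rust with `usize` in place of `uint`, and `OldBitvRep`/`get` are made-up names). The point is that every operation must branch on the variant, which is the duplication this commit removes.

```rust
const BITS: usize = usize::BITS as usize;

enum OldBitvRep {
    Small(usize),    // all bits fit in one machine word, no heap allocation
    Big(Vec<usize>), // bits spread across a heap-allocated vector of words
}

// Every accessor has to handle both variants.
fn get(rep: &OldBitvRep, i: usize) -> bool {
    match rep {
        OldBitvRep::Small(w) => (*w >> i) & 1 == 1,
        OldBitvRep::Big(v) => (v[i / BITS] >> (i % BITS)) & 1 == 1,
    }
}

fn main() {
    let small = OldBitvRep::Small(0b101);
    let big = OldBitvRep::Big(vec![0, 1]); // logical bit BITS is set
    assert!(get(&small, 2));
    assert!(get(&big, BITS));
}
```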
Thanks for the comments guys. I have submitted a new pull which takes all of your suggestions into account:
The internal masking behaviour for `Bitv` is now defined as:

- Any entirely unused words in self.storage must be all zeroes.
- Any partially used words may have anything at all in their unused bits.

This means:

- When decreasing self.nbits, care must be taken that any no-longer-used words are zeroed out.
- When increasing self.nbits, care must be taken that any newly-unmasked bits are set to their correct values.
- When reading words, care should be taken that the values of unused bits are not used. (Preferably, use `Bitv::mask_words`, which zeroes them out for you.)

The old behaviour was that every unused bit was always set to zero. The problem with this is that unused bits are almost never read, so forgetting to do this will result in very subtle and hard-to-track-down bugs. This way the responsibility for masking falls on the places which might cause unused bits to be read: for now, this is only `Bitv::mask_words` and `BitvSet::insert`.
The argument passed to Vec::grow is the number of elements to grow the vector by, not the target number of elements. The old `Bitv` code did the wrong thing, allocating more memory than it needed to.
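`Vec::grow` is long gone from Rust, but the grow-by versus resize-to distinction is easy to show with today's equivalents (a sketch, not the PR's code):

```rust
fn main() {
    // Grow BY 8: length goes from 4 to 12. This matches old Vec::grow(8, 0).
    let mut grow_by = vec![0usize; 4];
    grow_by.extend(std::iter::repeat(0).take(8));
    assert_eq!(grow_by.len(), 12);

    // Resize TO 8: length goes from 4 to 8.
    let mut resize_to = vec![0usize; 4];
    resize_to.resize(8, 0);
    assert_eq!(resize_to.len(), 8);

    // Passing a target length to a grow-by API allocates more than intended,
    // which is the bug being fixed here.
}
```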
Removes the following methods from `Bitv`:

- `to_vec`: translates a `Bitv` into a bulky `Vec<uint>` of 0's and 1's. Replace with: `bitv.iter().map(|b| if b {1} else {0}).collect()`
- `to_bools`: translates a `Bitv` into a `Vec<bool>`. Replace with: `bitv.iter().collect()`
- `ones`: internal iterator over all 1 bits in a `Bitv`. Replace with: `BitvSet::from_bitv(bitv).iter().advance(fn)`

These methods had specific functionality which can be replicated more generally by the modern iterator system. (Also, `to_vec` was not even unit tested!)
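A runnable demonstration of those replacement patterns, with `Vec<bool>` standing in for `Bitv` and today's external iterators standing in for the removed internal ones (so the `ones` replacement below differs from the `advance`-based one quoted above):

```rust
fn main() {
    let bitv = vec![true, false, true, true];

    // `to_vec` replacement: a Vec of 0's and 1's.
    let as_ints: Vec<usize> = bitv.iter().map(|&b| if b { 1 } else { 0 }).collect();
    assert_eq!(as_ints, vec![1, 0, 1, 1]);

    // `to_bools` replacement: collect the bits into a Vec<bool>.
    let as_bools: Vec<bool> = bitv.iter().cloned().collect();
    assert_eq!(as_bools, bitv);

    // `ones` replacement: iterate over the indices of the set bits.
    let ones: Vec<usize> = bitv
        .iter()
        .enumerate()
        .filter(|&(_, &b)| b)
        .map(|(i, _)| i)
        .collect();
    assert_eq!(ones, vec![0, 2, 3]);
}
```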
Add documentation to methods on BitvSet that were missing them. Also make sure #[inline] is on all methods that are (a) one-liners or (b) private methods whose only purpose is code deduplication.
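As a concrete (entirely made-up) example of that `#[inline]` policy, the hint goes on one-line public methods and on private helpers that exist only for code deduplication:

```rust
pub struct Flags { bits: usize }

impl Flags {
    #[inline]
    pub fn is_empty(&self) -> bool { self.bits == 0 } // one-liner: inline

    #[inline]
    fn nth_bit(&self, n: usize) -> bool {             // private dedup helper: inline
        (self.bits >> n) & 1 == 1
    }

    pub fn contains(&self, n: usize) -> bool { self.nth_bit(n) }
}

fn main() {
    let f = Flags { bits: 0b100 };
    assert!(!f.is_empty());
    assert!(f.contains(2));
}
```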
On Bitv:

- Add .push() and .pop(), which take and return bool, respectively
- Add .truncate(), which truncates a Bitv to a specific length
- Add .grow(), which grows a Bitv by a specific length
- Add .reserve(), which grows the underlying storage to be able to hold a specified number of bits without resizing
- Implement FromIterator<Vec<bool>>
- Implement Extendable<bool>
- Implement Collection
- Implement Mutable
- Remove .from_bools(), since FromIterator<Vec<bool>> now accomplishes this
- Remove .assign(), since Clone::clone_from() accomplishes this

On BitvSet:

- Add .reserve(), which grows the underlying storage to be able to hold a specified number of bits without resizing
- Add .get_ref() and .get_mut_ref() to return references to the underlying Bitv
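Since `Bitv` is not in the modern standard library, here is a sketch of the intended semantics using the equivalent calls on `Vec<bool>` (note that the commit describes `Bitv::pop` returning a plain bool, while `Vec::pop` returns an `Option`):

```rust
fn main() {
    let mut v: Vec<bool> = Vec::new();
    v.reserve(100);               // room for 100 elements, length still 0
    v.push(true);                 // push takes a bool
    v.push(false);
    assert_eq!(v.pop(), Some(false)); // pop hands back the last bool
    v.extend([true, true]);       // the Extendable<bool> analogue
    v.truncate(2);                // cut back to a specific length
    assert_eq!(v.len(), 2);       // the Collection/Mutable analogue: len(), clear(), ...
}
```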
`Bitv::new` has been renamed `Bitv::with_capacity`. The new function `Bitv::new` now creates a `Bitv` with no elements. The new function `BitvSet::with_capacity` creates a `BitvSet` with a specified capacity.
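The split mirrors the convention used by the other collections; for comparison, the same pair on `Vec`:

```rust
fn main() {
    let empty: Vec<bool> = Vec::new();             // no elements
    let sized: Vec<bool> = Vec::with_capacity(64); // storage for 64 elements up front
    assert_eq!(empty.len(), 0);
    assert_eq!(sized.len(), 0);
    assert!(sized.capacity() >= 64);
}
```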
The types `Bitv` and `BitvSet` are badly out of date. This PR:

- cleans up the code (primarily, simplifies `Bitv` and implements `BitvSet` in terms of `Bitv`)
- implements several new traits for `Bitv`
- adds new functionality to `Bitv` and `BitvSet`
- replaces internal iterators with external ones
- updates documentation
- minor bug fixes

This is a significantly souped-up version of PR #15139 and is the result of the discussion there.