Aleksandr Bezobchuk committed Dec 30, 2018
1 parent b69335e, commit afc62e9
Showing 2 changed files with 87 additions and 0 deletions.
@@ -0,0 +1,12 @@
.PHONY: all test bench build

all: test bench build

test:
	@cargo test

bench:
	@cargo bench

build:
	@cargo build
@@ -0,0 +1,75 @@
# rsbloom

A simple implementation of a Bloom filter, a space-efficient probabilistic data
structure.

## Bloom Filters

A Bloom filter is a space-efficient probabilistic data structure that is
used to test whether an element is a member of a set. A query returns either
"possibly in set" or "definitely not in set", so false positives are possible
but false negatives are not. Elements can be added to the set, but not
removed; the more elements that are added, the larger the probability of
false positives. Fewer than 10 bits per element are required for a 1% false
positive probability, independent of the size or number of elements in the
set.
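The "fewer than 10 bits per element" figure follows from the standard sizing formulas m = -n ln(p) / (ln 2)<sup>2</sup> and k = (m / n) ln 2. The sketch below evaluates them for 100 items at a 1% false positive probability. The function names are hypothetical and are not part of the `rsbloom` API; how `rsbloom` sizes its filter internally is not shown here.

```rust
/// Number of bits for `n` expected items and false positive probability `p`,
/// using m = ceil(-n * ln(p) / (ln 2)^2).
/// Hypothetical helper, not part of rsbloom.
fn optimal_num_bits(n: usize, p: f64) -> usize {
    let ln2 = std::f64::consts::LN_2;
    (-(n as f64) * p.ln() / (ln2 * ln2)).ceil() as usize
}

/// Number of hash functions for `m` bits and `n` expected items,
/// using k = round((m / n) * ln 2), with a minimum of 1.
/// Hypothetical helper, not part of rsbloom.
fn optimal_num_hashes(m: usize, n: usize) -> usize {
    let k = (m as f64 / n as f64) * std::f64::consts::LN_2;
    (k.round() as usize).max(1)
}

fn main() {
    let n = 100;
    let p = 0.01;
    let m = optimal_num_bits(n, p);
    let k = optimal_num_hashes(m, n);
    // For n = 100 and p = 0.01 this gives 959 bits (about 9.59 per item) and 7 hashes.
    println!(
        "bits = {}, bits per item = {:.2}, hashes = {}",
        m,
        m as f64 / n as f64,
        k
    );
}
```

The ratio m / n depends only on p, which is why the bits-per-element cost does not change with the number or size of the elements.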

The provided implementation allows you to create a Bloom filter by specifying
the approximate number of items expected to be inserted and an optional false
positive probability. It also allows you to approximate the total number of
items in the filter.
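One common way to approximate the number of inserted items is to count the set bits X and evaluate n ≈ -(m / k) ln(1 - X / m), where m is the number of bits and k the number of hash functions. The sketch below assumes that estimator; whether `rsbloom` uses exactly this formula is an assumption, and `estimate_num_items` is a hypothetical name.

```rust
/// Estimate how many distinct items were inserted, given the filter size `m`
/// (bits), the number of hash functions `k`, and the count of set bits.
/// Uses n ~= -(m / k) * ln(1 - set_bits / m).
/// Hypothetical helper, not part of rsbloom. Returns infinity when every bit
/// is set, since the filter is saturated and carries no information.
fn estimate_num_items(m: usize, k: usize, set_bits: usize) -> f64 {
    let m = m as f64;
    -(m / k as f64) * (1.0 - set_bits as f64 / m).ln()
}

fn main() {
    // Two items, 7 hash functions, no colliding positions: 14 bits set.
    let estimate = estimate_num_items(959, 7, 14);
    println!("estimated items = {:.2}", estimate);
}
```

The estimate is slightly above the true count here because the formula corrects for the chance that some positions collided.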

## Enhanced Double Hashing

Enhanced double hashing is used to derive the bit positions to set within the
bit vector. Adam Kirsch and Michael Mitzenmacher showed in
[Less Hashing, Same Performance: Building a Better Bloom Filter](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.152.579&rep=rep1&type=pdf)
that double hashing is effective without any loss in the asymptotic false
positive probability, leading to less computation and potentially less need
for randomness in practice.

The enhanced double hash takes the form of the following formula:

g<sub>i</sub>(x) = (H<sub>1</sub>(x) + iH<sub>2</sub>(x) + f(i)) mod m

where H<sub>1</sub> is Murmur3 128-bit, H<sub>2</sub> is xxHash 64-bit,
f(i) = i<sup>3</sup>, i is the index of the bit position being derived, and
m is the number of bits in the bit vector.
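The formula can be sketched as follows. To stay dependency-free, this sketch substitutes the standard library's `DefaultHasher` (seeded two different ways) for Murmur3 and xxHash, so the positions it produces will not match those of `rsbloom`. The names `hash_with_seed` and `bit_positions` are hypothetical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Stand-in for H1 and H2: hashes a seed followed by the item.
/// Assumption: rsbloom uses Murmur3 and xxHash here instead.
fn hash_with_seed<T: Hash>(item: &T, seed: u64) -> u64 {
    let mut hasher = DefaultHasher::new();
    seed.hash(&mut hasher);
    item.hash(&mut hasher);
    hasher.finish()
}

/// Computes g_i(x) = (H1(x) + i * H2(x) + i^3) mod m for i in 0..k.
/// The arithmetic is done in u128 so the sum cannot overflow before the modulo.
fn bit_positions<T: Hash>(item: &T, k: u32, m: u64) -> Vec<u64> {
    assert!(m > 0, "bit vector must not be empty");
    let h1 = hash_with_seed(item, 1) as u128;
    let h2 = hash_with_seed(item, 2) as u128;
    (0..k as u128)
        .map(|i| ((h1 + i * h2 + i * i * i) % m as u128) as u64)
        .collect()
}

fn main() {
    // 7 positions in a 959-bit vector for the item "foo".
    let positions = bit_positions(&"foo", 7, 959);
    println!("{:?}", positions);
}
```

Only two base hashes are computed per item regardless of k, which is the saving the paper describes.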

## Usage

Add the `rsbloom` dependency to your `Cargo.toml`:

```toml
[dependencies]
rsbloom = "0.1.0"
```

## Example

```rust
use rsbloom::BloomFilter;

fn main() {
    let approx_items = 100;
    let mut bf = BloomFilter::new(approx_items);

    bf.set(&"foo");
    bf.set(&"bar");

    bf.has(&"foo"); // true
    bf.has(&"bar"); // true
    bf.has(&"baz"); // false (barring a false positive)

    bf.num_items_approx(); // 2
}
```
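To illustrate what `set` and `has` do conceptually, here is a toy filter built on a `Vec<bool>`. It is a minimal sketch under stated assumptions, not the `rsbloom` implementation: `ToyBloom` and `seeded_hash` are hypothetical names, and the standard library's `DefaultHasher` stands in for Murmur3 and xxHash.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Stand-in hash function; rsbloom uses Murmur3 and xxHash instead.
fn seeded_hash<T: Hash>(item: &T, seed: u64) -> u64 {
    let mut hasher = DefaultHasher::new();
    seed.hash(&mut hasher);
    item.hash(&mut hasher);
    hasher.finish()
}

/// Toy Bloom filter, for illustration only.
struct ToyBloom {
    bits: Vec<bool>,
    num_hashes: u32,
}

impl ToyBloom {
    fn new(num_bits: usize, num_hashes: u32) -> Self {
        assert!(num_bits > 0, "filter needs at least one bit");
        ToyBloom { bits: vec![false; num_bits], num_hashes }
    }

    /// Bit positions for an item via (h1 + i * h2 + i^3) mod m.
    fn positions<T: Hash>(&self, item: &T) -> Vec<usize> {
        let h1 = seeded_hash(item, 1) as u128;
        let h2 = seeded_hash(item, 2) as u128;
        let m = self.bits.len() as u128;
        (0..self.num_hashes as u128)
            .map(|i| ((h1 + i * h2 + i * i * i) % m) as usize)
            .collect()
    }

    /// Adding an item sets every one of its positions.
    fn set<T: Hash>(&mut self, item: &T) {
        for pos in self.positions(item) {
            self.bits[pos] = true;
        }
    }

    /// An item is "possibly in set" only if all of its positions are set.
    fn has<T: Hash>(&self, item: &T) -> bool {
        self.positions(item).into_iter().all(|pos| self.bits[pos])
    }
}

fn main() {
    let mut bf = ToyBloom::new(959, 7);
    bf.set(&"foo");
    bf.set(&"bar");
    // Inserted items are always reported as present (no false negatives).
    println!("foo: {}, bar: {}", bf.has(&"foo"), bf.has(&"bar"));
    // An item that was never inserted is usually, but not always, reported absent.
    println!("baz: {}", bf.has(&"baz"));
}
```

Because bits are only ever set and never cleared, an inserted item can never be reported as absent, while an item that was never inserted may be reported as present if other items happen to cover all of its positions.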

## Tests

```shell
make test
```

## Benchmarks

```shell
make bench
```