add readme and makefile
Aleksandr Bezobchuk committed Dec 30, 2018
1 parent b69335e commit afc62e9
Showing 2 changed files with 87 additions and 0 deletions.
12 changes: 12 additions & 0 deletions Makefile
@@ -0,0 +1,12 @@
.PHONY: test bench build

all: test bench build

test:
	@cargo test

bench:
	@cargo bench

build:
	@cargo build
75 changes: 75 additions & 0 deletions README.md
@@ -0,0 +1,75 @@
# rsbloom

A simple implementation of a Bloom filter, a space-efficient probabilistic data
structure.

## Bloom Filters

A Bloom filter is a space-efficient probabilistic data structure that is
used to test whether an element is a member of a set. It allows for queries
to return: "possibly in set" or "definitely not in set". Elements can be
added to the set, but not removed; the more elements that are added to the
set, the larger the probability of false positives. It has been shown that
fewer than 10 bits per element are required for a 1% false positive
probability, independent of the size or number of elements in the set.
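The "fewer than 10 bits per element" figure follows from the standard sizing formulas for a Bloom filter. The sketch below is illustrative only: the function names `optimal_num_bits` and `optimal_num_hashes` are hypothetical and are not taken from `rsbloom`, whose internal sizing logic is not shown here.

```rust
/// Number of bits `m` needed for `n` expected items at a target
/// false positive probability `p`, using m = -n * ln(p) / (ln 2)^2.
/// Hypothetical helper, not part of the rsbloom API.
fn optimal_num_bits(n: u64, p: f64) -> u64 {
    let ln2 = std::f64::consts::LN_2;
    (-(n as f64) * p.ln() / (ln2 * ln2)).ceil() as u64
}

/// Number of hash functions `k` for `m` bits and `n` expected items,
/// using k = (m / n) * ln 2, rounded and clamped to at least 1.
/// Hypothetical helper, not part of the rsbloom API.
fn optimal_num_hashes(m: u64, n: u64) -> u32 {
    let k = (m as f64 / n as f64) * std::f64::consts::LN_2;
    (k.round() as u32).max(1)
}
```

For 1,000 expected items at a 1% false positive probability, these formulas give roughly 9.6 bits per element and 7 hash functions.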

The provided implementation allows you to create a Bloom filter by specifying
the approximate number of items expected to be inserted and an optional false
positive probability. It also allows you to approximate the total number of
items in the filter.
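One common way to approximate the number of inserted items is to count the set bits and invert the expected fill rate. The sketch below assumes that estimator; it is not confirmed to be what `num_items_approx` does, and `estimate_num_items` is a hypothetical name.

```rust
/// Estimates how many distinct items were inserted into a filter with
/// `m` bits and `k` hash functions, given that `bits_set` bits are 1.
/// Uses n ≈ -(m / k) * ln(1 - bits_set / m).
/// Returns infinity when every bit is set, since the filter is saturated.
/// Hypothetical helper, not part of the rsbloom API.
fn estimate_num_items(m: u64, k: u32, bits_set: u64) -> f64 {
    let m = m as f64;
    let fraction_set = bits_set as f64 / m;
    -(m / k as f64) * (1.0 - fraction_set).ln()
}
```

The estimate is most accurate while the filter is sparsely populated; as the fraction of set bits approaches 1, small changes in the bit count produce large swings in the result.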

## Enhanced Double Hashing

Enhanced double hashing is used to compute bit positions within the bit vector.
Adam Kirsch and Michael Mitzenmacher showed in
[Less Hashing, Same Performance: Building a Better Bloom Filter](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.152.579&rep=rep1&type=pdf)
that double hashing is effective without any loss in the asymptotic false
positive probability, leading to less computation and potentially less need
for randomness in practice.

The enhanced double hash takes the form of the following formula:

g<sub>i</sub>(x) = (H<sub>1</sub>(x) + iH<sub>2</sub>(x) + f(i)) mod m,

where H<sub>1</sub> is Murmur3 128-bit, H<sub>2</sub> is xxHash 64-bit, and
f(i) = i<sup>3</sup>.
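The formula can be sketched in a few lines of Rust. This is a minimal illustration, not the crate's actual code: it assumes `m` is the number of bits in the bit vector, that `i` runs over the `k` hash indices starting at 0, and that the two base hashes have already been reduced to `u64` values `h1` and `h2` (how `rsbloom` derives them from Murmur3 and xxHash is not shown here). The function name is hypothetical.

```rust
/// Computes the `k` bit positions g_i(x) = (h1 + i*h2 + i^3) mod m
/// for i in 0..k, given two precomputed base hashes of x.
/// Arithmetic is done in u128 so the sum cannot overflow before the modulo.
/// Hypothetical helper, not part of the rsbloom API.
fn enhanced_double_hash_indices(h1: u64, h2: u64, k: u32, m: u64) -> Vec<u64> {
    assert!(m > 0, "bit vector must be non-empty");
    (0..k as u128)
        .map(|i| {
            let sum = h1 as u128 + i * h2 as u128 + i * i * i;
            (sum % m as u128) as u64
        })
        .collect()
}
```

Only two base hashes are computed per element regardless of `k`, which is where the reduction in hashing work comes from.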

## Usage

Add the `rsbloom` dependency to your `Cargo.toml`:

```toml
[dependencies]
rsbloom = "0.1.0"
```

## Example

```rust
use rsbloom::BloomFilter;

fn main() {
    let approx_items = 100;
    let mut bf = BloomFilter::new(approx_items);

    bf.set(&"foo");
    bf.set(&"bar");

    bf.has(&"foo"); // true
    bf.has(&"bar"); // true
    bf.has(&"baz"); // false

    bf.num_items_approx(); // 2
}
```

## Tests

```shell
make test
```

## Benchmarks

```shell
make bench
```
