Fast, compact, and production-ready XOR filters for .NET. Perform O(1) approximate membership tests with tiny memory footprints and extremely low false-positive rates.
• Targets: .NET 8.0
XORFilter.Net is a .NET implementation of XOR filters, a family of probabilistic data structures for set membership. Compared to Bloom filters, XOR filters provide faster lookups and often better space efficiency while keeping false positives extremely low. See the original paper: "Xor Filters: Faster and Smaller Than Bloom and Cuckoo Filters" (arXiv:1912.08258).
- O(1), branch-light lookups with no false negatives
- Very small: about 1.23 × n slots of L-bit fingerprints
- Choice of fingerprint width: 8, 16, or 32 bits to tune false-positive rate
- Deterministic builds via optional seed
- Thread-safe lookups after filter construction
Package Manager:
PM> NuGet\Install-Package XORFilterDotNet -Version 1.0.8
.NET CLI:
dotnet add package XORFilterDotNet --version 1.0.8
Build a filter from a list of strings (converted to bytes) and query it:
using System.Text;
using XORFilter.Net;
var maliciousUrls = new List<string>
{
"phishing.example",
"malware.example",
"fraud.example",
"credential-theft.example",
"drive-by-download.example",
"suspicious-site.example"
};
var encodedValues = maliciousUrls.Select(Encoding.UTF8.GetBytes).ToArray();
// Choose the fingerprint width that fits your needs (8, 16, or 32 bits)
var filter = XorFilter32.BuildFrom(encodedValues);
bool malicious = filter.IsMember(Encoding.UTF8.GetBytes("phishing.example")); // returns true
bool shouldBeClean = filter.IsMember(Encoding.UTF8.GetBytes("example.com")); // likely returns false
False-positive probabilities (percent) and memory formulas:
Implementation | Underlying Type | False-positive rate (ε) | Memory |
---|---|---|---|
XorFilter8 | byte | 0.390625% | ≈ 1.23 × n × 8 bits |
XorFilter16 | ushort | 0.0015258789% | ≈ 1.23 × n × 16 bits |
XorFilter32 | uint | 2.3283064e-8% | ≈ 1.23 × n × 32 bits |
Tip: Start with 16-bit for a good balance. Use 8-bit for tiny memory or 32-bit for near-zero false positives.
- Build:
XorFilter8.BuildFrom(Span<byte[]> values, int? seed = null)
(also 16 and 32 variants) - Query:
bool IsMember(byte[] value)
Notes:
- Input is deduplicated internally; duplicates don't increase size.
- Lookups are thread-safe after the filter is built.
- Construction may retry with different hashes; the library handles this automatically and may slightly grow the table on failure.
Values are mapped to three disjoint partitions using MurmurHash3 and a CRC32-based fingerprint. A peeling process identifies values that are uniquely mapped, records an order, and fills table slots in reverse so that the XOR of the three slots equals the value's fingerprint. Membership testing XORs the three slots and compares with the fingerprint.
- Faster lookups and typically better space usage at similar error rates
- Stores L-bit fingerprints instead of bits, enabling strong accuracy with modest memory
- Still probabilistic: false positives possible, no false negatives
- Static sets: filters are immutable; add/remove requires rebuilding.
- Operates on
byte[]
. Convert from strings withEncoding.UTF8.GetBytes
or from structs via serialization. - Minimize allocations in hot paths by reusing
byte[]
buffers when possible. - For reproducibility across runs, provide a fixed
seed
.
- Xor Filters: Faster and Smaller Than Bloom and Cuckoo Filters — https://arxiv.org/pdf/1912.08258.pdf
- Stanford CS166 slides (peeling and construction) — https://web.stanford.edu/class/archive/cs/cs166/cs166.1216/lectures/13/Slides13.pdf#page=57
Issues and PRs are welcome. If you have a use case or find an edge case, please open an issue.