Open Elastic Hash

A Python implementation of advanced open addressing hash table algorithms from the paper "Optimal Bounds for Open Addressing Without Reordering" by Martín Farach-Colton, Andrew Krapivin, and William Kuszmaul.

Note: This package is published on PyPI as open-elastic-hash but the internal module name remains elastic_hash.

Features

This library provides two cutting-edge hash table implementations:

ElasticHashTable: Achieves O(1) amortized expected probe complexity and O(log 1/δ) worst-case expected probe complexity without reordering elements.
FunnelHashTable: A greedy approach that achieves O(log² 1/δ) worst-case expected probe complexity, disproving the longstanding Yao's conjecture that O(1/δ) was optimal for greedy approaches.

Where δ is the fraction of empty slots in the table (e.g., if the table is 90% full, δ = 0.1).

Installation

pip install open-elastic-hash

Or install from source:

git clone https://github.com/jorgik1/elastic-hash.git
cd elastic-hash
pip install -e .

Usage

ElasticHashTable

from elastic_hash import ElasticHashTable  # Note: Package is 'open-elastic-hash' but module is 'elastic_hash'

# Create a new table with capacity 100
table = ElasticHashTable(100)

# Insert some values
table.insert("name", "John Doe")
table.insert("email", "john@example.com")
table.insert("age", 30)

# Retrieve values
print(table.get("name"))  # "John Doe"
print(table.get("unknown"))  # None

# Check if key exists
print("name" in table)  # True

# Remove a key
table.remove("email")  # True

# Get size
print(len(table))  # 2

# Iterate over all key-value pairs
for key, value in table:
    print(f"{key}: {value}")

FunnelHashTable

from elastic_hash import FunnelHashTable  # Note: Package is 'open-elastic-hash' but module is 'elastic_hash'

# Create a new table with capacity 100
table = FunnelHashTable(100)

# The API is the same as ElasticHashTable
table.insert("key", "value")
value = table.get("key")
exists = "key" in table
success = table.remove("key")

Advantages Over Built-in Dict

Bounded Operations: Our hash tables provide strong theoretical guarantees on worst-case operation time, which can be critical for real-time applications.
Consistent Performance: When operating at high load factors (e.g., 90% full), our hash tables maintain better worst-case performance than traditional approaches.
Memory Efficiency: Our implementations work well even when very full, allowing you to use memory more efficiently.
No Reordering: Unlike some advanced hash tables, our implementations never move elements after insertion, which can be important for certain applications.

Benchmarks

We've benchmarked both hash table implementations against Python's built-in dict. Here are some key findings from our tests:

Implementation	Load Factor	Insert (ops/sec)	Lookup Hit (ops/sec)	Lookup Miss (ops/sec)	Delete (ops/sec)
ElasticHashTable	0.5	~6,000	~740,000	~76,000	~620,000
ElasticHashTable	0.9	~6,500	~760,000	~78,000	~570,000
FunnelHashTable	0.5	~26,000	~550,000	~28,000	~160,000
FunnelHashTable	0.9	~27,000	~570,000	~28,000	~120,000

Key observations:

FunnelHashTable is ~4x faster for insertions than ElasticHashTable
ElasticHashTable has better lookup miss performance (when key doesn't exist)
Both implementations maintain consistent performance even at high load factors (90%)
The performance characteristics match the theoretical guarantees from the paper

The examples/benchmark.py script lets you run your own benchmarks across different capacities and load factors:

python examples/benchmark.py

# For visualization support, install matplotlib:
pip install matplotlib

Implementation Notes

ElasticHashTable

Divides the array into subarrays of geometrically decreasing sizes
Uses a two-dimensional probe sequence for each key
Implements batch-based insertion to maintain balanced fill levels
Achieves O(1) amortized expected probe complexity through careful distribution of work
Excellent negative lookup performance (when key doesn't exist)
Hash functions are carefully designed to avoid clustering

FunnelHashTable

Divides the array into multiple levels with buckets of fixed size
Uses a greedy approach where elements cascade down through levels
Implements a special overflow area using the power-of-two-choices method
Achieves O(log² 1/δ) worst-case expected probe complexity
Superior insertion performance
Can handle high load factors with minimal performance degradation

When to Use

Use ElasticHashTable when:

Your workload is lookup-heavy, especially with many lookups for keys that don't exist
You need optimal amortized performance
You need better delete operation performance

Use FunnelHashTable when:

Your workload is insert-heavy
You want to maximize insertion throughput
You need a greedy approach with better worst-case guarantees than traditional hash tables

Both implementations are ideal for:

Real-time systems requiring predictable performance
High load factor scenarios (>80% full)
Memory-constrained environments
Applications where elements shouldn't be reordered after insertion

Running Tests

# Run all tests
python run_all_tests.py

# Run a specific test
python -m unittest tests.test_hash_tables.TestElasticHashTable.test_high_load_factor

Our test suite verifies that both implementations:

Handle basic operations (insert, get, remove) correctly
Work at high load factors (up to 80%)
Handle collisions properly
Throw appropriate errors when the table is full

Credits

Based on the paper "Optimal Bounds for Open Addressing Without Reordering" by Martín Farach-Colton, Andrew Krapivin, and William Kuszmaul.

License

MIT

Name		Name	Last commit message	Last commit date
Latest commit History 9 Commits
.github		.github
elastic_hash		elastic_hash
examples		examples
scripts		scripts
tests		tests
.gitignore		.gitignore
CHANGELOG.md		CHANGELOG.md
CODE_OF_CONDUCT.md		CODE_OF_CONDUCT.md
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
MANIFEST.in		MANIFEST.in
PUBLISHING.md		PUBLISHING.md
README.md		README.md
pyproject.toml		pyproject.toml
run_all_tests.py		run_all_tests.py
run_all_tests.sh		run_all_tests.sh
run_format.py		run_format.py
run_format.sh		run_format.sh
run_specific_tests.py		run_specific_tests.py
run_table_full_test.py		run_table_full_test.py
run_tests.py		run_tests.py
setup.cfg		setup.cfg
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Open Elastic Hash

Features

Installation

Usage

ElasticHashTable

FunnelHashTable

Advantages Over Built-in Dict

Benchmarks

Implementation Notes

ElasticHashTable

FunnelHashTable

When to Use

Running Tests

Credits

License

About

Uh oh!

Releases

Packages

Uh oh!

Languages

License

jorgik1/elastic-hash

Folders and files

Latest commit

History

Repository files navigation

Open Elastic Hash

Features

Installation

Usage

ElasticHashTable

FunnelHashTable

Advantages Over Built-in Dict

Benchmarks

Implementation Notes

ElasticHashTable

FunnelHashTable

When to Use

Running Tests

Credits

License

About

Resources

License

Code of conduct

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Languages

Packages