1BRC in Rust
Uses mmap instead of reading the file into memory. This lets the OS handle paging and reduces memory copies.
The file is mapped once and shared across threads via Arc.
Splits work across all available CPU cores. Each thread gets a contiguous chunk of the file to process independently. The main thread figures out chunk boundaries by scanning for newlines, ensuring we never split a record in the middle.
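The boundary scan described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the function name `chunk_bounds` and the choice to return `(start, end)` byte ranges are assumptions made for the example.

```rust
/// Splits `data` into at most `n_chunks` contiguous byte ranges.
/// Every range ends just after a newline (or at the end of the input),
/// so no record is ever split across two chunks.
fn chunk_bounds(data: &[u8], n_chunks: usize) -> Vec<(usize, usize)> {
    let target = data.len() / n_chunks.max(1);
    let mut bounds = Vec::with_capacity(n_chunks);
    let mut start = 0;
    while start < data.len() {
        // Tentative end: roughly one chunk's worth of bytes past `start`.
        let tentative = (start + target).min(data.len());
        // Move the end forward to just past the next newline, if there is one.
        let end = match data[tentative..].iter().position(|&b| b == b'\n') {
            Some(offset) => tentative + offset + 1,
            None => data.len(),
        };
        bounds.push((start, end));
        start = end;
    }
    bounds
}
```

Each worker would then receive one `(start, end)` pair and read only `data[start..end]` from the shared mapping.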
Each thread maintains its own hash map during processing to avoid locks / contention. Results are merged in the main thread after all workers finish.
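A minimal sketch of the per-thread maps and the final merge, using only the standard library. To keep it short it assumes records are already parsed into `(key, tenths)` pairs and held in an `Arc<Vec<...>>` that stands in for the shared mapping; `Stats` and `aggregate_parallel` are hypothetical names, not taken from the project.

```rust
use std::collections::HashMap;
use std::sync::Arc;
use std::thread;

/// Running aggregate for one city, in tenths of a degree.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Stats {
    min: i32,
    max: i32,
    sum: i64,
    count: u64,
}

impl Stats {
    fn single(t: i32) -> Self {
        Stats { min: t, max: t, sum: t as i64, count: 1 }
    }

    fn merge(&mut self, other: &Stats) {
        self.min = self.min.min(other.min);
        self.max = self.max.max(other.max);
        self.sum += other.sum;
        self.count += other.count;
    }
}

fn aggregate_parallel(records: Arc<Vec<(u64, i32)>>, n_threads: usize) -> HashMap<u64, Stats> {
    let n_threads = n_threads.max(1);
    let chunk_len = (records.len() + n_threads - 1) / n_threads;

    // Each worker fills a private map, so no locking is needed while aggregating.
    let handles: Vec<_> = (0..n_threads)
        .map(|i| {
            let records = Arc::clone(&records);
            thread::spawn(move || {
                let start = (i * chunk_len).min(records.len());
                let end = (start + chunk_len).min(records.len());
                let mut local: HashMap<u64, Stats> = HashMap::new();
                for &(key, t) in &records[start..end] {
                    local
                        .entry(key)
                        .and_modify(|s| s.merge(&Stats::single(t)))
                        .or_insert_with(|| Stats::single(t));
                }
                local
            })
        })
        .collect();

    // The main thread merges the per-thread maps once all workers have finished.
    let mut merged: HashMap<u64, Stats> = HashMap::new();
    for handle in handles {
        let local = handle.join().expect("worker thread panicked");
        for (key, stats) in local {
            merged
                .entry(key)
                .and_modify(|s| s.merge(&stats))
                .or_insert(stats);
        }
    }
    merged
}
```

The merge step is cheap relative to the scan because each per-thread map holds at most one entry per distinct city.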
Each thread touches each byte in its allocated segment only once. Striding forward through memory like this is cache friendly.
Loop bodies avoid branches where possible to reduce branch misprediction penalties.
This is possible because the input format is known and assumed to be well-formed: `(?<city>.{0,28});(?<temperature>-?\d?\d\.\d)`
Temperatures are parsed directly from bytes into an i32 holding tenths of a degree: a leading `-` negates the value and the `.` is skipped.
This avoids floating-point parsing entirely during the hot loop. The final division by 10.0 happens only once per city after all aggregation is done.
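A sketch of the integer parsing, assuming the input always matches `-?\d?\d.\d`. The names `parse_tenths` and `mean_degrees` are hypothetical, and this is one possible branch-light formulation rather than the project's actual code; the compiler may still emit bounds checks for the slice indexing.

```rust
/// Parses a temperature such as `-12.3` or `5.0` into tenths of a degree.
/// Assumes well-formed input: optional '-', one or two digits, '.', one digit.
fn parse_tenths(bytes: &[u8]) -> i32 {
    // 1 if there is a leading minus sign, otherwise 0.
    let neg = (bytes[0] == b'-') as usize;
    let d = &bytes[neg..];
    // After the sign, the text is either "d.d" (3 bytes) or "dd.d" (4 bytes).
    let has_tens = (d.len() == 4) as usize;
    // When there is no tens digit this term is multiplied by zero.
    let tens = (has_tens as i32) * ((d[0] - b'0') as i32);
    let units = (d[has_tens] - b'0') as i32;
    // The '.' at index `has_tens + 1` is skipped.
    let frac = (d[has_tens + 2] - b'0') as i32;
    let magnitude = tens * 100 + units * 10 + frac;
    // Maps neg = 0 to +1 and neg = 1 to -1 without an `if`.
    let sign = 1 - 2 * (neg as i32);
    sign * magnitude
}

/// Converts an aggregated sum (in tenths) back to a mean in degrees.
/// This is the only place floating point is used.
fn mean_degrees(sum_tenths: i64, count: u64) -> f64 {
    sum_tenths as f64 / 10.0 / count as f64
}
```

For example, `parse_tenths(b"-12.3")` yields `-123` and `parse_tenths(b"5.0")` yields `50`.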
Instead of using city names as keys (expensive string comparisons), we hash each city name once with FxHash and use the u64 hash as the key. FxHash is faster than the default hasher, which is designed to resist HashDoS attacks, a property this challenge does not need. The actual city names are stored separately and only looked up at the end for printing.
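The idea can be sketched with the standard library alone. The hash below is an Fx-style rotate, xor, multiply loop written out by hand for illustration; it processes one byte at a time and is not the `rustc-hash` crate's exact implementation. `Table`, `record`, and `fx_style_hash` are hypothetical names. Note that keying on the hash alone assumes no two city names collide.

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;

/// Multiplier from the classic 64-bit FxHash mixing step (illustrative).
const SEED: u64 = 0x51_7c_c1_b7_27_22_0a_95;

/// Fx-style hash over the raw bytes of a city name.
fn fx_style_hash(name: &[u8]) -> u64 {
    let mut hash = 0u64;
    for &byte in name {
        hash = (hash.rotate_left(5) ^ byte as u64).wrapping_mul(SEED);
    }
    hash
}

#[derive(Default)]
struct Table {
    /// Keyed by name hash; value is (min, max, sum, count) in tenths of a degree.
    stats: HashMap<u64, (i32, i32, i64, u64)>,
    /// Names are kept apart from the stats and only read when printing.
    names: HashMap<u64, Vec<u8>>,
}

impl Table {
    fn record(&mut self, name: &[u8], tenths: i32) {
        // The name is hashed exactly once per record.
        let key = fx_style_hash(name);
        match self.stats.entry(key) {
            Entry::Occupied(mut e) => {
                let (min, max, sum, count) = e.get_mut();
                *min = (*min).min(tenths);
                *max = (*max).max(tenths);
                *sum += tenths as i64;
                *count += 1;
            }
            Entry::Vacant(e) => {
                e.insert((tenths, tenths, tenths as i64, 1));
                // The name is copied only the first time a city is seen.
                self.names.insert(key, name.to_vec());
            }
        }
    }
}
```

In this sketch the `HashMap<u64, _>` still runs the default hasher over the u64 key; a real implementation would likely plug in a cheaper hasher for the map as well.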
City names are stored as fixed-size byte arrays inline in HashMap entries, avoiding separate heap allocations per string. Temperature values are stored as primitive tuples. Only allocate when encountering a new city.
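A sketch of an entry that keeps the name inline, assuming names fit in 28 bytes (taken from the `.{0,28}` bound in the format; multi-byte UTF-8 names may need a larger buffer). `CityEntry` and `MAX_NAME_LEN` are hypothetical names used only for this example.

```rust
/// Assumed upper bound on the byte length of a city name.
const MAX_NAME_LEN: usize = 28;

/// A map value that owns its name inline, so no per-city `String` is allocated.
#[derive(Clone, Copy)]
struct CityEntry {
    name: [u8; MAX_NAME_LEN],
    name_len: u8,
    /// (min, max, sum, count) in tenths of a degree.
    stats: (i32, i32, i64, u64),
}

impl CityEntry {
    /// Builds the entry for a newly seen city.
    /// Panics if `name` is longer than `MAX_NAME_LEN` bytes.
    fn new(name: &[u8], tenths: i32) -> Self {
        let mut buf = [0u8; MAX_NAME_LEN];
        buf[..name.len()].copy_from_slice(name);
        CityEntry {
            name: buf,
            name_len: name.len() as u8,
            stats: (tenths, tenths, tenths as i64, 1),
        }
    }

    /// Returns only the bytes that belong to the name, without the zero padding.
    fn name(&self) -> &[u8] {
        &self.name[..self.name_len as usize]
    }

    /// Folds one more reading into the running aggregate.
    fn add(&mut self, tenths: i32) {
        let (min, max, sum, count) = &mut self.stats;
        *min = (*min).min(tenths);
        *max = (*max).max(tenths);
        *sum += tenths as i64;
        *count += 1;
    }
}
```

Because `CityEntry` is `Copy` and has a fixed size, values live directly inside the map's table storage, and the only allocation is the map growing when a new city appears.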
Tip
Great resource: The Rust Performance Book
RUSTFLAGS="-C target-cpu=native" cargo run --package brc --bin brc --release

| Hardware | Dataset | Rows | File Size | Threads |
|---|---|---|---|---|
| M1 Max (2021), 32 GB RAM | 1BRC | 1B | ~14 GB | 10 |

| Page cache | Time |
|---|---|
| Cold | 9475 ms |
| Hot | 2942 ms |