nntp-history is a Go module for efficiently managing and storing NNTP (Network News Transfer Protocol) history records, with support for two backends: MySQL with the RocksDB storage engine, and SQLite3.
It provides a scalable way to store and retrieve historical message records with duplicate detection and high-throughput capabilities.
This module is suitable for building applications related to Usenet news servers or any system that requires managing message history efficiently.
- Dual Backend Support: MySQL (RocksDB engine) and SQLite3 (with RocksDB-style optimizations)
- High Performance: Optimized for millions of records with minimal latency
- Duplicate Detection: Efficient hash-based duplicate checking with L1 cache
- Connection Pooling: Efficient database connection management
- Concurrent Processing: Multi-threaded design for high throughput
- Memory Mapped I/O: SQLite3 backend uses memory mapping for performance
```bash
go get github.com/go-while/nntp-history
```
SQLite3 (default):

- File-based: Simple deployment and backup
- RocksDB Optimizations: WAL mode, memory mapping, large caches
- 4096 Table Sharding: Optimal distribution using 3-character hex prefixes
- Always Available: No build tags required
MySQL RocksDB:

- Distributed: Suitable for cluster environments
- High Concurrency: Supports more concurrent connections
- Complex Setup: Requires MySQL with RocksDB storage engine
import "github.com/go-while/nntp-history"
his := &history.HISTORY{DIR: "/path/to/history"}
err := his.InitializeDatabase(false) // false = SQLite3
if err != nil {
log.Fatal(err)
}
his := &history.HISTORY{DIR: "/path/to/history"}
err := his.InitializeDatabase(true) // true = MySQL
if err != nil {
log.Fatal(err)
}
The nntp-history module supports multiple SQLite3 sharding strategies to optimize for different workloads. Detailed benchmarks with 1 million hash insertions show:
| Sharding Mode | Insertion Rate (1M) | Go Heap RAM (Post GC) | SQLite Cache (Est. Max Potential) | Initialization Time | Key Characteristics |
|---|---|---|---|---|---|
| Mode 0 (1 DB, 4096 tables) | ~333K/sec | ~0-1 MB | ~400 MB (shared) | 8.55s | Simplicity; best for read-heavy workloads (large shared cache) |
| Mode 2 (16 DBs, 256 tables) | ~332K/sec | ~0-1 MB | ~112 MB (adaptive) | 2.42s | Recommended default: fast init, balanced R/W, good concurrency |
| Mode 3 (64 DBs, 64 tables) | ~322K/sec | ~0-1 MB | ~192 MB (adaptive) | 3.05s | Write-focused, higher concurrency needs |
Key Takeaways from 1M Hash Tests:
- High Write Performance: All tested modes exceed 320,000 inserts/second.
- Low Go Application RAM: Go's heap usage is minimal (~1MB after GC) due to SQLite's memory management.
- Adaptive SQLite Cache: SQLite's page cache is managed efficiently. Multi-DB modes use adaptive sizing (e.g., Mode 2 estimates up to 112MB total cache if fully utilized, Mode 3 up to 192MB). This is OS-managed memory, not Go heap.
- Fast Initialization for Multi-DB: Mode 2 (16 DBs) initializes significantly faster than Mode 0 (single DB with many tables).
Recommendation:
- Mode 2 (16 DBs, 256 tables each) is recommended for most use cases, offering a strong balance of fast startup, high throughput, and efficient memory utilization.
- Use Mode 0 if simplicity is paramount or for predominantly read-heavy workloads that benefit from a single, large shared cache.
- Consider Mode 3 or other multi-DB options for very specific high-concurrency write scenarios.
Refer to examples/README.md and sqlite-sharded-integration-log.md for more detailed analysis.
- Test Suite: nntp-history_test - Comprehensive testing and benchmarking
- Examples: See the examples/ directory for usage examples
For comprehensive testing and benchmarking, use the dedicated test repository: nntp-history_test
```bash
# Clone the test repository
git clone https://github.com/go-while/nntp-history_test
cd nntp-history_test

# Build and run basic test
./build.sh
./nntp-history-test -todo=100000

# Performance test with SQLite3
./nntp-history-test -backend=sqlite3 -todo=1000000

# High-volume duplicate detection test
./nntp-history-test -todo=1000000 -DBG_BS_LOG=true
```
- Performance varies based on hardware, database configuration, and workload patterns
- Both SQLite3 and MySQL RocksDB backends are optimized for high-throughput duplicate detection
- Run your own benchmarks with the test repository for accurate measurements on your system
- See the nntp-history_test repository for comprehensive testing tools
Based on the actual code structure, here are precise storage calculations for planning:
Each record in history.dat has a fixed format:

```
{sha256hash}\ttimestamp~expires~date\tstoragetoken\n
```

Per record breakdown:

- `{` + `}` = 2 bytes (hash wrapper)
- SHA256 hash = 64 bytes
- `\t` = 1 byte (tab)
- Arrival timestamp = 10 bytes (zero-padded)
- `~` = 1 byte
- Expires = 10 bytes (zero-padded or `----------`)
- `~` = 1 byte
- Date = 10 bytes (zero-padded)
- `\t` = 1 byte (tab)
- StorageToken = 1 byte (typically "F")
- `\n` = 1 byte (newline)

Total per record: 2 + 64 + 1 + 10 + 1 + 10 + 1 + 10 + 1 + 1 + 1 = 102 bytes
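To make the layout concrete, here is a minimal standalone sketch that assembles one such record line. The `formatHistoryLine` helper is illustrative only (it is not the module's actual writer code), and the example timestamps are arbitrary:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// formatHistoryLine is an illustrative helper (not part of nntp-history)
// that builds one 102-byte history.dat record from the fields listed above.
func formatHistoryLine(messageID string, arrival, expires, date int64, token string) string {
	sum := sha256.Sum256([]byte(messageID))
	hash := hex.EncodeToString(sum[:]) // 64 hex chars
	// {hash}\tarrival~expires~date\ttoken\n with zero-padded 10-digit timestamps
	return fmt.Sprintf("{%s}\t%010d~%010d~%010d\t%s\n", hash, arrival, expires, date, token)
}

func main() {
	line := formatHistoryLine("<example@message.id>", 1700000000, 1731536000, 1700000000, "F")
	fmt.Printf("%q (%d bytes)\n", line, len(line)) // prints the record and "102 bytes"
}
```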
Table Structure:
- 4096 tables (s000 to sfff) using 3-character hex prefixes
- Each table stores: 7-char key + comma-separated offsets
- Distribution: ~244 records per table (1M ÷ 4096)
Storage Components:
- Keys storage: ~7 bytes per record
- Offsets storage: ~8 bytes per record (low collision rate)
- SQLite overhead: ~12 bytes per record
- Table/index overhead: ~5 MB total
| Records | history.dat | SQLite3 DB | Total Storage |
|---|---|---|---|
| 1M | 102 MB | 32 MB | 134 MB |
| 10M | 1.02 GB | 320 MB | 1.34 GB |
| 100M | 10.2 GB | 3.2 GB | 13.4 GB |
- keylen=7: Uses first 10 chars (3 for table + 7 for key)
- Low Collision Rate: 10-char prefixes provide excellent distribution
- Efficient Storage: SQLite3 WITHOUT ROWID tables minimize overhead
- Linear Scalability: Storage grows predictably with record count
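As a quick back-of-the-envelope check of the numbers above, the sketch below multiplies the per-record components listed earlier. It is an approximation only; actual on-disk sizes vary with page utilization, WAL checkpoints, and vacuuming:

```go
package main

import "fmt"

// estimateStorage is a rough, illustrative calculation based on the
// per-record component sizes documented above; real file sizes will vary.
func estimateStorage(records int64) (historyDat, sqliteDB int64) {
	historyDat = records * 102                 // fixed 102-byte history.dat lines
	perRecord := int64(7 + 8 + 12)             // key + offsets + SQLite overhead
	sqliteDB = records*perRecord + 5*1024*1024 // plus ~5 MB table/index overhead
	return historyDat, sqliteDB
}

func main() {
	h, s := estimateStorage(1_000_000)
	fmt.Printf("history.dat: ~%d MB, SQLite3: ~%d MB\n", h/1_000_000, s/1_000_000)
	// Output: history.dat: ~102 MB, SQLite3: ~32 MB (matches the table above)
}
```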
```bash
# Simple build (SQLite3 included by default)
go build

# Using Makefile
make build

# Get dependencies
make deps
```
- SQLite3 Backend: See SQLITE3_README.md for detailed SQLite3 documentation
- API Reference: See inline documentation in source files
The `BootHistory` function in this Go code is responsible for initializing and booting the history management system. It provides essential configuration options and prepares the system for historical data storage and retrieval.

To use the `BootHistory` function, follow these steps:

- Call the `BootHistory` function with the desired configuration options.
- The history management system will be initialized and ready for use.
- `history.History.WriterChan` is a Go channel used for sending and processing historical data entries.
- It is primarily responsible for writing data to the historical data storage system, using a HashDB (SQLite3/MySQL) to avoid duplicate entries.
- To send data for writing, you create a `HistoryObject` and send it through the channel (see the sketch below).
- If the `ResponseChan` channel is provided, it receives one of the following (int) values:

```go
/*
 0: Indicates "not a duplicate."
 1: Indicates "duplicate."
 2: Indicates "retry later."
*/
```
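A minimal sketch of that flow follows. The `HistoryObject` field names used here (`MessageIDHash`, `StorageToken`, `ResponseChan`) are assumptions for illustration; check the actual struct definition in the source:

```go
// Hedged sketch (fragment): field names on HistoryObject are illustrative
// assumptions; consult the nntp-history source for the real struct.
hash := "1a2b3c4d5e6f..." // SHA256 hex of the article's Message-ID
respChan := make(chan int, 1)
history.History.WriterChan <- &history.HistoryObject{
	MessageIDHash: hash,     // assumed field name
	StorageToken:  "F",      // assumed field name
	ResponseChan:  respChan, // receives 0, 1 or 2 (see above)
}
switch <-respChan {
case 0:
	// not a duplicate: the entry was written
case 1:
	// duplicate: reject the article
case 2:
	// retry later: transient condition, resubmit
}
```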
- The `history.History.IndexChan` is a Go channel with a dual purpose.
- Its primary function is to facilitate the detection of duplicate message-ID hashes within the history file.
- When the offset is set to -1, the channel performs a check for duplicate hashes but does not add the hash to the database (see the sketch after this list).
- When the offset is set to a value greater than zero, the channel functions as a mechanism for adding these message-ID hashes to the database.
- Beware: adding message-ID hashes is normally done via `history.History.WriterChan` if you want to write the history file too!
- If desired, one could use only the `IndexChan` and avoid writing the history file. Use the full `KeyLen` of the hash and provide a unique up-counter for the offsets.
- If the `IndexRetChan` channel is provided, it receives one of the following (int) values:

```go
/*
 0: Indicates "not a duplicate."
 1: Indicates "duplicate."
 2: Indicates "retry later."
*/
```

- The standard key algorithm used is `HashShort` (`-keyalgo=11`).
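For a duplicate pre-check that writes nothing, a lookup with offset -1 might look like the following sketch. The type sent on `IndexChan` is not documented here, so `HistoryIndex` and its fields are hypothetical placeholders:

```go
// Hedged sketch (fragment): HistoryIndex and its fields are hypothetical;
// only the IndexChan name and the offset semantics come from the docs above.
retChan := make(chan int, 1)
history.History.IndexChan <- &history.HistoryIndex{
	Hash:         hash,    // message-ID hash to look up
	Offset:       -1,      // -1 = duplicate check only, nothing is added
	IndexRetChan: retChan, // receives 0, 1 or 2 (see above)
}
if <-retChan == 1 {
	// duplicate: the hash is already known
}
```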
To improve processing speed and optimize data storage, we organize data using SQLite3 table sharding based on hash prefixes:
- We create 4096 separate tables (s000 to sfff), each corresponding to a 3-character hexadecimal prefix.
- This provides optimal distribution and parallel access for high-performance operations.

The hash distribution uses a sophisticated sharding approach:

- The first 3 characters of the hash determine the table (s000-sfff).
- The next 7 characters (or the `KeyLen` setting) are used as the key within that table.
- The recommended `KeyLen` is 7. The minimum `KeyLen` is 1. The maximum `KeyLen` is the length of the hash minus 3.
- Reasonable values for `KeyLen` range from 4 to 7. Use higher values if you expect more than 100M messages.
- Choose wisely: you cannot change `KeyLen` after database creation.
*** These are outdated benchmarks from the previous BoltDB implementation ***
*** SQLite3 backend provides better performance and simpler management ***
```bash
# With SQLite3 sharding (4096 tables), collision rates are significantly lower:
# - keylen=7: Optimal balance of performance and storage efficiency
# - keylen=4-6: Higher collision rates, more multi-offset storage
# - keylen=8+: Near-zero collisions, maximum storage usage
# Run your own benchmarks with nntp-history_test for current performance metrics
```
Let's illustrate this approach with an example. Suppose you have a Message-ID hash of "1a2b3c4d5e6f0...":

- We use the first 3 characters "1a2" to select the SQLite3 table "s1a2".
- The next 7 characters "b3c4d5e" (based on `KeyLen=7`) are used as the key within that table.
- The full hash is stored in the history.dat file, and the database holds the comma-separated offsets for that key.
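The table/key split is easy to reproduce. The following standalone sketch shows it; the `splitHash` helper is illustrative, not the module's actual code:

```go
package main

import "fmt"

// splitHash is an illustrative helper (not part of nntp-history) that
// derives the shard table name and lookup key from a message-ID hash.
func splitHash(hash string, keyLen int) (table, key string) {
	table = "s" + hash[:3]   // first 3 hex chars select one of 4096 tables
	key = hash[3 : 3+keyLen] // next keyLen chars form the key in that table
	return table, key
}

func main() {
	table, key := splitHash("1a2b3c4d5e6f0deadbeef...", 7)
	fmt.Println(table, key) // s1a2 b3c4d5e
}
```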
By following this approach, you can efficiently organize and retrieve data based on Message-ID hashes while benefiting from the performance and storage optimizations provided by SQLite3 with RocksDB-style optimizations.
Feel free to customize the `KeyLen` setting to meet your specific performance and storage requirements. Smaller `KeyLen` values save space but may result in more collisions and multi-offset storage, while larger `KeyLen` values reduce collisions but consume more space.
SQLite3 backend uses significantly fewer file descriptors:
- Writer History.dat: 1
- SQLite3 Database: 1 (main database file)
- SQLite3 WAL: 1 (write-ahead log)
- Total: ~3-5 file descriptors (much more efficient than the previous 33 FDs)
```bash
# Monitor open file descriptors
ls -lha /proc/$(pidof nntp-history-test)/fd

# Count all open FDs
ls -l /proc/$(pidof nntp-history-test)/fd | wc -l

# Count open history.dat
ls -lha /proc/$(pidof nntp-history-test)/fd | grep 'history.dat$' | wc -l

# Count open SQLite3 database files
ls -lha /proc/$(pidof nntp-history-test)/fd | grep '\.sqlite3' | wc -l
```
When you retrieve and examine the statistics of a SQLite3 database in Go, you can gather valuable information about the database's performance, resource usage, and structure. The statistics provide insights into how the database is operating and can be useful for optimizing your application. Here are some of the key metrics you can see from SQLite3 database statistics:
Database Size Metrics:
- Total database file size in bytes
- Number of tables and their individual sizes
- WAL (Write-Ahead Log) file size and checkpoint frequency
- Page count and page utilization
Performance Metrics:
- Cache hit/miss ratios for database pages
- Query execution times and optimization statistics
- Transaction throughput and commit rates
- Lock contention and wait times
Storage Efficiency:
- Table and index storage utilization
- Fragmentation levels and vacuum effectiveness
- Compression ratios (if enabled)
- Free space and page reuse statistics
Connection and Concurrency:
- Active connection count
- Concurrent reader/writer statistics
- Transaction isolation levels and conflicts
- Checkpoint frequency and WAL size management
SQLite3-Specific Features:
- Memory-mapped I/O effectiveness
- Journal mode efficiency (WAL vs. DELETE)
- PRAGMA settings impact on performance
- Query planner statistics and index usage
These statistics can help you monitor and optimize your SQLite3 database. For example, you can use them to identify performance bottlenecks, understand resource usage patterns, and assess the efficiency of your data organization. The SQLite3 backend in nntp-history is optimized with RocksDB-style settings including WAL mode, large caches, and memory mapping for maximum performance.
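As one concrete, hedged example of collecting a few such metrics with the standard `database/sql` package and SQLite PRAGMAs: the driver import assumes mattn/go-sqlite3, the database path is a placeholder, and only a small subset of the metrics listed above is shown:

```go
package main

import (
	"database/sql"
	"fmt"
	"log"

	_ "github.com/mattn/go-sqlite3" // assumed driver; any SQLite3 driver works
)

func main() {
	db, err := sql.Open("sqlite3", "/path/to/history/hashdb.sqlite3") // placeholder path
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Database size metrics: page_count * page_size = file size in bytes
	var pageCount, pageSize, freePages int64
	var journalMode string
	db.QueryRow("PRAGMA page_count").Scan(&pageCount)
	db.QueryRow("PRAGMA page_size").Scan(&pageSize)
	db.QueryRow("PRAGMA freelist_count").Scan(&freePages)
	db.QueryRow("PRAGMA journal_mode").Scan(&journalMode)
	fmt.Printf("size=%d bytes, free pages=%d, journal=%s\n",
		pageCount*pageSize, freePages, journalMode)

	// Go-side connection statistics from database/sql
	stats := db.Stats()
	fmt.Printf("open connections=%d, in use=%d, idle=%d\n",
		stats.OpenConnections, stats.InUse, stats.Idle)
}
```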
Contributions to this code are welcome.
If you have suggestions for improvements or find issues, please feel free to open an issue or submit a pull request.
This code is provided under the MIT License. See the LICENSE file for details.