Convert Nostr events from JSON to Protocol Buffers at lightspeed
Proton Beam is a highly experimental Rust tool, intended to become high-performance over time, for converting Nostr events from JSON to Protocol Buffers (protobuf). It provides a CLI tool for batch processing and a daemon for real-time relay monitoring.
- 🚀 High Performance: Process 100+ events/second with validated signatures
- 🔒 Full Validation: Verify event IDs (SHA-256) and Schnorr signatures
- 📦 Efficient Storage: Protobuf + gzip compression (~3x smaller than JSON, 65%+ space savings)
- 🗄️ Optimized SQLite Index: Fast event lookups and deduplication (~307K lookups/sec)
  - Bulk insert mode: 2-3x faster for large-scale index rebuilds (500K+ events/sec)
  - Optimized PRAGMAs for multi-billion event datasets
- 🔄 Real-time Processing: Connect to multiple Nostr relays simultaneously
- 🎯 Smart Deduplication: Events stored once across all relay sources
- 🔍 Advanced Filtering: Filter by event kind, author, or tags
- ⚡ Input Preprocessing: Ultra-fast regex-based filtering before JSON parsing
- 🌐 Auto-discovery: Automatically discover and connect to new relays
- 📊 Progress Tracking: Beautiful progress bars for batch operations
- 🔀 Parallel Processing: Multi-threaded conversion for maximum throughput
- ☁️ AWS S3 Support: Direct upload to S3 buckets (optional feature)
- 💾 Scalable: Tested with 1TB+ datasets on commodity hardware
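As a rough illustration of the smart-deduplication feature listed above, the sketch below keeps a set of event IDs that have already been stored and rejects repeats, whichever relay delivered them. It is a simplification under stated assumptions: an in-memory `HashSet` stands in for the SQLite index, and the names `SeenEvents`, `should_store`, and `dedupe` are hypothetical, not part of Proton Beam's API.

```rust
use std::collections::HashSet;

/// In-memory stand-in for the event index (the real tool uses SQLite).
/// Remembers which event IDs have already been stored.
#[derive(Default)]
struct SeenEvents {
    ids: HashSet<String>,
}

impl SeenEvents {
    /// Returns true if the event is new and should be written to storage,
    /// false if an event with the same ID was recorded earlier.
    fn should_store(&mut self, event_id: &str) -> bool {
        // HashSet::insert returns false when the value was already present.
        self.ids.insert(event_id.to_owned())
    }
}

/// Runs (relay_url, event_id) pairs through the filter and returns the IDs
/// that would be stored, in first-seen order. The relay URL is ignored on
/// purpose: the same event from two relays is still one event.
fn dedupe<'a>(incoming: &[(&'a str, &'a str)]) -> Vec<&'a str> {
    let mut seen = SeenEvents::default();
    let mut stored = Vec::new();
    for &(_relay, id) in incoming {
        if seen.should_store(id) {
            stored.push(id);
        }
    }
    stored
}
```

Keying on the event ID alone works because a Nostr event ID is derived from the event's contents, so identical events share an ID regardless of their source relay.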
# Clone the repository
git clone https://github.com/yourusername/proton-beam.git
cd proton-beam
# Build all binaries
cargo build --release
# Install binaries
cargo install --path proton-beam-cli
cargo install --path proton-beam-daemon
Convert a .jsonl file:
proton-beam convert events.jsonl
Read from stdin:
cat events.jsonl | proton-beam convert -
Specify output directory:
proton-beam convert events.jsonl --output-dir ./pb_data
Skip validation for faster processing:
# Skip both signature and ID validation
proton-beam convert events.jsonl --validate-signatures=false --validate-event-ids=false
# Or skip just signatures
proton-beam convert events.jsonl --validate-signatures=false
Disable preprocessing filter (enabled by default):
proton-beam convert events.jsonl --no-filter-kinds
Parallel processing with multiple threads:
proton-beam convert events.jsonl --parallel 8
Recover from a failed parallel conversion (merge existing temp files):
proton-beam merge ./pb_data --cleanup
Adjust compression level:
proton-beam convert events.jsonl --compression-level 9
Rebuild the event index from protobuf files:
# Rebuild index with optimized bulk insert mode (2-3x faster)
proton-beam index rebuild ./pb_data
# Custom index location
proton-beam index rebuild ./pb_data --index-path ./custom/index.db
Upload to S3 after conversion (requires --features s3):
# Build with S3 support
cargo build --release --features s3 -p proton-beam-cli
# Convert and upload
proton-beam convert events.jsonl --s3-output s3://my-bucket/output/
For processing large datasets on AWS EC2 with automatic S3 upload:
# Set your configuration
export INPUT_URL="https://example.com/data.jsonl"
export S3_OUTPUT_BUCKET="my-bucket"
export KEY_NAME="my-ec2-keypair"
# Deploy via CloudFormation
./scripts/deploy-cloudformation.sh
Complete guides:
- Quick Start - Get started in 3 steps
- Complete Guide - Full documentation
- 1.2TB Dataset Guide - Specific configuration example
Start with default configuration:
proton-beam-daemon start
Use custom config:
proton-beam-daemon start --config config.toml
Request historical events:
proton-beam-daemon start --since 1697000000
proton-beam/
├── proton-beam-core/ # Core library (protobuf + conversion)
├── proton-beam-cli/ # CLI tool
├── proton-beam-daemon/ # Relay monitoring daemon
├── docs/ # Documentation
│ ├── PROJECT_PLAN.md # Complete project plan
│ └── PROTOBUF_SCHEMA.md # Protobuf schema documentation
└── examples/ # Sample events and configs
- Project Status & Plan: Current status, progress, and complete roadmap
- Architecture: System architecture and design decisions
- Protobuf Schema: Detailed schema documentation
- Developer Guide: Development setup and workflows
- Benchmarking Guide: Performance benchmarks and optimization tips
- Preprocessing Guide: Input filtering and preprocessing options
- Documentation Index: Complete documentation navigation
- API Documentation: Run cargo doc --open
Proton Beam includes comprehensive benchmarks covering all critical paths. Run them with:
# Using just (recommended)
just bench
# Or using the shell script
./scripts/run-benchmarks.sh --release
Sample Results (Apple M1 Pro):
- JSON → Protobuf: ~195k events/sec
- Protobuf → JSON: ~845k events/sec
- Basic validation: ~7M validations/sec
- Storage throughput: ~473 MB/sec write, ~810 MB/sec read
- End-to-end pipeline: ~155k events/sec
See BENCHMARKS_README.md for detailed information.
config.toml for daemon:
[daemon]
output_dir = "./nostr_events"
batch_size = 500
log_level = "info"
[relays]
urls = [
"wss://relay.damus.io",
"wss://nos.lol",
"wss://relay.primal.net",
"wss://relay.nostr.band",
"wss://relay.snort.social",
]
auto_discover = true
max_relays = 50
[filters]
kinds = [] # Empty = all kinds
authors = [] # Empty = all authors
[storage]
deduplicate = true
use_index = true
- JSON Input: Accepts Nostr events in JSON format (from files, stdin, or relays)
- Validation: Verifies event ID (SHA-256) and Schnorr signature
- Conversion: Converts to efficient protobuf binary format
- Storage: Organizes events by date (YYYY_MM_DD.pb) with length-delimited encoding
- Indexing: Maintains SQLite index for fast deduplication and querying
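The validation step above checks each event ID against a SHA-256 digest. Under NIP-01, that digest is taken over a compact JSON array of the form `[0,pubkey,created_at,kind,tags,content]`. The sketch below builds only that array, as an illustration of what gets hashed; the hashing itself is left out because it needs a crate outside the standard library. The function names `quote` and `serialize_for_id` are hypothetical, and Proton Beam itself relies on nostr-sdk for validation rather than code like this.

```rust
/// Quotes a string and applies the escape table given in NIP-01.
/// Only the seven characters listed there are escaped; all other
/// characters are copied through unchanged.
fn quote(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    out.push('"');
    for c in s.chars() {
        match c {
            '\n' => out.push_str("\\n"),      // line break (0x0A)
            '"' => out.push_str("\\\""),      // double quote (0x22)
            '\\' => out.push_str("\\\\"),     // backslash (0x5C)
            '\r' => out.push_str("\\r"),      // carriage return (0x0D)
            '\t' => out.push_str("\\t"),      // tab (0x09)
            '\u{08}' => out.push_str("\\b"),  // backspace (0x08)
            '\u{0C}' => out.push_str("\\f"),  // form feed (0x0C)
            other => out.push(other),
        }
    }
    out.push('"');
    out
}

/// Builds the whitespace-free JSON array whose SHA-256 digest, hex-encoded,
/// is the event ID: [0,<pubkey>,<created_at>,<kind>,<tags>,<content>].
fn serialize_for_id(
    pubkey: &str,
    created_at: i64,
    kind: u32,
    tags: &[Vec<String>],
    content: &str,
) -> String {
    // Each tag is itself a JSON array of strings.
    let tags_json: Vec<String> = tags
        .iter()
        .map(|tag| {
            let values: Vec<String> = tag.iter().map(|v| quote(v)).collect();
            format!("[{}]", values.join(","))
        })
        .collect();
    format!(
        "[0,{},{},{},[{}],{}]",
        quote(pubkey),
        created_at,
        kind,
        tags_json.join(","),
        quote(content)
    )
}
```

An event passes the ID check when the hex-encoded SHA-256 of these UTF-8 bytes equals its `id` field. The Schnorr signature is then verified over that same ID.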
- Throughput: 100+ events/second (with full validation)
- Storage: ~10-25% smaller than minified JSON
- Memory: < 100MB under normal load
- Validation: 100% accuracy using nostr-sdk
- ✅ Phase 1 Complete: Core library fully implemented and tested (62/62 tests passing)
- ✅ Phase 1.5 Complete: Enhanced API with builder, Display, Serde, FromIterator
- ✅ Phase 2 Complete: CLI tool with progress bars, date-based storage (18/18 tests passing)
- ✅ CI/CD: Automated testing, linting, formatting, and benchmarks
- 🚧 Next Phase: SQLite Index & Deduplication
See PROJECT_STATUS.md for detailed progress.
- Phase 1: Core library & protobuf schema ✅
- Phase 1.5: Enhanced API features ✅
- Phase 2: CLI tool ✅
- Phase 3: SQLite index & deduplication ⏳
- Phase 4: Relay daemon (core)
- Phase 5: Relay discovery & advanced features
- Phase 6: Testing, documentation & polish
- Event Archival: Efficiently archive Nostr events for long-term storage
- Data Analysis: Process large datasets of Nostr events
- Relay Backups: Create compressed backups of relay data
- Research: Analyze Nostr protocol usage and patterns
- Integration: Use as a library in other Rust projects
// Named ProtoEvent to avoid conflicts with nostr-sdk::Event
message ProtoEvent {
  string id = 1;          // Event ID (hex)
  string pubkey = 2;      // Public key (hex)
  int64 created_at = 3;   // Unix timestamp
  int32 kind = 4;         // Event kind
  repeated Tag tags = 5;  // Tags
  string content = 6;     // Content
  string sig = 7;         // Signature (hex)
}
message Tag {
  repeated string values = 1;
}
Events are stored in date-organized files using length-delimited protobuf:
./nostr_events/
├── 2025_10_13.pb # All events from Oct 13
├── 2025_10_14.pb
├── proton-beam.log # Error and warning logs
└── index.db # SQLite index
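The layout above combines two ideas: one file per day, and length-delimited records inside each file. The sketch below illustrates both using only the standard library. It assumes the file date is the UTC date of `created_at` and that each record is prefixed with a standard protobuf base-128 varint length (the framing that prost's `encode_length_delimited` produces). The function names are hypothetical, and the payloads are treated as opaque bytes rather than real `ProtoEvent` messages.

```rust
/// Returns the daily file name ("YYYY_MM_DD.pb") for a Unix timestamp,
/// interpreting the timestamp as UTC (an assumption of this sketch).
fn daily_file_name(created_at: i64) -> String {
    // Whole days since 1970-01-01, rounding toward negative infinity.
    let days = created_at.div_euclid(86_400);
    // Days-to-civil-date conversion (Howard Hinnant's algorithm). Years are
    // counted from March 1 so that leap days fall at the end of a year.
    let z = days + 719_468;
    let era = z.div_euclid(146_097); // 400-year cycle number
    let doe = z.rem_euclid(146_097); // day within the cycle
    let yoe = (doe - doe / 1_460 + doe / 36_524 - doe / 146_096) / 365;
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100); // day of March-based year
    let mp = (5 * doy + 2) / 153; // month index, 0 = March
    let day = doy - (153 * mp + 2) / 5 + 1;
    let month = if mp < 10 { mp + 3 } else { mp - 9 };
    let year = yoe + era * 400 + if month <= 2 { 1 } else { 0 };
    format!("{:04}_{:02}_{:02}.pb", year, month, day)
}

/// Appends one record to `out`: a base-128 varint length, then the payload.
fn write_frame(out: &mut Vec<u8>, payload: &[u8]) {
    let mut len = payload.len() as u64;
    while len >= 0x80 {
        // Low 7 bits with the continuation bit set.
        out.push(((len as u8) & 0x7F) | 0x80);
        len >>= 7;
    }
    out.push(len as u8);
    out.extend_from_slice(payload);
}

/// Splits a buffer written by `write_frame` back into payloads.
/// Returns None if the buffer is truncated or a length prefix is malformed.
fn read_frames(mut buf: &[u8]) -> Option<Vec<&[u8]>> {
    let mut frames = Vec::new();
    while !buf.is_empty() {
        // Decode the varint length prefix, 7 bits per byte, low bits first.
        let mut len: u64 = 0;
        let mut shift = 0u32;
        loop {
            let (&byte, rest) = buf.split_first()?;
            buf = rest;
            if shift >= 64 {
                return None; // more than 10 prefix bytes: not a valid varint
            }
            len |= u64::from(byte & 0x7F) << shift;
            if (byte & 0x80) == 0 {
                break;
            }
            shift += 7;
        }
        if len > buf.len() as u64 {
            return None; // payload is cut short
        }
        let (payload, rest) = buf.split_at(len as usize);
        frames.push(payload);
        buf = rest;
    }
    Some(frames)
}
```

Because every record carries its own length, new events can be appended to a day's file without rewriting it, and a reader can walk the file record by record without any separator bytes.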
Contributions are welcome! Please read CONTRIBUTING.md before submitting PRs.
# Clone the repo
git clone https://github.com/yourusername/proton-beam.git
cd proton-beam
# Run tests
cargo test --all
# Run tests with output
cargo test --all -- --nocapture
# Format code
cargo fmt --all
# Lint
cargo clippy --all-targets --all-features
# Or use just commands (recommended)
just test # Run tests
just fmt # Check formatting
just lint # Run clippy
just precommit # Run all pre-commit checks (format, lint, tests, MSRV)
All code is automatically checked on pull requests:
- ✅ Format validation (rustfmt)
- ✅ Lint checks (clippy)
- ✅ Documentation builds
- ✅ Tests on Linux, macOS, and Windows
- ✅ MSRV compatibility (Rust 1.90+)
- ✅ Security audit
- 📊 Performance benchmarks
See CI Workflows for details.
- nostr-sdk: Nostr protocol implementation
- prost: Protocol Buffers for Rust
- tokio: Async runtime
- clap: CLI argument parsing
- rusqlite: SQLite bindings
MIT License - see LICENSE for details
- Nostr Protocol - The protocol this tool supports
- nostr-sdk - Excellent Rust implementation
- Protocol Buffers - Efficient serialization format
- nostr-tools - JavaScript Nostr library
- nostr-rs-relay - Rust Nostr relay
- nostcat - Nostr CLI tool
Built with ⚡ and 🦀 by the Nostr community