High-performance pipeline that streams S3 log files to EdgeDelta via HTTP in near real time.
Highlights
- Handles hundreds of thousands of gzipped files per day with sub-second lag
- Pluggable log-format registry (Zscaler, Cisco Umbrella, AWS services, and custom patterns)
- Optional Redis-backed state for safe horizontal scaling
- First-class observability: OTLP metrics, health endpoints, and dashboards
- Automated installer, systemd integration, and container images

Table of Contents
- S3 to EdgeDelta Streamer
  - Quick Start
  - Deployment Options
  - Configuration Basics
  - Operations & Monitoring
  - Troubleshooting
  - Architecture & Scaling
  - Documentation Map
  - Support

Quick Start
Prerequisites: a running EdgeDelta agent, AWS credentials with s3:GetObject and s3:ListBucket permissions, sudo access, and (optionally) Redis for distributed deployments.
git clone https://github.com/daniel-edgedelta/s3-edgedelta-streamer.git
cd s3-edgedelta-streamer
sudo ./install.sh

The installer validates prerequisites, prompts for configuration, encrypts credentials, and registers a systemd service that tracks the EdgeDelta agent lifecycle.
Verify the deployment:
sudo systemctl status s3-streamer
sudo journalctl -fu s3-streamer

Tip: Deploy your EdgeDelta pipeline before installing so ports 8080, 8081, and 4317 are available.

Deployment Options
Systemd (recommended)
The installation script places binaries under /opt/edgedelta/s3-streamer/, stores encrypted credentials in /etc/systemd/creds/s3-streamer/, and persists state in /var/lib/s3-streamer/state.json. Start/stop the service with systemctl or re-run install.sh to rotate credentials.
Docker
docker build -t s3-edgedelta-streamer .
docker run -p 8080:8080 \
  -e AWS_ACCESS_KEY_ID=your_key \
  -e AWS_SECRET_ACCESS_KEY=your_secret \
  -e AWS_REGION=us-east-1 \
  s3-edgedelta-streamer

Docker Compose
docker-compose up -d
docker-compose logs -f s3-edgedelta-streamer
docker-compose down

Configuration Basics

| Category | Minimal Settings | Notes |
|---|---|---|
| S3 | bucket, prefix, region | Remove any s3:// prefix from the bucket name. |
| HTTP sender | endpoints, batch_lines, batch_bytes, flush_interval, workers | Maintain a ~1.5:1 ratio between S3 and HTTP workers. |
| Processing | worker_count, queue_size, scan_interval, delay_window | Increase delay_window to ensure files are complete before processing. |
| State | file_path, save_interval | Default persistence uses the local filesystem. |
| Redis (optional) | host, port, password, database, key_prefix | Required when multiple streamer instances share state. |
| OTLP metrics | enabled, endpoint, service_name | Streams telemetry to the EdgeDelta collector (4317/tcp). |

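The HTTP sender settings batch_lines, batch_bytes, and flush_interval suggest batching bounded by both size and time. The following is a minimal Python sketch of that idea, not the streamer's actual implementation. The LineBatcher class, its tick method, and the injected send and clock callables are hypothetical names introduced for illustration only.

```python
import time
from typing import Callable, List, Optional


class LineBatcher:
    """Hypothetical batcher: flushes on a line limit, a byte limit, or a timeout."""

    def __init__(
        self,
        batch_lines: int,
        batch_bytes: int,
        flush_interval: float,
        send: Callable[[List[str]], None],
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.batch_lines = batch_lines
        self.batch_bytes = batch_bytes
        self.flush_interval = flush_interval  # seconds
        self.send = send
        self.clock = clock
        self._lines: List[str] = []
        self._size = 0
        self._started: Optional[float] = None

    def add(self, line: str) -> None:
        """Buffer one line and flush if either size limit is reached."""
        if not self._lines:
            self._started = self.clock()  # the timer starts with the first line
        self._lines.append(line)
        self._size += len(line.encode("utf-8")) + 1  # +1 for the newline separator
        if len(self._lines) >= self.batch_lines or self._size >= self.batch_bytes:
            self.flush()

    def tick(self) -> None:
        """Call periodically; flushes a partial batch that has waited too long."""
        if self._lines and self.clock() - self._started >= self.flush_interval:
            self.flush()

    def flush(self) -> None:
        """Hand the current batch to send() and start a new one."""
        if self._lines:
            self.send(self._lines)
            self._lines, self._size, self._started = [], 0, None
```

In this sketch, larger batch_lines and batch_bytes values trade latency for fewer HTTP requests, while flush_interval caps how long a partial batch can wait during quiet periods.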
Tip: Keep default_format: "auto" to enable automatic log-format detection. Custom recipes live in docs/log-formats.md.
Minimal snippet for the state block:
state:
  file_path: "/var/lib/s3-streamer/state.json"
  save_interval: 30s
redis:
  enabled: false

Operations & Monitoring
- Day-to-day commands, health endpoints, and migration flows: docs/operations.md
- Full metric catalog, dashboard ideas, and alerting strategies: docs/monitoring.md

Warning: When Redis is enabled, the streamer falls back to file storage if Redis is unavailable. Monitor logs for "Redis unavailable, falling back to file storage" to catch infrastructure issues early.
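The fallback behavior described in the warning can be pictured as a wrapper that tries a primary store first and writes to a local file when the primary raises a connection error. This is a hedged Python sketch under that assumption. FileStore, FallbackStore, and UnreachableRedis are hypothetical names, and the state dictionary shown is illustrative rather than the streamer's real state schema.

```python
import json
import logging
import os
import tempfile
from typing import Dict, Protocol

log = logging.getLogger("s3-streamer-sketch")


class StateStore(Protocol):
    def save(self, state: Dict[str, str]) -> None: ...


class FileStore:
    """Persists state as JSON on the local filesystem."""

    def __init__(self, path: str) -> None:
        self.path = path

    def save(self, state: Dict[str, str]) -> None:
        # Write to a temp file, then rename, so readers never see a partial file.
        directory = os.path.dirname(self.path) or "."
        fd, tmp_path = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w") as fh:
            json.dump(state, fh)
        os.replace(tmp_path, self.path)


class UnreachableRedis:
    """Stand-in for a Redis-backed store whose server is down (hypothetical)."""

    def save(self, state: Dict[str, str]) -> None:
        raise ConnectionError("connection refused")


class FallbackStore:
    """Tries the primary store; on a connection error, logs and uses the fallback."""

    def __init__(self, primary: StateStore, fallback: StateStore) -> None:
        self.primary = primary
        self.fallback = fallback

    def save(self, state: Dict[str, str]) -> None:
        try:
            self.primary.save(state)
        except ConnectionError:
            log.warning("Redis unavailable, falling back to file storage")
            self.fallback.save(state)
```

One consequence of this design is that instances which fall back independently no longer share state, which is why the warning message is worth alerting on in multi-instance deployments.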

Troubleshooting

| Symptom | Quick Fix | Deeper Dive |
|---|---|---|
| Rising http_buffer_drops_total | Increase http.buffer_size or lower processing.worker_count. | Drops are normal during backlog catch-up; investigate only if they persist. |
| processing_lag_seconds > 60 | Add HTTP endpoints or workers; confirm EdgeDelta capacity. | Scaling tips in docs/performance.md. |
| S3 errors (InvalidBucketName, AccessDenied) | Remove s3:// prefix, verify IAM permissions and region. | Re-run installer to regenerate credentials if needed. |
| HTTP 4xx/5xx spikes | Check EdgeDelta agent status and port availability. | Restart the agent (systemctl restart edgedelta). |
| Redis fallback warnings | Validate Redis availability with redis-cli ping. | Run ./s3-edgedelta-streamer --migrate-state after Redis recovers. |

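An InvalidBucketName error commonly comes from pasting a full S3 URI into the bucket setting. As an illustration only, a small Python helper like the one below could be used to sanity-check a value before it goes into the configuration. normalize_bucket is a hypothetical name and is not part of the streamer.

```python
def normalize_bucket(value: str) -> str:
    """Return a bare bucket name from 'name', 's3://name', or 's3://name/prefix'."""
    name = value.strip()
    if name.lower().startswith("s3://"):
        name = name[len("s3://"):]
    # Anything after the first slash is a key prefix, not part of the bucket name.
    return name.split("/", 1)[0]


if __name__ == "__main__":
    print(normalize_bucket("s3://my-logs/zscaler/"))
```

Any path component dropped here belongs in the separate prefix setting rather than in the bucket name.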
Need to rewind state? Stop the service, edit /var/lib/s3-streamer/state.json, and restart. Delete the file to process everything from scratch.

Architecture & Scaling
S3 (gzipped JSONL)
↓
15 S3 workers → 10k line buffer → 10 HTTP workers
↓
EdgeDelta HTTP inputs (8080/8081) → EdgeDelta backend
- HTTP streaming avoids temporary files and keeps latency low.
- Workers load-balance across endpoints using round robin.
- Redis-backed state unlocks multi-instance deployments.
- Real-world performance data and tuning levers live in docs/performance.md.
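The round-robin load balancing mentioned above can be sketched in a few lines of Python. This is an illustrative model rather than the streamer's code: assign_batches is a hypothetical function, and the localhost endpoint URLs are assumptions that reuse the ports from the diagram.

```python
import itertools
from typing import List, Sequence, Tuple


def assign_batches(
    lines: Sequence[str], endpoints: Sequence[str], batch_lines: int
) -> List[Tuple[str, List[str]]]:
    """Split lines into batches and pair each batch with the next endpoint in turn."""
    if not endpoints:
        raise ValueError("at least one endpoint is required")
    if batch_lines < 1:
        raise ValueError("batch_lines must be positive")
    rotation = itertools.cycle(endpoints)  # endless e1, e2, ..., e1, e2, ...
    plan = []
    for start in range(0, len(lines), batch_lines):
        batch = list(lines[start:start + batch_lines])
        plan.append((next(rotation), batch))
    return plan


if __name__ == "__main__":
    # Assumed endpoints; only the ports come from the architecture diagram.
    endpoints = ["http://localhost:8080", "http://localhost:8081"]
    for endpoint, batch in assign_batches(["l0", "l1", "l2", "l3", "l4"], endpoints, 2):
        print(endpoint, batch)
```

With two endpoints, consecutive batches alternate between them, so adding an endpoint spreads the same batch stream across more EdgeDelta inputs.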

Documentation Map
- docs/log-formats.md – Complete log-format reference and regex tips
- docs/operations.md – Systemd, Docker, health endpoints, migrations
- docs/monitoring.md – Metrics catalog, dashboards, alert playbooks
- docs/performance.md – Throughput snapshots, scaling heuristics, data layout
- dashboard-header.md – Ready-to-use EdgeDelta dashboard header copy

Support
- Inspect local logs: /opt/edgedelta/s3-streamer/logs/streamer.log
- Check the EdgeDelta agent logs: /var/log/edgedelta/edgedelta.log
- Review the troubleshooting table above
- Escalate to EdgeDelta support with recent logs, metrics, and config snapshots