S3 to EdgeDelta Streamer

High-performance pipeline that streams S3 log files to EdgeDelta via HTTP in near real time.

Highlights

  • Handles hundreds of thousands of gzipped files per day with sub-second lag
  • Pluggable log-format registry (Zscaler, Cisco Umbrella, AWS services, and custom patterns)
  • Optional Redis-backed state for safe horizontal scaling
  • First-class observability: OTLP metrics, health endpoints, and dashboards
  • Automated installer, systemd integration, and container images

Quick Start

Prerequisites: EdgeDelta agent running, AWS credentials with s3:GetObject and s3:ListBucket, sudo access, and (optionally) Redis for distributed deployments.

git clone https://github.com/daniel-edgedelta/s3-edgedelta-streamer.git
cd s3-edgedelta-streamer
sudo ./install.sh

The installer validates prerequisites, prompts for configuration, encrypts credentials, and registers a systemd service that tracks the EdgeDelta agent lifecycle.

Verify the deployment:

sudo systemctl status s3-streamer
sudo journalctl -fu s3-streamer

Tip: Deploy your EdgeDelta pipeline before installing so that its HTTP inputs (ports 8080 and 8081) and OTLP endpoint (port 4317) are already listening.

Deployment Options

Systemd (recommended)

The installation script places binaries under /opt/edgedelta/s3-streamer/, stores encrypted credentials in /etc/systemd/creds/s3-streamer/, and persists state in /var/lib/s3-streamer/state.json. Start and stop the service with systemctl; re-run install.sh to rotate credentials.

Docker

docker build -t s3-edgedelta-streamer .
docker run -p 8080:8080 \
  -e AWS_ACCESS_KEY_ID=your_key \
  -e AWS_SECRET_ACCESS_KEY=your_secret \
  -e AWS_REGION=us-east-1 \
  s3-edgedelta-streamer

Docker Compose

docker-compose up -d
docker-compose logs -f s3-edgedelta-streamer
docker-compose down

Configuration Basics

| Category | Minimal Settings | Notes |
| --- | --- | --- |
| S3 | bucket, prefix, region | Remove any s3:// prefix from the bucket name. |
| HTTP sender | endpoints, batch_lines, batch_bytes, flush_interval, workers | Maintain a ~1.5:1 ratio between S3 and HTTP workers. |
| Processing | worker_count, queue_size, scan_interval, delay_window | Increase delay_window to ensure files are complete before processing. |
| State | file_path, save_interval | Default persistence uses the local filesystem. |
| Redis (optional) | host, port, password, database, key_prefix | Required when multiple streamer instances share state. |
| OTLP metrics | enabled, endpoint, service_name | Streams telemetry to the EdgeDelta collector (4317/tcp). |
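
To illustrate the delay_window note in the table, here is a minimal Python sketch of how a scanner might skip objects that were modified too recently. The function name `ready_for_processing`, the `(key, last_modified)` tuple shape, and the reading of delay_window as a minimum object age are assumptions made for illustration; the streamer's actual logic may differ.

```python
from datetime import datetime, timedelta, timezone


def ready_for_processing(objects, delay_window, now=None):
    """Return keys of objects last modified at least delay_window ago.

    objects: iterable of (key, last_modified) pairs with timezone-aware datetimes.
    delay_window: timedelta; assumed here to mean "minimum age before a file is read".
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - delay_window
    return [key for key, last_modified in objects if last_modified <= cutoff]


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
    listing = [
        ("logs/a.gz", now - timedelta(seconds=90)),  # old enough to process
        ("logs/b.gz", now - timedelta(seconds=5)),   # still inside the window
    ]
    print(ready_for_processing(listing, timedelta(seconds=60), now=now))
```

Under this reading, a larger delay_window trades extra latency for a lower chance of picking up a file that is still being written.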

Tip: Keep default_format: "auto" to enable automatic log-format detection. Custom recipes live in docs/log-formats.md.

Minimal snippet for the state block:

state:
  file_path: "/var/lib/s3-streamer/state.json"
  save_interval: 30s
  redis:
    enabled: false

Operations & Monitoring

Warning: When Redis is enabled, the streamer falls back to file storage if Redis is unavailable. Watch the logs for the message "Redis unavailable, falling back to file storage" to catch infrastructure issues early.
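
One way to watch for that fallback message is a small log scan, sketched below in Python. The helper name `count_redis_fallbacks` is hypothetical, and the sketch assumes the message appears verbatim in the streamer log listed under Support; adjust the path if your logs go to journald only.

```python
FALLBACK_MARKER = "Redis unavailable, falling back to file storage"


def count_redis_fallbacks(lines):
    """Count log lines that contain the Redis fallback message."""
    return sum(1 for line in lines if FALLBACK_MARKER in line)


if __name__ == "__main__":
    # Path taken from the Support section; whether the message is written
    # to this file (rather than only to journald) is an assumption.
    log_path = "/opt/edgedelta/s3-streamer/logs/streamer.log"
    with open(log_path, encoding="utf-8", errors="replace") as handle:
        print(count_redis_fallbacks(handle))
```

A non-zero count is a cue to check Redis with redis-cli ping, as described in the troubleshooting table.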

Troubleshooting

| Symptom | Quick Fix | Deeper Dive |
| --- | --- | --- |
| Rising http_buffer_drops_total | Increase http.buffer_size or lower processing.worker_count. | Drops are normal during backlog catch-up; investigate only if they persist. |
| processing_lag_seconds > 60 | Add HTTP endpoints or workers; confirm EdgeDelta capacity. | Scaling tips in docs/performance.md. |
| S3 errors (InvalidBucketName, AccessDenied) | Remove s3:// prefix, verify IAM permissions and region. | Re-run installer to regenerate credentials if needed. |
| HTTP 4xx/5xx spikes | Check EdgeDelta agent status and port availability. | Restart the agent (systemctl restart edgedelta). |
| Redis fallback warnings | Validate Redis availability with redis-cli ping. | Run ./s3-edgedelta-streamer --migrate-state after Redis recovers. |

Need to rewind state? Stop the service, edit /var/lib/s3-streamer/state.json, and restart. Delete the file to process everything from scratch.

Architecture & Scaling

S3 (gzipped JSONL)
  ↓
15 S3 workers → 10k line buffer → 10 HTTP workers
  ↓
EdgeDelta HTTP inputs (8080/8081) → EdgeDelta backend

  • HTTP streaming avoids temporary files and keeps latency low.
  • Workers load-balance across endpoints using round robin.
  • Redis-backed state unlocks multi-instance deployments.
  • Real-world performance data and tuning levers live in docs/performance.md.
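
The round-robin balancing and the 15-to-10 worker split in the diagram can be sketched in Python as follows. The class `RoundRobin`, the helper `http_workers_for`, and the localhost URLs are hypothetical names chosen for illustration (only the ports come from this README); this is not the streamer's own code.

```python
import itertools
import threading


class RoundRobin:
    """Thread-safe round-robin selector over a fixed list of endpoints."""

    def __init__(self, endpoints):
        if not endpoints:
            raise ValueError("at least one endpoint is required")
        self._cycle = itertools.cycle(list(endpoints))
        self._lock = threading.Lock()

    def next(self):
        # The lock keeps concurrent HTTP workers from advancing the cycle at once.
        with self._lock:
            return next(self._cycle)


def http_workers_for(s3_workers, ratio=1.5):
    """Suggest an HTTP worker count from the ~1.5:1 S3-to-HTTP guideline."""
    return max(1, round(s3_workers / ratio))


if __name__ == "__main__":
    # Hosts are assumed; the README only specifies ports 8080 and 8081.
    selector = RoundRobin(["http://localhost:8080", "http://localhost:8081"])
    for _ in range(4):
        print(selector.next())
    print(http_workers_for(15))
```

With two endpoints, consecutive batches alternate between them, and 15 S3 workers map to 10 HTTP workers, matching the diagram.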

Support

  1. Inspect local logs: /opt/edgedelta/s3-streamer/logs/streamer.log
  2. Check the EdgeDelta agent logs: /var/log/edgedelta/edgedelta.log
  3. Review the troubleshooting table above
  4. Escalate to EdgeDelta support with recent logs, metrics, and config snapshots
