TART (Testing, Analytics and Research Telemetry) is a production-grade, high-performance telemetry backend for JAM blockchain networks. Built in Rust, it handles up to 1024 concurrent node connections and processes 10,000+ events per second with sub-millisecond ingestion latency.
- Features
- Quick Start
- Installation
- Configuration
- API Reference
- Integrations
- Architecture
- Deployment
- Monitoring
- Development
- Troubleshooting
## Features

- ✅ Real-time telemetry collection from JAM blockchain nodes via binary TCP protocol (JIP-3 compliant)
- ✅ High-performance event processing with PostgreSQL backend
- ✅ RESTful API for historical data access and analytics
- ✅ WebSocket streaming for real-time event distribution
- ✅ Terminal dashboard (tart-dash) for operator monitoring
- ✅ Prometheus metrics for observability
- ✅ Health checks with component-level status
| Metric | Specification |
|---|---|
| Throughput | 10,000+ events/second |
| Event Ingestion Latency | <1ms (p99) |
| Database Write Latency | <20ms (p99) |
| Memory per Connection | ~0.5MB |
| WebSocket Broadcast Latency | <5ms |
| Supported Event Types | 130 (JIP-3 full spec) |
- 🔒 Rate limiting: 100 events/sec per node (configurable)
- 🚀 Batch writing: Optimized database writes with configurable batching
- 💾 Database connection pooling: 200 connections for high concurrency
- 📊 Stats caching: 5-second TTL reduces database load by 95%
- 🔄 Circuit breakers: Automatic recovery from transient failures
- 🎯 Retry logic: Exponential backoff with jitter (see the sketch after this list)
- 📈 VecDeque optimization: O(1) event buffer operations
- 🗄️ Time-partitioned tables: Daily partitions for efficient archival
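A minimal sketch of the retry behavior named above, assuming `tokio` and `rand` as dependencies; the base delay, cap, and attempt limit are illustrative values, not TART's actual constants:

```rust
use rand::Rng;
use std::time::Duration;

/// Retry an async operation with exponential backoff and full jitter.
/// Base/cap/attempt values here are assumptions for illustration.
async fn retry_with_backoff<T, E, F, Fut>(mut op: F, max_attempts: u32) -> Result<T, E>
where
    F: FnMut() -> Fut,
    Fut: std::future::Future<Output = Result<T, E>>,
{
    let (base_ms, cap_ms) = (100u64, 10_000u64);
    let mut attempt = 0u32;
    loop {
        match op().await {
            Ok(v) => return Ok(v),
            Err(e) if attempt + 1 >= max_attempts => return Err(e),
            Err(_) => {
                // Exponential ceiling, capped so waits stay bounded.
                let ceiling = base_ms.saturating_mul(1u64 << attempt.min(16)).min(cap_ms);
                // Full jitter: uniform in [0, ceiling] so retries de-synchronize.
                let delay = rand::thread_rng().gen_range(0..=ceiling);
                tokio::time::sleep(Duration::from_millis(delay)).await;
                attempt += 1;
            }
        }
    }
}
```

Full jitter (rather than a fixed exponential delay) keeps many clients that failed at the same moment from retrying in lockstep.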
## Quick Start

```bash
# Clone repository
git clone https://github.com/your-org/tart-backend.git
cd tart-backend

# Start backend + PostgreSQL
docker-compose up -d

# View logs
docker-compose logs -f tart-backend

# API endpoint at http://localhost:8080
# Telemetry endpoint at tcp://localhost:9000
```

```bash
# Install dependencies
# Requires: Rust 1.77+, PostgreSQL 16+

# Set database URL
export DATABASE_URL="postgres://user:password@localhost:5432/tart_telemetry"

# Build and run
cargo build --release
./target/release/tart-backend
```

```bash
# Run the TUI dashboard
cargo run --release --bin tart-dash

# Or with custom host/port
./target/release/tart-dash --host localhost --port 8080 --refresh-ms 1000
```

## Installation

Required:
- Rust 1.77.0 or later
- PostgreSQL 16+ (14+ supported)
- Git
Optional:
- Docker & Docker Compose (for containerized deployment)
- Prometheus & Grafana (for monitoring)
```bash
# 1. Clone repository
git clone https://github.com/your-org/tart-backend.git
cd tart-backend

# 2. Configure production credentials (IMPORTANT!)
export POSTGRES_PASSWORD=$(openssl rand -base64 32)
export DATABASE_URL="postgres://tart:${POSTGRES_PASSWORD}@postgres:5432/tart_telemetry"

# 3. Build and start
docker-compose build
docker-compose up -d

# 4. Verify health
curl http://localhost:8080/api/health
```

See deployment/kubernetes/README.md for Helm charts and manifests.
```bash
# 1. Build release binaries
cargo build --release

# 2. Install to system
sudo cp target/release/tart-backend /usr/local/bin/
sudo cp target/release/tart-dash /usr/local/bin/

# 3. Create systemd service (see deployment/systemd/tart-backend.service)
sudo cp deployment/systemd/tart-backend.service /etc/systemd/system/
sudo systemctl enable tart-backend
sudo systemctl start tart-backend
```

## Configuration

| Variable | Description | Default | Production Example |
|---|---|---|---|
| `DATABASE_URL` | PostgreSQL connection string | Required | `postgres://tart:STRONG_PASSWORD@db:5432/tart_telemetry` |
| `TELEMETRY_BIND` | Telemetry server bind address | `0.0.0.0:9000` | `0.0.0.0:9000` |
| `API_BIND` | HTTP API server bind address | `0.0.0.0:8080` | `0.0.0.0:8080` |
| `RUST_LOG` | Logging configuration | `info` | `tart_backend=info,sqlx=warn` |
For production with 1024 nodes, configure PostgreSQL:
```ini
# In postgresql.conf or docker-compose.yml
shared_buffers = 2GB
max_connections = 300
effective_cache_size = 8GB
work_mem = 16MB
maintenance_work_mem = 512MB
wal_buffers = 16MB
max_wal_size = 4GB
```

For optimal performance:

```bash
# Network tuning
sudo sysctl -w net.core.somaxconn=2048
sudo sysctl -w net.ipv4.tcp_max_syn_backlog=2048
sudo sysctl -w net.core.netdev_max_backlog=5000

# File descriptor limits
ulimit -n 65536

# Save permanently to /etc/sysctl.conf
```

## API Reference

`GET /api/health` - Basic health check
Response:

```json
{
  "status": "ok",
  "service": "tart-backend",
  "version": "0.1.0"
}
```

Status Codes:

- `200 OK` - Service is healthy
`GET /api/health/detailed` - Detailed health with component status

Response:

```json
{
  "status": "healthy",
  "timestamp": "2025-10-28T14:30:00Z",
  "components": {
    "database": {
      "status": "healthy",
      "latency_ms": 0.8,
      "message": "Connected to PostgreSQL"
    },
    "broadcaster": {
      "status": "healthy",
      "subscribers": 12,
      "message": "Broadcasting events"
    },
    "memory": {
      "status": "healthy",
      "usage_percent": 45.2
    }
  },
  "version": "0.1.0",
  "uptime_seconds": 86400
}
```

Status Codes:

- `200 OK` - Service is healthy or degraded
- `503 Service Unavailable` - Service is unhealthy
`GET /api/stats` - System statistics

Response:

```json
{
  "total_blocks_authored": 15234,
  "best_block": 15234,
  "finalized_block": 15200
}
```

Notes:

- Stats are cached with a 5-second TTL for performance
- Refreshed automatically on cache miss (a sketch of this pattern follows)
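The cache-on-miss behavior is simple to picture. A minimal sketch, assuming a `serde_json::Value` payload and a hypothetical `query_stats_from_db` helper; this is not TART's actual internal code:

```rust
use std::time::{Duration, Instant};

/// A 5-second TTL cache over the stats query; illustrative sketch only.
struct CachedStats {
    entry: Option<(serde_json::Value, Instant)>,
    ttl: Duration,
}

impl CachedStats {
    fn new() -> Self {
        Self { entry: None, ttl: Duration::from_secs(5) }
    }

    async fn get(&mut self) -> anyhow::Result<serde_json::Value> {
        // Serve from cache while the entry is younger than the TTL.
        if let Some((stats, fetched_at)) = &self.entry {
            if fetched_at.elapsed() < self.ttl {
                return Ok(stats.clone());
            }
        }
        // Cache miss or stale entry: refresh from the database and store.
        let fresh = query_stats_from_db().await?;
        self.entry = Some((fresh.clone(), Instant::now()));
        Ok(fresh)
    }
}

/// Placeholder for the real aggregate query against PostgreSQL.
async fn query_stats_from_db() -> anyhow::Result<serde_json::Value> {
    Ok(serde_json::json!({ "best_block": 0 }))
}
```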
`GET /api/nodes` - List all nodes

Response:

```json
{
  "nodes": [
    {
      "node_id": "a1b2c3d4e5f6...",
      "peer_id": "a1b2c3d4e5f6...",
      "implementation_name": "polkajam",
      "implementation_version": "0.1.0",
      "is_connected": true,
      "event_count": 5823,
      "connected_at": "2025-10-28T14:00:00Z",
      "last_seen_at": "2025-10-28T14:30:00Z",
      "disconnected_at": null
    }
  ]
}
```

Notes:

- Returns all known nodes (connected and disconnected)
- Ordered by connection status, then `last_seen_at`
`GET /api/nodes/:node_id` - Get specific node details

Parameters:

- `node_id` (path) - 64-character hexadecimal node identifier

Response:

```jsonc
{
  "node_id": "a1b2c3d4...",
  "peer_id": "a1b2c3d4...",
  "implementation_name": "polkajam",
  "implementation_version": "0.1.0",
  "node_info": {
    "params": { /* ProtocolParameters */ },
    "genesis": "0x...",
    "flags": 1
  },
  "is_connected": true,
  "event_count": 5823,
  "connected_at": "2025-10-28T14:00:00Z",
  "last_seen_at": "2025-10-28T14:30:00Z"
}
```

Status Codes:

- `200 OK` - Node found
- `400 Bad Request` - Invalid node_id format (see the validation sketch below)
- `404 Not Found` - Node not found
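Because malformed identifiers are rejected with `400` before any database lookup, clients can pre-validate. A tiny sketch of such a check (a hypothetical helper, not TART's code):

```rust
/// A node_id must be exactly 64 hexadecimal characters.
fn is_valid_node_id(id: &str) -> bool {
    id.len() == 64 && id.bytes().all(|b| b.is_ascii_hexdigit())
}

fn main() {
    assert!(!is_valid_node_id("a1b2"));           // too short -> 400
    assert!(is_valid_node_id(&"ab".repeat(32)));  // 64 hex chars -> looked up
}
```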
`GET /api/nodes/:node_id/events` - Get events for specific node

Parameters:

- `node_id` (path) - 64-character hexadecimal node identifier
- `limit` (query, optional) - Max events to return (default: 50, max: 1000)

Response:

```json
{
  "events": [
    {
      "id": 12345,
      "node_id": "a1b2c3d4...",
      "event_id": 100,
      "event_type": 11,
      "timestamp": "2025-10-28T14:30:00Z",
      "data": {
        "BestBlockChanged": {
          "slot": 1234,
          "hash": "0x..."
        }
      },
      "node_name": "polkajam",
      "node_version": "0.1.0"
    }
  ]
}
```

Notes:

- Events are ordered by most recent first
- Uses database-level filtering for optimal performance
`GET /api/events` - Get recent events across all nodes

Parameters:

- `limit` (query, optional) - Max events to return (default: 50, max: 1000)

Response:

```jsonc
{
  "events": [
    {
      "id": 12345,
      "node_id": "a1b2c3d4...",
      "event_id": 100,
      "event_type": 11,
      "timestamp": "2025-10-28T14:30:00Z",
      "data": { /* Event-specific data */ },
      "node_name": "polkajam",
      "node_version": "0.1.0"
    }
  ]
}
```

`WS /api/ws` - Real-time event streaming
Connect to receive live events as they arrive from nodes.
Client Messages:

```jsonc
// Subscribe to all events
{
  "type": "Subscribe",
  "filter": { "type": "All" }
}

// Subscribe to specific node
{
  "type": "Subscribe",
  "filter": {
    "type": "Node",
    "node_id": "a1b2c3d4..."
  }
}

// Get recent events
{
  "type": "GetRecentEvents",
  "limit": 100
}

// Ping
{
  "type": "Ping"
}

// Unsubscribe
{
  "type": "Unsubscribe"
}
```

Server Messages:
```jsonc
// Connection established
{
  "type": "connected",
  "data": {
    "message": "Connected to TART telemetry (1024-node scale)",
    "recent_events": 20,
    "total_nodes": 5,
    "broadcaster_stats": { /* ... */ }
  },
  "timestamp": "2025-10-28T14:30:00Z"
}

// New event
{
  "type": "event",
  "data": {
    "id": 12345,
    "node_id": "a1b2c3d4...",
    "event_type": 11,
    "latency_ms": 2,
    "event": {
      "BestBlockChanged": {
        "slot": 1234,
        "hash": "0x..."
      }
    }
  },
  "timestamp": "2025-10-28T14:30:00Z"
}

// Periodic stats update (every 5 seconds)
{
  "type": "stats",
  "data": {
    "database": { /* DB stats */ },
    "broadcaster": { /* Broadcaster stats */ },
    "connections": {
      "total": 5,
      "nodes": ["node1", "node2", ...]
    }
  },
  "timestamp": "2025-10-28T14:30:00Z"
}

// Pong response
{
  "type": "pong",
  "data": {
    "events_received": 1234,
    "uptime_ms": 5000
  },
  "timestamp": "2025-10-28T14:30:00Z"
}
```

`GET /metrics` - Prometheus metrics endpoint
Available Metrics:

Connection Metrics:

```text
telemetry_active_connections      # Current connection count
telemetry_connections_total       # Total connections (lifetime)
telemetry_connections_rejected    # Rejected connections (limit reached)
```

Event Metrics:

```text
telemetry_events_received         # Total events received
telemetry_events_dropped          # Events dropped (backpressure)
telemetry_events_rate_limited     # Events rate-limited
telemetry_buffer_pending          # Events in write queue
telemetry_batch_write_duration    # Batch write latency (histogram)
```

Broadcaster Metrics:

```text
telemetry_broadcaster_subscribers       # Active WebSocket subscribers
telemetry_broadcaster_total_broadcast   # Total broadcast operations
```

Database Metrics:

```text
telemetry_database_pool_connections     # Active DB connections
```
## Integrations

JAM nodes connect to TART via the `--telemetry` flag:
```bash
# Run JAM validator with telemetry
jam-validator \
  --chain dev \
  --telemetry tart-backend:9000 \
  --dev-validator 0
```

- Protocol: Binary TCP (JIP-3 telemetry specification)
- Port: 9000 (configurable via `TELEMETRY_BIND`)
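To verify that a node can reach the telemetry listener from Rust, a minimal connectivity sketch; the host and port are examples, and the JIP-3 handshake and frame encoding are defined in the specification, not reproduced here:

```rust
use tokio::net::TcpStream;

#[tokio::main]
async fn main() -> std::io::Result<()> {
    // Connect to the backend's telemetry listener (TELEMETRY_BIND).
    let _stream = TcpStream::connect("localhost:9000").await?;
    // A real node would now perform the JIP-3 handshake and stream
    // binary-encoded events over this socket.
    println!("telemetry port reachable");
    Ok(())
}
```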
```python
import requests

class TartClient:
    def __init__(self, base_url="http://localhost:8080"):
        self.base_url = base_url

    def get_health(self):
        """Check backend health"""
        response = requests.get(f"{self.base_url}/api/health")
        return response.json()

    def get_nodes(self):
        """Get all connected nodes"""
        response = requests.get(f"{self.base_url}/api/nodes")
        return response.json()["nodes"]

    def get_node_events(self, node_id, limit=100):
        """Get events for a specific node"""
        response = requests.get(
            f"{self.base_url}/api/nodes/{node_id}/events",
            params={"limit": limit},
        )
        return response.json()["events"]

    def get_recent_events(self, limit=50):
        """Get recent events across all nodes"""
        response = requests.get(
            f"{self.base_url}/api/events",
            params={"limit": limit},
        )
        return response.json()["events"]

    def get_stats(self):
        """Get system statistics"""
        response = requests.get(f"{self.base_url}/api/stats")
        return response.json()


# Usage example
client = TartClient("http://localhost:8080")

# Check health
health = client.get_health()
print(f"Status: {health['status']}")

# Get all nodes
nodes = client.get_nodes()
print(f"Connected nodes: {len(nodes)}")

# Get events from first node
if nodes:
    events = client.get_node_events(nodes[0]["node_id"], limit=10)
    print(f"Recent events: {len(events)}")

# Get system stats
stats = client.get_stats()
print(f"Best block: {stats['best_block']}")
```

```typescript
// tart-client.ts
interface Node {
  node_id: string;
  implementation_name: string;
  implementation_version: string;
  is_connected: boolean;
  event_count: number;
  last_seen_at: string;
}

interface Event {
  id: number;
  node_id: string;
  event_type: number;
  timestamp: string;
  data: any;
}

class TartClient {
  constructor(private baseUrl: string = "http://localhost:8080") {}

  async getHealth(): Promise<{ status: string }> {
    const response = await fetch(`${this.baseUrl}/api/health`);
    return response.json();
  }

  async getNodes(): Promise<Node[]> {
    const response = await fetch(`${this.baseUrl}/api/nodes`);
    const data = await response.json();
    return data.nodes;
  }

  async getNodeEvents(nodeId: string, limit: number = 100): Promise<Event[]> {
    const response = await fetch(
      `${this.baseUrl}/api/nodes/${nodeId}/events?limit=${limit}`
    );
    const data = await response.json();
    return data.events;
  }

  async getRecentEvents(limit: number = 50): Promise<Event[]> {
    const response = await fetch(`${this.baseUrl}/api/events?limit=${limit}`);
    const data = await response.json();
    return data.events;
  }

  async getStats(): Promise<{
    total_blocks_authored: number;
    best_block: number;
    finalized_block: number;
  }> {
    const response = await fetch(`${this.baseUrl}/api/stats`);
    return response.json();
  }
}

// Usage
const client = new TartClient("http://localhost:8080");

// Fetch nodes
const nodes = await client.getNodes();
console.log(`Connected nodes: ${nodes.length}`);

// Monitor stats
setInterval(async () => {
  const stats = await client.getStats();
  console.log(`Best block: ${stats.best_block}`);
}, 5000);
```

```bash
# Health check
curl http://localhost:8080/api/health
# Get all nodes with pretty printing
curl -s http://localhost:8080/api/nodes | jq
# Get specific node
curl -s http://localhost:8080/api/nodes/a1b2c3d4... | jq
# Get node events (last 10)
curl -s "http://localhost:8080/api/nodes/a1b2c3d4.../events?limit=10" | jq
# Get recent events with filtering
curl -s "http://localhost:8080/api/events?limit=50" | jq '.events[] | select(.event_type == 11)'
# Get stats
curl -s http://localhost:8080/api/stats | jq
# Continuous monitoring
watch -n 1 'curl -s http://localhost:8080/api/stats | jq'
```

```javascript
// tart-websocket.js
class TartWebSocket {
  constructor(url = "ws://localhost:8080/api/ws") {
    this.ws = new WebSocket(url);
    this.eventHandlers = new Map();

    this.ws.onopen = () => {
      console.log("Connected to TART telemetry");
      this.subscribe({ type: "All" });
    };

    this.ws.onmessage = (event) => {
      const message = JSON.parse(event.data);
      this.handleMessage(message);
    };

    this.ws.onerror = (error) => {
      console.error("WebSocket error:", error);
    };

    this.ws.onclose = () => {
      console.log("Disconnected from TART telemetry");
    };
  }

  subscribe(filter) {
    this.send({ type: "Subscribe", filter });
  }

  subscribeToNode(nodeId) {
    this.subscribe({ type: "Node", node_id: nodeId });
  }

  getRecentEvents(limit = 100) {
    this.send({ type: "GetRecentEvents", limit });
  }

  ping() {
    this.send({ type: "Ping" });
  }

  send(message) {
    if (this.ws.readyState === WebSocket.OPEN) {
      this.ws.send(JSON.stringify(message));
    }
  }

  on(eventType, handler) {
    this.eventHandlers.set(eventType, handler);
  }

  handleMessage(message) {
    const handler = this.eventHandlers.get(message.type);
    if (handler) {
      handler(message.data);
    }
  }
}

// Usage example
const client = new TartWebSocket("ws://localhost:8080/api/ws");

// Listen for new events
client.on("event", (data) => {
  console.log(`Event ${data.id} from ${data.node_id}:`, data.event);
});

// Listen for stats updates
client.on("stats", (data) => {
  console.log("Stats:", data);
});

// Subscribe to specific node
client.subscribeToNode("a1b2c3d4...");

// Keep-alive
setInterval(() => client.ping(), 30000);
```

```python
import asyncio
import websockets
import json

class TartWebSocketClient:
    def __init__(self, url="ws://localhost:8080/api/ws"):
        self.url = url
        self.handlers = {}

    async def connect(self):
        """Connect and handle messages"""
        async with websockets.connect(self.url) as websocket:
            # Subscribe to all events
            await websocket.send(json.dumps({
                "type": "Subscribe",
                "filter": {"type": "All"}
            }))

            # Handle incoming messages
            async for message in websocket:
                data = json.loads(message)
                await self.handle_message(data)

    async def handle_message(self, message):
        """Route messages to handlers"""
        msg_type = message.get("type")
        if msg_type in self.handlers:
            await self.handlers[msg_type](message.get("data"))

    def on(self, event_type, handler):
        """Register event handler"""
        self.handlers[event_type] = handler


# Usage example
client = TartWebSocketClient()

# Event handler
async def on_event(data):
    print(f"Event {data['id']} from {data['node_id']}")
    print(f"Type: {data['event_type']}, Event: {data['event']}")

# Stats handler
async def on_stats(data):
    print(f"Active connections: {data['connections']['total']}")

# Register handlers
client.on("event", on_event)
client.on("stats", on_stats)

# Run
asyncio.run(client.connect())
```

```rust
use serde_json::Value;
pub struct TartClient {
base_url: String,
client: reqwest::Client,
}
impl TartClient {
pub fn new(base_url: impl Into<String>) -> Self {
Self {
base_url: base_url.into(),
client: reqwest::Client::new(),
}
}
pub async fn get_nodes(&self) -> Result<Vec<Value>, reqwest::Error> {
let url = format!("{}/api/nodes", self.base_url);
let response = self.client.get(&url).send().await?;
let data: Value = response.json().await?;
Ok(data["nodes"].as_array().unwrap().clone())
}
pub async fn get_node_events(
&self,
node_id: &str,
limit: usize,
) -> Result<Vec<Value>, reqwest::Error> {
let url = format!("{}/api/nodes/{}/events?limit={}",
self.base_url, node_id, limit);
let response = self.client.get(&url).send().await?;
let data: Value = response.json().await?;
Ok(data["events"].as_array().unwrap().clone())
}
pub async fn get_stats(&self) -> Result<Value, reqwest::Error> {
let url = format!("{}/api/stats", self.base_url);
let response = self.client.get(&url).send().await?;
response.json().await
}
}
// Usage
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
let client = TartClient::new("http://localhost:8080");
// Get nodes
let nodes = client.get_nodes().await?;
println!("Nodes: {}", nodes.len());
// Get stats
let stats = client.get_stats().await?;
println!("Best block: {}", stats["best_block"]);
Ok(())
}# prometheus.yml
scrape_configs:
  - job_name: 'tart-backend'
    static_configs:
      - targets: ['localhost:8080']
    metrics_path: '/metrics'
    scrape_interval: 15s
```

Useful PromQL queries:

```promql
# Active connections over time
telemetry_active_connections

# Event throughput (events/sec)
rate(telemetry_events_received[1m])

# Event drop rate
rate(telemetry_events_dropped[5m])

# p99 batch write duration
histogram_quantile(0.99, rate(telemetry_batch_write_duration_bucket[5m]))

# Connection rejection rate
rate(telemetry_connections_rejected[5m])
```
For advanced analytics, query the database directly:
```sql
-- Get event counts by type
SELECT
event_type,
COUNT(*) as count,
MIN(timestamp) as first_seen,
MAX(timestamp) as last_seen
FROM events
WHERE created_at > NOW() - INTERVAL '1 hour'
GROUP BY event_type
ORDER BY count DESC;
-- Get node connection history
SELECT
node_id,
implementation_name,
COUNT(*) as connection_count,
SUM(event_count) as total_events,
MAX(last_seen_at) as last_active
FROM nodes
GROUP BY node_id, implementation_name
ORDER BY total_events DESC;
-- Get events per node per hour
SELECT
hour,
node_id,
SUM(event_count) as events
FROM event_stats_hourly
WHERE hour > NOW() - INTERVAL '24 hours'
GROUP BY hour, node_id
ORDER BY hour DESC, events DESC;
-- Find nodes with high error rates
SELECT
node_id,
COUNT(*) FILTER (WHERE event_type IN (41, 44, 46, 92)) as error_events,
COUNT(*) as total_events,
ROUND(100.0 * COUNT(*) FILTER (WHERE event_type IN (41, 44, 46, 92)) / COUNT(*), 2) as error_rate_pct
FROM events
WHERE created_at > NOW() - INTERVAL '1 hour'
GROUP BY node_id
HAVING COUNT(*) FILTER (WHERE event_type IN (41, 44, 46, 92)) > 10
ORDER BY error_rate_pct DESC;
```

Build custom dashboards or integrations using the REST API and WebSocket:
```html
<!DOCTYPE html>
<html>
<head>
<title>TART Dashboard</title>
<script src="https://cdn.jsdelivr.net/npm/chart.js"></script>
</head>
<body>
<h1>JAM Network Telemetry</h1>
<div id="stats"></div>
<canvas id="eventChart"></canvas>
<script>
const API_BASE = "http://localhost:8080";
const ws = new WebSocket("ws://localhost:8080/api/ws");
// Display stats
async function updateStats() {
const response = await fetch(`${API_BASE}/api/stats`);
const stats = await response.json();
document.getElementById("stats").innerHTML = `
<p>Best Block: ${stats.best_block}</p>
<p>Finalized: ${stats.finalized_block}</p>
`;
}
// WebSocket event stream
ws.onmessage = (event) => {
const message = JSON.parse(event.data);
if (message.type === "event") {
console.log("New event:", message.data);
// Update your UI here
}
};
// Update stats every 5 seconds
setInterval(updateStats, 5000);
updateStats();
</script>
</body>
</html>
```

TART supports all 130 event types from the JIP-3 specification:
| Event ID | Event Name | Description |
|---|---|---|
| 0 | Dropped | Events were dropped due to buffer overflow |
| 10 | Status | Periodic node status update (~2 sec) |
| 11 | BestBlockChanged | Node's best block changed |
| 12 | FinalizedBlockChanged | Latest finalized block changed |
| 13 | SyncStatusChanged | Node sync status changed |
| 20-28 | Connection Events | Peer connection lifecycle |
| 40-47 | Block Events | Block authoring/importing/execution |
| 60-68 | Block Distribution | Block announcements and transfers |
| 80-84 | Ticket Events | Safrole ticket generation/transfer |
| 90-113 | Guarantee Events | Work package guaranteeing pipeline |
| 120-131 | Availability Events | Shard requests and assurances |
| 140-153 | Bundle Events | Bundle recovery for auditing |
| 160-178 | Segment Events | Segment recovery and reconstruction |
| 190-199 | Preimage Events | Preimage distribution |
Full specification: See JIP-3.md for complete event definitions.
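When post-processing exported events, it can help to map raw `event_type` IDs to names. A partial sketch derived from the table above; the complete 130-variant definitions live in `src/events.rs`:

```rust
/// Map a few event_type IDs to human-readable names; deliberately partial.
fn event_name(event_type: u16) -> &'static str {
    match event_type {
        0 => "Dropped",
        10 => "Status",
        11 => "BestBlockChanged",
        12 => "FinalizedBlockChanged",
        13 => "SyncStatusChanged",
        20..=28 => "Connection event",
        40..=47 => "Block event",
        _ => "Other (see JIP-3.md)",
    }
}
```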
Export telemetry data for offline analysis:
```bash
# Export to JSON
curl -s "http://localhost:8080/api/events?limit=10000" | jq > events.json
# Export to CSV (using jq)
curl -s "http://localhost:8080/api/nodes" | \
jq -r '.nodes[] | [.node_id, .implementation_name, .event_count, .is_connected] | @csv' \
> nodes.csv
# Database dump for archival
docker-compose exec postgres pg_dump -U tart tart_telemetry > backup.sql
```

## Architecture

```text
┌─────────────────────────────────────────────────────────────┐
│ TART Backend System │
│ │
│ ┌────────────────┐ ┌──────────────┐ │
│ │ JAM Nodes │ │ Operators │ │
│ │ (up to 1024) │ │ (Dashboard) │ │
│ └────────┬───────┘ └──────┬───────┘ │
│ │ TCP:9000 │ HTTP:8080 │
│ │ │ │
│ ┌────────▼─────────────────────▼────────┐ │
│ │ Telemetry Server (TCP + HTTP) │ │
│ │ ┌─────────────────────────────────┐ │ │
│ │ │ Rate Limiter (100 evt/s/node) │ │ │
│ │ └──────────────┬──────────────────┘ │ │
│ │ │ │ │
│ │ ┌──────────────▼──────────────────┐ │ │
│ │ │ Batch Writer (20ms timeout) │ │ │
│ │ └──────────────┬──────────────────┘ │ │
│ └─────────────────┼──────────────────────┘ │
│ │ │
│ ┌─────────────────▼─────────────────┐ │
│ │ PostgreSQL Database (200 conn) │ │
│ │ ┌────────────────────────────┐ │ │
│ │ │ Daily Partitioned Tables │ │ │
│ │ │ - events (by created_at) │ │ │
│ │ │ - nodes │ │ │
│ │ │ - stats_cache │ │ │
│ │ └────────────────────────────┘ │ │
│ └────────────┬──────────────────────┘ │
│ │ │
│ ┌────────────▼──────────────────────┐ │
│ │ Event Broadcaster (100k cap) │ │
│ │ - WebSocket distribution │ │
│ │ - Per-node channels │ │
│ │ - Ring buffer (10k events) │ │
│ └────────────┬──────────────────────┘ │
│ │ │
│ ┌────────────▼──────────────────────┐ │
│ │ REST API + WebSocket Server │ │
│ │ - /api/* endpoints │ │
│ │ - /metrics (Prometheus) │ │
│ └───────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
```
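The Batch Writer box above flushes to PostgreSQL when a batch fills or when its 20ms timer fires, whichever comes first. A minimal sketch of that pattern; the batch size, payload type, and flush target are illustrative assumptions:

```rust
use std::time::Duration;
use tokio::sync::mpsc;

/// Collect events and flush on size or on a 20ms tick.
async fn batch_writer(mut rx: mpsc::Receiver<serde_json::Value>) {
    const MAX_BATCH: usize = 100; // assumed threshold, not TART's constant
    let mut buf = Vec::with_capacity(MAX_BATCH);
    let mut tick = tokio::time::interval(Duration::from_millis(20));
    loop {
        tokio::select! {
            maybe = rx.recv() => match maybe {
                Some(ev) => {
                    buf.push(ev);
                    if buf.len() >= MAX_BATCH {
                        flush(&mut buf).await; // size-triggered flush
                    }
                }
                None => { flush(&mut buf).await; break; } // channel closed
            },
            _ = tick.tick() => flush(&mut buf).await, // time-triggered flush
        }
    }
}

async fn flush(buf: &mut Vec<serde_json::Value>) {
    if buf.is_empty() { return; }
    // The real backend performs a multi-row INSERT here via sqlx.
    println!("flushing {} events", buf.len());
    buf.clear();
}
```

Selecting between the channel and the timer is what bounds worst-case write latency to one tick while still amortizing database round-trips.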
Technology stack:

- Language: Rust (Edition 2021)
- Web Framework: Axum 0.7
- Async Runtime: Tokio 1.40
- Database: PostgreSQL 16 with SQLx
- Serialization: Serde + JSON
- Metrics: Prometheus
- Logging: Tracing
- TUI: Ratatui
PostgreSQL Schema:
- `nodes` - Node metadata and connection tracking
- `events` - Main event log (partitioned by day)
- `node_status` - Extracted status events for dashboards
- `blocks` - Block-related events
- `stats_cache` - Pre-computed statistics
- `event_stats_hourly` - Materialized view for hourly aggregations

Partitioning Strategy:

- Daily partitions on the `events` table
- Automatic partition creation (see the sketch below)
- Retention: 30 days (configurable)
- Archive old partitions to S3/cold storage
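Automatic partition creation might look like the following `sqlx` sketch; the DDL and naming are illustrative, and the authoritative schema lives in `migrations/`:

```rust
use sqlx::PgPool;

/// Create the daily partition of the events table for `day`.
async fn create_daily_partition(pool: &PgPool, day: chrono::NaiveDate) -> sqlx::Result<()> {
    let next = day.succ_opt().expect("date out of range");
    let ddl = format!(
        "CREATE TABLE IF NOT EXISTS events_{name} PARTITION OF events \
         FOR VALUES FROM ('{from}') TO ('{to}')",
        name = day.format("%Y%m%d"),
        from = day, // NaiveDate displays as YYYY-MM-DD
        to = next,
    );
    sqlx::query(&ddl).execute(pool).await?;
    Ok(())
}
```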
## Deployment

Production checklist:

- **Security**
  - Set strong database credentials (`POSTGRES_PASSWORD`)
  - Use TLS/HTTPS for the API (reverse proxy recommended)
  - Restrict CORS origins (not permissive)
  - Enable authentication if needed
  - Configure firewall rules (ports 9000, 8080, 5432)
- **Database**
  - PostgreSQL 16+ configured for high concurrency
  - Connection pool sized appropriately (200+ connections)
  - Automated backups configured
  - Monitoring/alerting on database health
  - Partition management automated
- **Infrastructure**
  - Load balancer for the API (if multiple instances)
  - Reverse proxy with TLS termination (nginx/Caddy)
  - Log aggregation (ELK, Loki, etc.)
  - Metrics collection (Prometheus + Grafana)
  - Automated restarts (systemd, Kubernetes)
- **Monitoring**
  - Prometheus scraping configured
  - Grafana dashboards imported
  - Alerting rules configured
  - On-call rotation established
  - Runbooks documented
```bash
# 1. Set production credentials
export POSTGRES_PASSWORD=$(openssl rand -base64 32)
export DATABASE_URL="postgres://tart:${POSTGRES_PASSWORD}@postgres:5432/tart_telemetry"

# 2. Create production docker-compose.override.yml
cat > docker-compose.override.yml <<EOF
version: '3.8'
services:
  postgres:
    environment:
      - POSTGRES_PASSWORD=${POSTGRES_PASSWORD}
    volumes:
      - /var/lib/tart/postgres:/var/lib/postgresql/data
  tart-backend:
    environment:
      - DATABASE_URL=${DATABASE_URL}
      - RUST_LOG=info
    restart: always
EOF

# 3. Deploy
docker-compose up -d

# 4. Verify
docker-compose logs -f tart-backend
curl http://localhost:8080/api/health
```

```yaml
# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: tart-backend
spec:
  replicas: 1
  selector:
    matchLabels:
      app: tart-backend
  template:
    metadata:
      labels:
        app: tart-backend
    spec:
      containers:
        - name: tart-backend
          image: your-registry/tart-backend:latest
          ports:
            - containerPort: 9000
              name: telemetry
            - containerPort: 8080
              name: api
          env:
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: tart-secrets
                  key: database-url
            - name: RUST_LOG
              value: "info"
          resources:
            requests:
              memory: "2Gi"
              cpu: "2"
            limits:
              memory: "8Gi"
              cpu: "4"
          livenessProbe:
            httpGet:
              path: /api/health
              port: 8080
            initialDelaySeconds: 10
            periodSeconds: 30
          readinessProbe:
            httpGet:
              path: /api/health/detailed
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 10
```

```nginx
# /etc/nginx/sites-available/tart-backend
upstream tart_api {
    server 127.0.0.1:8080;
}

server {
    listen 443 ssl http2;
    server_name telemetry.example.com;

    ssl_certificate /etc/letsencrypt/live/telemetry.example.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/telemetry.example.com/privkey.pem;

    # API endpoints
    location /api/ {
        proxy_pass http://tart_api;
        proxy_http_version 1.1;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }

    # WebSocket upgrade
    location /api/ws {
        proxy_pass http://tart_api;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
        proxy_read_timeout 86400;
    }

    # Metrics endpoint (restrict access)
    location /metrics {
        proxy_pass http://tart_api;
        allow 10.0.0.0/8;  # Internal network only
        deny all;
    }
}
```

## Monitoring

Key metrics to watch:

- **Connection Metrics**
  - Active connections approaching 1024
  - Connection rejection rate
  - Average connection duration
- **Performance Metrics**
  - Event ingestion rate
  - Batch write latency (p99 < 50ms)
  - Database connection pool utilization
- **Error Metrics**
  - Event drop rate
  - Rate-limited events
  - Failed database writes
- **Resource Metrics**
  - Memory usage (< 4GB for 1024 nodes)
  - CPU usage
  - Database size growth
  - Disk I/O
Example alert rules:

```yaml
# prometheus-alerts.yml
groups:
  - name: tart-backend
    rules:
      - alert: HighConnectionCount
        expr: telemetry_active_connections > 900
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "TART approaching connection limit"
          description: "{{ $value }} active connections (limit: 1024)"

      - alert: HighEventDropRate
        expr: rate(telemetry_events_dropped[5m]) > 10
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "High event drop rate detected"
          description: "Dropping {{ $value }} events/sec"

      - alert: DatabaseWriteSlow
        expr: histogram_quantile(0.99, rate(telemetry_batch_write_duration_bucket[5m])) > 100
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Slow database writes"
          description: "p99 write latency: {{ $value }}ms"

      - alert: TartBackendDown
        expr: up{job="tart-backend"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "TART Backend is down"
```

Log Levels:

- `ERROR` - Critical failures requiring immediate attention
- `WARN` - Rate limiting, invalid timestamps, cache failures
- `INFO` - Startup, connections, major events
- `DEBUG` - Connection details, event processing
- `TRACE` - Frame-by-frame protocol details
Example log configuration:
```bash
# Detailed logging for troubleshooting
RUST_LOG="tart_backend=debug,sqlx=info,tower_http=debug"
# Production logging
RUST_LOG="tart_backend=info,sqlx=warn,tower_http=warn"# Clone repository
git clone https://github.com/your-org/tart-backend.git
cd tart-backend
# Install Rust (if not already installed)
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
# Build in debug mode (fast compilation)
cargo build
# Build in release mode (optimized)
cargo build --release
# Run tests
cargo test
# Run with logging
RUST_LOG=debug cargo run
```

Project structure:

```text
tart-backend/
├── src/
│ ├── main.rs # Application entry point
│ ├── lib.rs # Library exports
│ ├── server.rs # TCP telemetry server (9000)
│ ├── api.rs # HTTP API server (8080)
│ ├── store.rs # PostgreSQL data layer
│ ├── batch_writer.rs # Event batching & writing
│ ├── event_broadcaster.rs # WebSocket event distribution
│ ├── rate_limiter.rs # Per-node rate limiting
│ ├── health.rs # Health monitoring
│ ├── circuit_breaker.rs # Fault tolerance
│ ├── retry.rs # Retry logic with backoff
│ ├── node_id.rs # Type-safe node identifiers
│ ├── events.rs # Event definitions (130 types)
│ ├── types.rs # JAM types & structures
│ ├── encoding.rs # Binary encoding/decoding
│ ├── decoder.rs # JIP-3 protocol decoder
│ └── bin/
│ └── tart-dash.rs # Terminal UI dashboard
├── migrations/
│ ├── 001_postgres_schema.sql # Initial schema
│ └── 002_performance_indexes.sql # Performance indexes
├── tests/
│ ├── integration_tests.rs # End-to-end tests
│ ├── events_tests.rs # Event encoding tests
│ ├── api_tests.rs # API endpoint tests
│ └── types_tests.rs # Type system tests
├── docker-compose.yml # Docker orchestration
├── Dockerfile # Container image
├── Cargo.toml # Rust dependencies
└── README.md                    # This file
```

```bash
# Unit tests (no database required) - can run in parallel
cargo test --lib --test types_tests --test events_tests --test error_tests --test encoding_tests
# Integration tests (requires PostgreSQL)
export TEST_DATABASE_URL="postgres://tart:tart_password@localhost:5432/tart_test"
# Create test database first:
# psql -U tart -h localhost -d postgres -c "CREATE DATABASE tart_test;"
# cargo sqlx migrate run
cargo test --test api_tests --test integration_tests --test optimized_server_tests -- --test-threads=1
# All tests together (serial execution for integration tests)
cargo test --lib --test types_tests --test events_tests --test error_tests --test encoding_tests
cargo test --test api_tests --test integration_tests --test optimized_server_tests -- --test-threads=1
# With output
cargo test -- --nocapture --test-threads=1
# Specific test
cargo test test_health_endpoint -- --test-threads=1
```

Important: Integration tests must run with `--test-threads=1` to avoid database conflicts between parallel test executions.
```bash
# Format code
cargo fmt

# Lint code
cargo clippy -- -D warnings

# Security audit
cargo audit

# Check for unused dependencies
cargo machete
```

## Troubleshooting

Error: `DATABASE_URL must be set`
```bash
# Solution: Set environment variable
export DATABASE_URL="postgres://user:password@localhost:5432/tart_telemetry"
cargo run --release
```

Symptom: Logs show "Failed to decode event: Invalid boolean value: 28"
Cause: Event structure mismatch with wire format
Solution: Verify the backend has the latest JIP-3 compliant decoders.

```bash
# Check version
curl http://localhost:8080/api/health

# Rebuild if needed
docker-compose build tart-backend
```

Symptom: Backend using >4GB RAM with <100 nodes
Possible Causes:

- Event buffer overflow: check the `telemetry_buffer_pending` metric
- Database connection leak: check `telemetry_database_pool_connections`
- WebSocket subscriber buildup: check `telemetry_broadcaster_subscribers`
Solutions:
```bash
# Check metrics
curl http://localhost:8080/metrics | grep telemetry

# Restart if needed
docker-compose restart tart-backend
```

Symptom: Slow queries, high write latency
Solutions:
```sql
-- Check for missing indexes
SELECT schemaname, tablename, indexname
FROM pg_indexes
WHERE schemaname = 'public'
ORDER BY tablename;

-- Analyze query performance
EXPLAIN ANALYZE
SELECT * FROM events
WHERE node_id = 'abc...'
ORDER BY created_at DESC
LIMIT 100;

-- Reindex if needed
REINDEX TABLE events;

-- Update statistics
ANALYZE events;
```

Symptom: Nodes can't connect to the telemetry port
Solutions:
```bash
# Check if port is listening
netstat -tlnp | grep 9000

# Check firewall
sudo ufw status
sudo ufw allow 9000/tcp

# Test connection
telnet localhost 9000

# Check Docker network
docker network inspect tart-backend_default
```

| Operation | Latency (p50) | Latency (p99) |
|---|---|---|
| Event ingestion | 0.3ms | 0.8ms |
| Database write (single) | 2ms | 8ms |
| Database write (batch 100) | 15ms | 25ms |
| REST API query | 5ms | 15ms |
| WebSocket broadcast | 1ms | 3ms |

| Nodes | Events/sec | Memory | CPU | DB Connections |
|---|---|---|---|---|
| 10 | 1,000 | 0.5GB | 10% | 20 |
| 100 | 10,000 | 1.5GB | 40% | 100 |
| 500 | 50,000 | 3GB | 80% | 200 |
| 1024 | 100,000 | 5GB | 95% | 200 |
```bash
# Install k6 load testing tool
brew install k6          # macOS
# or: apt-get install k6 # Linux

# Run load test
k6 run tests/load-test.js
```

This project is licensed under the MIT License - see the LICENSE file for details.
- Built for the JAM (Join-Accumulate Machine) protocol
- Specification: JIP-3 Telemetry