Skip to content

nmdatar/infinitrain

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

5 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

InfiniTrain - Distributed Job Training System

A lightweight, high-performance distributed job execution system built in Go. InfiniTrain enables you to distribute and execute jobs across multiple worker nodes with fault tolerance, monitoring, and simple management.

🎯 Project Goals

Primary Goal: Build a production-ready distributed job execution system that can:

  • Execute jobs across multiple worker nodes
  • Handle job failures gracefully with automatic retry
  • Provide real-time job status and monitoring
  • Scale horizontally by adding more worker nodes
  • Deploy easily with Docker Compose

MVP Timeline: 4-day rapid development cycle focusing on core functionality over advanced features.

πŸ—οΈ Architecture Overview

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”                           β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚   Job Clients   β”‚                           β”‚   CLI Tool      β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”˜                           β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”˜
          β”‚                                             β”‚
          β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                            β”‚
               β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
               β”‚      Job Scheduler         β”‚
               β”‚    (REST API Server)       β”‚
               β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                             β”‚
                    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”
                    β”‚   Redis Queue   β”‚
                    β”‚   (Job Store)   β”‚
                    β””β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                             β”‚
     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
     β”‚                       β”‚                       β”‚
β”Œβ”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”           β”Œβ”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”           β”Œβ”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”
β”‚ Worker 1 β”‚           β”‚ Worker 2  β”‚           β”‚ Worker N  β”‚
β”‚ Node     β”‚           β”‚ Node      β”‚           β”‚ Node      β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜           β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜           β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Core Components

  • Job Scheduler: Central coordinator that receives jobs and distributes them to available workers
  • Worker Nodes: Distributed execution units that pull and execute jobs
  • Redis Queue: Simple and reliable job queue and state storage
  • REST API: HTTP interface for job submission, monitoring, and management
  • CLI Tool: Command-line interface for job management

πŸš€ Features

Current Status: 🚧 In Development

  • Job Submission: Submit jobs via REST API with JSON specification
  • Distributed Execution: Execute jobs across multiple worker nodes
  • Job Types: Support for command execution, script running, and HTTP requests
  • Status Tracking: Real-time job status and progress monitoring
  • Worker Management: Automatic worker registration and health monitoring
  • Fault Tolerance: Automatic job retry and worker failure detection
  • CLI Tool: Command-line tool for job and worker management
  • Docker Deployment: Easy deployment with Docker Compose

Planned Features (Post-MVP)

  • Job dependencies and workflows (DAG support)
  • Advanced scheduling algorithms
  • Resource-aware job placement
  • Job templates and batch processing
  • Advanced monitoring and metrics
  • Authentication and authorization
  • Kubernetes deployment

πŸ“‹ Job Types Supported

  1. Command Jobs: Execute shell commands

    {
      "type": "command",
      "command": "echo 'Hello World'"
    }
  2. Script Jobs: Run bash/shell scripts

    {
      "type": "script",
      "script": "#!/bin/bash\necho 'Running script'\ndate"
    }
  3. HTTP Jobs: Make HTTP requests

    {
      "type": "http",
      "url": "https://api.example.com/webhook",
      "method": "POST"
    }

πŸ› οΈ Technology Stack

  • Language: Go 1.21+
  • Queue & Storage: Redis
  • Communication: HTTP REST API
  • Containerization: Docker & Docker Compose
  • Testing: Go built-in testing + manual integration tests

πŸƒβ€β™‚οΈ Quick Start

Note: This section will be updated as the project is built

Prerequisites

  • Go 1.21 or later
  • Docker and Docker Compose
  • Redis (or use Docker Compose setup)

Development Setup

# Clone the repository
git clone <repository-url>
cd infinitrain

# Initialize Go modules
go mod init infinitrain
go mod tidy

# Start Redis (using Docker)
docker run -d -p 6379:6379 redis:latest

# Run the scheduler
go run cmd/scheduler/main.go

# Run a worker (in another terminal)
go run cmd/worker/main.go

# Submit a test job
curl -X POST http://localhost:8080/api/v1/jobs \
  -H "Content-Type: application/json" \
  -d '{"type":"command","command":"echo Hello InfiniTrain"}'

Production Deployment

# Deploy with Docker Compose
docker-compose up -d

# Scale workers
docker-compose up -d --scale worker=3

# View logs
docker-compose logs -f

πŸ“Š API Documentation

Job Submission

POST /api/v1/jobs
Content-Type: application/json

{
  "id": "job-123",
  "type": "command",
  "command": "echo 'Hello World'",
  "timeout": "5m",
  "retries": 3,
  "priority": 1,
  "tags": ["example", "test"]
}

Job Status

GET /api/v1/jobs/{job-id}

List Jobs

GET /api/v1/jobs

Worker Status

GET /api/v1/workers

System Health

GET /api/v1/health

πŸ§ͺ Testing

# Run unit tests
go test ./...

# Run integration tests
make test-integration

# Load testing
make load-test

πŸ“ˆ Performance Goals (MVP)

  • Concurrent Jobs: 100+ simultaneous jobs
  • Worker Nodes: Support for 5+ workers
  • Job Latency: < 1 minute for simple command jobs
  • Failure Detection: < 30 seconds for worker failures
  • Success Rate: 95%+ job completion rate

🀝 Contributing

This project is currently in rapid MVP development. Contribution guidelines will be added after the initial 4-day development cycle.

πŸ“œ License

MIT License - see LICENSE file for details

πŸ—“οΈ Development Timeline

  • Day 1: Foundation & Basic Job System ⏳
  • Day 2: Distribution & Communication
  • Day 3: Reliability & Management
  • Day 4: Polish & Deployment

Status: 🚧 Active Development - MVP Phase Last Updated: [Current Date] Next Milestone: Day 1 - Single Node Job Execution

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages