Run more RL experiments. Wait less for GPUs.
Updated Mar 21, 2026 - Python
Fully Autonomous AI Research System with Self-Evolution, built natively on Claude Code
Tensor Fusion is a state-of-the-art GPU virtualization and pooling solution designed to maximize GPU cluster utilization.
A tool for examining GPU scheduling behavior.
PipelineScheduler optimizes workload distribution between servers and edge devices, setting optimal batch sizes to maximize throughput and minimize latency amid content dynamics and network instability. It also addresses resource contention with spatiotemporal inference scheduling to reduce co-location interference.
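Setting an optimal batch size usually means maximizing throughput while keeping predicted per-batch latency within a budget. A minimal sketch of that selection step, assuming a caller-supplied latency model (the function name and signature are illustrative, not PipelineScheduler's API):

```python
def pick_batch_size(latency_ms, latency_budget_ms, candidates):
    """Return the candidate batch size with the highest throughput
    (items/sec) whose predicted latency stays within the budget,
    or None if no candidate fits.

    latency_ms: callable mapping batch size -> predicted latency in ms
    (in a real system this comes from profiling or a learned model
    that tracks content dynamics and network conditions).
    """
    best, best_throughput = None, 0.0
    for b in candidates:
        lat = latency_ms(b)
        if lat > latency_budget_ms:
            continue  # would violate the latency SLA
        throughput = b / (lat / 1000.0)  # items per second
        if throughput > best_throughput:
            best, best_throughput = b, throughput
    return best
```

Re-running this selection as the latency model updates is one simple way to adapt batch sizes to changing network conditions.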
Topology-aware Kubernetes scheduler for multi-tenant, heterogeneous clusters
The GPU Optimizer for ML Models improves GPU utilization for machine learning workloads. It provides advanced scheduling, real-time monitoring, and efficient resource management through a web interface and an API, and integrates big-data technologies for data processing and model optimization. @NVIDIA
Universal experiment pipeline engine with PostgreSQL-backed scheduling, OOM-aware GPU allocation, and plugin architecture
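OOM-aware GPU allocation can be sketched as a best-fit pass over per-GPU free memory: place each job on the smallest GPU that still fits it, and queue rather than risk an out-of-memory kill. The function below is an illustrative sketch, not this project's API; in practice free memory would be queried via NVML or `nvidia-smi`:

```python
def allocate_gpu(free_mem_mb, required_mb):
    """Best-fit, OOM-aware allocation: pick the GPU with the least
    free memory that still satisfies the request, leaving larger
    GPUs available for larger jobs.

    free_mem_mb: dict mapping gpu_id -> free memory in MB.
    Returns the chosen gpu_id, or None if nothing fits.
    """
    candidates = [(mem, gid) for gid, mem in free_mem_mb.items()
                  if mem >= required_mb]
    if not candidates:
        return None  # caller should queue the job instead of OOM-killing
    _, gpu_id = min(candidates)  # tightest fit first
    return gpu_id
```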
# Fraud-Detection-Service

The **fraud-detection-service** detects fraudulent orders and user activity.

### Endpoints

- `GET /health` — service status
- `POST /fraud/check` — check an order for fraud (sample)
- `GET /fraud/:orderId` — get fraud status for an order (sample)

## Tracing

This service reports telemetry
FastAPI service that serializes access to a single GPU host by leasing one Docker Compose stack at a time, with readiness checks, persistent state, and queued handoff.
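The lease-one-stack-at-a-time pattern reduces to a small state machine: one holder, a FIFO wait queue, and handoff on release. A minimal sketch of that core (class and method names are hypothetical; the actual service wraps this in FastAPI endpoints with readiness checks and persisted state):

```python
from collections import deque

class LeaseManager:
    """Serializes access to a single GPU host: at most one client
    holds the lease; later requesters wait in FIFO order and receive
    the lease automatically when it is released (queued handoff)."""

    def __init__(self):
        self.holder = None
        self.queue = deque()

    def acquire(self, client_id):
        if self.holder is None:
            self.holder = client_id
            return "granted"
        if client_id != self.holder and client_id not in self.queue:
            self.queue.append(client_id)
        return "queued"

    def release(self, client_id):
        if self.holder != client_id:
            raise ValueError("client does not hold the lease")
        # queued handoff: pass the lease to the next waiter, if any
        self.holder = self.queue.popleft() if self.queue else None
        return self.holder
```

In the real service each `acquire`/`release` transition would also bring a Docker Compose stack up or down and block readiness until its health checks pass.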
Predictive job scheduler for heterogeneous compute — ACO + LSTM spike prediction + intent-aware routing. <10ms latency, 95%+ SLA adherence, 202 tests
A distributed CPU/GPU task scheduler for large-scale batch jobs across thousands of machines. Zero dependencies, sub-millisecond latency.
Scout optimal AWS compute for cluster workloads. Compare spot, capacity blocks, and on-demand pricing across regions and instance types. Auto-discover instances for CPU and GPU deployments with HPC schedulers.
Transparent suspend/resume runtime enabling preemptible GPU workloads via memory snapshotting, UVM paging, and execution state orchestration.
Design of a dynamic GPU scheduling architecture for LLM inference tasks, based on KubeAI
HPC research toolkit infrastructure for interfacing with and analyzing LLMs (the kit comprises an API gateway service, a GPU scheduler, a model server, and a web interface)
Automate code improvement by detecting issues, fixing bugs, and simplifying code on separate branches before merging to main.