A full-stack platform for designing reinforcement learning environments, running GPU-backed training, and monitoring agents in real-time.
- About
- Features
- Screenshots
- Tech Stack
- Architecture
- Quick Start
- Installation
- Environment Variables Setup
- Usage
- API Documentation
- Development
- Deployment
- Contributing
- Roadmap
- FAQ
- License
- Acknowledgments
RL Studio is a comprehensive platform that brings together environment design, training orchestration, and real-time monitoring for reinforcement learning. Whether you're a researcher prototyping new environments or a practitioner training production models, RL Studio provides the tools you need in one unified interface.
- 🎨 Visual Environment Builder - Design complex RL environments without writing code
- 🚀 One-Click Training - Launch GPU-backed training jobs on AWS/GCP/Azure
- 📊 Real-time Monitoring - Track training progress with live metrics and visualizations
- 🔬 Advanced Analysis - Deep insights into rewards, trajectories, and policy behavior
- 📦 Code Generation - Export production-ready Gymnasium environments
- 📄 Paper Import - Import environment specs from research papers
- Figma-Style World Builder: Generic scene builder with reusable assets, prefabs, and templates
- Asset Library: Reusable components (characters, vehicles, props, tiles) that can be shared across projects
- Scene Graph System: Flexible entity-component system supporting any environment type
- Template System: Pre-built templates (Gridworld, Cliff Walking, Key & Door, Maze, Multi-Agent) ready to use
- Drag-and-Drop Interface: Design gridworld and continuous environments visually
- Multi-Agent Support: Create complex multi-agent scenarios
- Custom Geometry: Define custom shapes and obstacles
- Real-time Preview: See your environment as you build it
- GPU-Backed Training: Launch training jobs on AWS/GCP/Azure via SkyPilot
- Multiple Algorithms: Support for PPO, DQN, A2C, BC, Imitation Learning, and more
- Hyperparameter Tuning: Built-in suggestions based on environment characteristics
- Spot Instance Support: Roughly 70% lower GPU cost with automatic spot instance management (based on the A10 estimates of ~$0.30/hour spot vs ~$1.00/hour on-demand)
- Live Metrics: Real-time training metrics visualization
- Reward Decomposition: Understand which rules contribute most to rewards
- Trajectory Visualization: View agent paths and detect attractors
- Policy Entropy: Measure exploration vs exploitation
- Termination Analysis: Understand why episodes end
- Code Generation: Auto-generate production-ready Gymnasium environments
- Paper Import: Import environment specs from arXiv/blog URLs via Firecrawl
- Export Everything: Generate complete project bundles with training scripts
- Code Review: AI-powered code review and validation
- Framework: TanStack Start (React SSR)
- Language: TypeScript
- Styling: Tailwind CSS
- 3D Graphics: Three.js, React Three Fiber
- State Management: Convex (realtime database)
- Charts: Recharts
- API: FastAPI (Python) with GraphQL support
- GraphQL: Strawberry GraphQL (type-safe, efficient queries)
- Database: Convex (realtime DB)
- Training: SkyPilot (GPU orchestration)
- RL Libraries: Stable-Baselines3, Gymnasium, PyTorch
- Analysis: NumPy, SciPy, scikit-learn
- Frontend Hosting: Netlify
- Backend Hosting: Google Cloud Run / AWS ECS / Railway
- Database: Convex Cloud
- Training Jobs: AWS/GCP/Azure (via SkyPilot)
┌─────────────────────────────────────────────────────────────┐
│ Frontend (React) │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ Editor │ │ Visualizer │ │ Training │ │
│ └──────────────┘ └──────────────┘ └──────────────┘ │
└─────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ Convex (Realtime DB) │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ Environments │ │ Runs │ │ Metrics │ │
│ └──────────────┘ └──────────────┘ └──────────────┘ │
└─────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ Backend API (FastAPI) │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ Rollout │ │ Analysis │ │ Codegen │ │
│ └──────────────┘ └──────────────┘ └──────────────┘ │
└─────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ Training Infrastructure (SkyPilot) │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ AWS │ │ GCP │ │ Azure │ │
│ └──────────────┘ └──────────────┘ └──────────────┘ │
└─────────────────────────────────────────────────────────────┘
Just run one command and add your API keys!
# 1. Run automated setup
python setup.py
# 2. Add your API keys to .env files (see below)
# 3. Start the servers
cd backend && source venv/bin/activate && python main.py # Terminal 1
npm run dev # Terminal 2
The setup script automatically:
- ✅ Installs all dependencies
- ✅ Creates virtual environments
- ✅ Sets up infrastructure
- ✅ Creates .env templates
You only need to add API keys! See Environment Variables Setup below.
If you prefer manual setup:
- Clone the repository
git clone https://github.com/yourusername/rl-studio.git
cd rl-studio
💡 Note: This is an open-source project. You'll be setting up your own instance with your own API keys and credentials.
- Install dependencies
# Install frontend dependencies
npm install
# Install backend dependencies
cd backend
python3 -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
pip install -r requirements.txt
cd ..
- Set up Convex (Your Own Deployment)
# This creates YOUR OWN Convex deployment (not shared with production)
npx convex dev
This will:
- Create a Convex account if needed (free tier available)
- Set up your local development environment
- Generate YOUR deployment URL (e.g., https://your-deployment.convex.cloud)
Important: Each user gets their own Convex deployment. This is required for open-source use.
- Seed Local Development Data
# Seed sample environments, runs, and data for local development
npx convex run seed_local_data:seedAll
This creates sample data so you can develop without needing production data.
- Configure Infrastructure (Optional)
RL Studio supports multiple cloud providers and local development:
Quick Start (Local Only - No Cloud Required):
# Everything runs locally - no cloud setup needed!
# Just set your Convex URL from step 3
export CONVEX_URL=https://your-deployment.convex.cloud
For Cloud Storage (S3/GCS/Azure):
See INFRASTRUCTURE_SETUP.md for detailed instructions. It's very easy - just set environment variables!
Quick Example (AWS S3):
export STORAGE_PROVIDER=s3
export S3_BUCKET_NAME=rl-studio-models
export AWS_ACCESS_KEY_ID=your-key
export AWS_SECRET_ACCESS_KEY=your-secret
The framework auto-detects your provider from credentials - no code changes needed!
- Configure environment variables
⚠️ Important: Since this is an open-source project, you'll need to set up your own environment variables. The .env files are not included in the repository for security reasons.
Create a .env file in the root directory of the project:
# Create the file
touch .env
Then add the following content (replace with your own values):
# ============================================
# CONVEX CONFIGURATION (REQUIRED)
# ============================================
# Get this URL from running: npx convex dev
# It will look like: https://your-deployment-name.convex.cloud
VITE_CONVEX_URL=https://your-deployment-name.convex.cloud
# ============================================
# OPENAI API (OPTIONAL - For Code Generation)
# ============================================
# Get your API key from: https://platform.openai.com/api-keys
# Leave empty if you don't need code generation features
OPENAI_API_KEY=sk-your-api-key-here
# ============================================
# FIRECRAWL API (OPTIONAL - For Paper Import)
# ============================================
# Get your API key from: https://firecrawl.dev
# Leave empty if you don't need paper import features
FIRECRAWL_API_KEY=your-firecrawl-api-key
# ============================================
# BACKEND SERVICE URLs (OPTIONAL)
# ============================================
# For local development, these default to localhost:8000
# For production, set these to your deployed backend URLs
VITE_ROLLOUT_SERVICE_URL=http://localhost:8000
VITE_API_URL=http://localhost:8000
VITE_TRAINING_SERVICE_URL=http://localhost:8000
# ============================================
# SENTRY (OPTIONAL - For Error Tracking)
# ============================================
# Get your DSN from: https://sentry.io
# Leave empty if you don't need error tracking
VITE_SENTRY_DSN=your-sentry-dsn
Create a .env file in the backend/ directory:
# Create the file
cd backend
touch .env
cd ..
Then add the following content:
# ============================================
# OPENAI API (OPTIONAL - For Code Generation)
# ============================================
# Same as in root .env, or leave empty
OPENAI_API_KEY=sk-your-api-key-here
# ============================================
# AWS CREDENTIALS (OPTIONAL - For GPU Training)
# ============================================
# Get these from: https://console.aws.amazon.com/iam/
# Required only if you want to launch GPU training jobs
AWS_ACCESS_KEY_ID=your-aws-access-key-id
AWS_SECRET_ACCESS_KEY=your-aws-secret-access-key
AWS_DEFAULT_REGION=us-east-1
# ============================================
# CONVEX URL (OPTIONAL)
# ============================================
# Usually same as VITE_CONVEX_URL from root .env
CONVEX_URL=https://your-deployment-name.convex.cloud
# ============================================
# SERVER CONFIGURATION (OPTIONAL)
# ============================================
# These have sensible defaults, only change if needed
# PORT=8000
# HOST=0.0.0.0
# DEBUG=false
# CORS_ORIGINS=*
For Convex-specific environment variables (like the Firecrawl API key for Convex functions), use the Convex CLI:
# Set Firecrawl API key for Convex functions
npx convex env set FIRECRAWL_API_KEY your-firecrawl-key
# Set CodeRabbit API key (if using code review features)
npx convex env set CODERABBIT_API_KEY your-coderabbit-key

| Variable | Location | Required | Description |
|---|---|---|---|
| VITE_CONVEX_URL | Root .env | ✅ Yes | Your Convex deployment URL |
| OPENAI_API_KEY | Root .env & backend/.env | ❌ No | For AI code generation |
| FIRECRAWL_API_KEY | Root .env & Convex | ❌ No | For paper import feature |
| AWS_ACCESS_KEY_ID | backend/.env | ❌ No | For GPU training jobs |
| AWS_SECRET_ACCESS_KEY | backend/.env | ❌ No | For GPU training jobs |
| VITE_ROLLOUT_SERVICE_URL | Root .env | ❌ No | Backend URL (defaults to localhost) |
| VITE_API_URL | Root .env | ❌ No | Backend API URL (defaults to localhost) |
- Never commit .env files to version control (they're already in .gitignore)
- Use different keys for development and production
- Rotate API keys regularly for security
- Use environment-specific files: .env.local for local overrides, .env.production for production
Problem: "VITE_CONVEX_URL is not set"
Solution: Run npx convex dev to get your Convex URL, then add it to .env
Problem: "Backend connection failed"
Solution: Make sure VITE_ROLLOUT_SERVICE_URL points to your running backend (default: http://localhost:8000)
Problem: "OpenAI API key invalid"
Solution: Check that your API key starts with sk- and is not a placeholder value
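The three problems above can be caught before starting the servers. The following is a minimal sketch, not part of RL Studio: `parse_env` and `check_env` are hypothetical helper names, and the placeholder check assumes you copied the template values (which contain `your-`) from this guide.

```python
from typing import Dict, List

# Keys the root .env must define, per the variable table in this guide.
REQUIRED_KEYS = ["VITE_CONVEX_URL"]


def parse_env(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def check_env(text: str) -> List[str]:
    """Return a list of human-readable problems found in a root .env file."""
    env = parse_env(text)
    problems: List[str] = []
    for key in REQUIRED_KEYS:
        if not env.get(key):
            problems.append(f"{key} is not set")
    for key, value in env.items():
        # Template values in this guide all contain "your-".
        if "your-" in value:
            problems.append(f"{key} still holds a placeholder value")
    openai_key = env.get("OPENAI_API_KEY", "")
    if openai_key and not openai_key.startswith("sk-"):
        problems.append("OPENAI_API_KEY does not start with sk-")
    return problems
```

Calling `check_env(open(".env").read())` from the project root returns an empty list when none of these issues are present.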
- Seed the database (REQUIRED for Templates & Assets)
# 1. Get your user_id (see methods below)
# 2. Run the seeding script
./seed_database.sh <your_user_id> [optional_project_id]
8 Templates:
- Basic Gridworld, Cliff Walking, Key & Door, Maze, Multi-Agent Grid
- 2D Continuous Navigation, 3D Navigation, Driving Simulator
25+ Assets:
- Grid tiles: Agent, Goal, Wall, Key, Door, Trap, Checkpoint, etc.
- Vehicles: Car, Truck
- 3D assets: 3D Agent, 3D Obstacle, Platform
- Continuous 2D assets: Continuous Agent, Continuous Obstacle
- Templates: Click "New Environment" → "From Template" → Look for "Templates from Library" section (top of dialog)
- Assets: When editing an environment → Top of canvas shows Asset Palette with colored buttons
Method 1: Browser Console
// Open browser console (F12) and run:
localStorage.getItem('rl_studio_user_id')
Method 2: Convex Dashboard
- Go to https://dashboard.convex.dev
- Select your project → "Data" → "users" table
- Copy any user's _id
Method 3: From Existing Environment
- Open any environment in the app
- Check its ownerId in Convex dashboard
Using Python:
cd backend
source venv/bin/activate
python -m rl_studio.api.seed_database <user_id> [project_id]
Using API:
# Seed everything
curl -X POST http://localhost:8000/api/admin/seed/all \
  -H "Content-Type: application/json" \
  -d '{"created_by": "<user_id>", "project_id": "<project_id>"}'
Verify setup:
python -m rl_studio.api.verify_setup <user_id> <project_id>
- Start the development servers
# Terminal 1: Start backend
cd backend
source venv/bin/activate
python main.py
# Terminal 2: Start frontend
npm run dev
- Open your browser
Navigate to http://localhost:3000
📝 For Open Source Contributors: This section explains how to configure environment variables for your own RL Studio instance.
Since RL Studio is open source, you need to set up your own environment variables. Here's a comprehensive guide:
- Clone the repository
- Install dependencies (Node.js & Python)
- Set up Convex account (npx convex dev)
- Create root .env file with VITE_CONVEX_URL
- Create backend/.env file (optional, for training features)
- Set Convex environment variables (optional, for paper import)
Convex is required for the application to work. It's free to get started:
# This will create a Convex account and deployment
npx convex dev
This will output a URL like:
https://your-deployment-name.convex.cloud
Add this to your root .env file:
VITE_CONVEX_URL=https://your-deployment-name.convex.cloud
If you want to use AI-powered code generation:
- Get an API key from OpenAI Platform
- Add to both .env files:
Root .env:
OPENAI_API_KEY=sk-proj-...your-key-here
backend/.env:
OPENAI_API_KEY=sk-proj-...your-key-here
If you want to import environments from research papers:
- Get an API key from Firecrawl
- Add to root .env:
FIRECRAWL_API_KEY=fc-...your-key-here
- Also set in Convex:
npx convex env set FIRECRAWL_API_KEY fc-...your-key-here
If you want to launch GPU training jobs:
- Create an AWS account
- Create an IAM user with programmatic access
- Add credentials to backend/.env:
AWS_ACCESS_KEY_ID=AKIA...
AWS_SECRET_ACCESS_KEY=...
AWS_DEFAULT_REGION=us-east-1
For production deployments, set your backend URLs in root .env:
VITE_ROLLOUT_SERVICE_URL=https://your-backend-url.com
VITE_API_URL=https://your-backend-url.com
VITE_TRAINING_SERVICE_URL=https://your-backend-url.com
For local development, these default to http://localhost:8000.
# REQUIRED - Get from: npx convex dev
VITE_CONVEX_URL=https://your-deployment-name.convex.cloud
# ============================================
# REQUIRED
# ============================================
VITE_CONVEX_URL=https://your-deployment-name.convex.cloud
# ============================================
# OPTIONAL - Code Generation
# ============================================
OPENAI_API_KEY=sk-proj-your-key-here
# ============================================
# OPTIONAL - Paper Import
# ============================================
FIRECRAWL_API_KEY=fc-your-key-here
# ============================================
# OPTIONAL - Backend URLs (for production)
# ============================================
# VITE_ROLLOUT_SERVICE_URL=https://your-backend.com
# VITE_API_URL=https://your-backend.com
# VITE_TRAINING_SERVICE_URL=https://your-backend.com
# ============================================
# OPTIONAL - Code Generation
# ============================================
OPENAI_API_KEY=sk-proj-your-key-here
# ============================================
# OPTIONAL - AWS for GPU Training
# ============================================
AWS_ACCESS_KEY_ID=AKIA...
AWS_SECRET_ACCESS_KEY=...
AWS_DEFAULT_REGION=us-east-1
# ============================================
# OPTIONAL - Convex (usually same as VITE_CONVEX_URL)
# ============================================
CONVEX_URL=https://your-deployment-name.convex.cloud
# ============================================
# OPTIONAL - Server Config
# ============================================
# PORT=8000
# HOST=0.0.0.0
# DEBUG=false
After setting up your .env files, verify everything works:
# 1. Check Convex connection
npx convex dev
# Should show: "✅ Convex functions are ready"
# 2. Start backend
cd backend
source venv/bin/activate
python main.py
# Should show: "✅ Loaded .env from backend/.env"
# 3. Start frontend
npm run dev
# Should open at http://localhost:3000

| Issue | Solution |
|---|---|
| VITE_CONVEX_URL is not set | Run npx convex dev and add the URL to .env |
| Backend connection failed | Make sure backend is running on port 8000 |
| OpenAI API key invalid | Check that your key starts with sk- |
| AWS credentials not found | Only needed for GPU training, can be skipped for local dev |
- Never commit .env files - They're already in .gitignore
- Use different keys for dev/prod - Create separate .env.local and .env.production
- Rotate keys regularly - Especially for production
- Use environment variables in CI/CD - Don't hardcode secrets
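Reading configuration from the environment instead of hardcoding it, as the tips above recommend, might look like the sketch below. The defaults mirror the commented-out values in the backend/.env example (PORT=8000, HOST=0.0.0.0, DEBUG=false, CORS_ORIGINS=*). `ServerSettings` and `load_settings` are hypothetical names, and whether `main.py` parses these variables the same way is an assumption.

```python
import os
from dataclasses import dataclass
from typing import List, Mapping


@dataclass(frozen=True)
class ServerSettings:
    host: str
    port: int
    debug: bool
    cors_origins: List[str]


def load_settings(env: Mapping[str, str] = os.environ) -> ServerSettings:
    """Build settings from environment variables, falling back to defaults."""
    origins = env.get("CORS_ORIGINS", "*")
    return ServerSettings(
        host=env.get("HOST", "0.0.0.0"),
        port=int(env.get("PORT", "8000")),
        # Accept a few common spellings of "true"; anything else is False.
        debug=env.get("DEBUG", "false").strip().lower() in ("1", "true", "yes"),
        # CORS_ORIGINS is treated as a comma-separated list (an assumption).
        cors_origins=[o.strip() for o in origins.split(",") if o.strip()],
    )
```

Passing a plain dictionary instead of `os.environ` makes the loader easy to exercise in tests or CI without touching real secrets.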
- Click "New Environment" in the dashboard
- Choose environment type (Grid or Continuous)
- Design your environment:
- Add agents, goals, obstacles
- Define reward rules
- Set termination conditions
- Configure action/observation spaces
- Save your environment
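To make the design steps above concrete, here is a minimal pure-Python gridworld with an agent, a goal, walls, a reward rule, and a termination condition. It is an illustrative sketch, not the code RL Studio generates: the class name, reward values, and action numbering are assumptions. It follows the Gymnasium calling convention, where `reset()` returns `(observation, info)` and `step()` returns `(observation, reward, terminated, truncated, info)`, without importing the library.

```python
from typing import Any, Dict, FrozenSet, Tuple

Position = Tuple[int, int]


class TinyGridworld:
    # Action id -> (dx, dy): up, down, left, right. Numbering is illustrative.
    ACTIONS = {0: (0, -1), 1: (0, 1), 2: (-1, 0), 3: (1, 0)}

    def __init__(self, width: int = 5, height: int = 5,
                 start: Position = (0, 0), goal: Position = (4, 4),
                 walls: FrozenSet[Position] = frozenset(), max_steps: int = 50):
        self.width, self.height = width, height
        self.start, self.goal = start, goal
        self.walls = frozenset(walls)
        self.max_steps = max_steps
        self.agent = start
        self.steps = 0

    def reset(self) -> Tuple[Position, Dict[str, Any]]:
        self.agent = self.start
        self.steps = 0
        return self.agent, {}

    def step(self, action: int):
        dx, dy = self.ACTIONS[action]
        # Clamp to the grid so the agent cannot leave the world.
        x = min(max(self.agent[0] + dx, 0), self.width - 1)
        y = min(max(self.agent[1] + dy, 0), self.height - 1)
        if (x, y) not in self.walls:  # walls block movement
            self.agent = (x, y)
        self.steps += 1
        terminated = self.agent == self.goal            # termination condition
        truncated = not terminated and self.steps >= self.max_steps
        reward = 1.0 if terminated else -0.01           # reward rule
        return self.agent, reward, terminated, truncated, {}
```

The observation here is just the agent position; a real environment would also declare its action and observation spaces.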
- Open your environment
- Go to "Rollout Preview" tab
- Configure rollout settings (max steps, policy)
- Click "Run Rollout"
- View results and analysis in real-time
- Open your environment
- Click "Launch Training"
- Configure training settings:
- Algorithm: PPO, DQN, A2C, BC, Imitation, or Random
- Hyperparameters: Learning rate, gamma, total steps, etc.
- GPU: Select GPU type (A10:1 recommended)
- Parallel Environments: Number of parallel envs
- Click "Launch Training"
- Monitor progress in real-time
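The training settings listed above can be thought of as a small, validated configuration object. The sketch below is an assumption-laden illustration: only the `algorithm` key appears in this README's examples, so every other field name and default value here (`learning_rate`, `gamma`, `total_steps`, `gpu`, `num_envs`) is hypothetical and may not match what the backend expects.

```python
import json
from dataclasses import asdict, dataclass

# Algorithm names taken from the list above, lowercased.
SUPPORTED_ALGORITHMS = {"ppo", "dqn", "a2c", "bc", "imitation", "random"}


@dataclass
class TrainingConfig:
    algorithm: str = "ppo"
    learning_rate: float = 3e-4   # hypothetical default
    gamma: float = 0.99           # hypothetical default
    total_steps: int = 1_000_000
    gpu: str = "A10:1"            # the GPU type recommended above
    num_envs: int = 4             # number of parallel environments

    def validate(self) -> None:
        if self.algorithm.lower() not in SUPPORTED_ALGORITHMS:
            raise ValueError(f"unsupported algorithm: {self.algorithm}")
        if not 0.0 < self.gamma <= 1.0:
            raise ValueError("gamma must be in (0, 1]")
        if self.learning_rate <= 0:
            raise ValueError("learning_rate must be positive")
        if self.total_steps <= 0 or self.num_envs <= 0:
            raise ValueError("total_steps and num_envs must be positive")

    def to_json(self) -> str:
        """Validate, then serialize to a JSON string."""
        self.validate()
        return json.dumps(asdict(self), sort_keys=True)
```

Validating before serializing catches typos such as an unknown algorithm name locally, before a GPU job is launched.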
After running rollouts, use the "RL Analysis" tab to:
- Reward Decomposition: See which rules contribute most to rewards
- Trajectory Visualization: View agent paths and detect attractors
- Policy Entropy: Measure exploration vs exploitation
- Termination Analysis: Understand why episodes end
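As a rough illustration of the policy entropy metric above, the sketch below computes the Shannon entropy of a discrete action distribution. High entropy means the policy spreads probability across actions (exploration), while entropy near zero means it has committed to one action (exploitation). Whether the RL Analysis tab computes entropy exactly this way is an assumption; the function names are hypothetical.

```python
import math
from typing import Sequence


def policy_entropy(action_probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a discrete action distribution."""
    total = sum(action_probs)
    if total <= 0:
        raise ValueError("probabilities must sum to a positive value")
    entropy = 0.0
    for p in action_probs:
        if p < 0:
            raise ValueError("probabilities must be non-negative")
        q = p / total  # renormalize in case of rounding drift
        if q > 0:
            entropy -= q * math.log(q)
    return entropy


def normalized_entropy(action_probs: Sequence[float]) -> float:
    """Entropy scaled to [0, 1] by the maximum, log(number of actions)."""
    n = len(action_probs)
    if n < 2:
        return 0.0
    return policy_entropy(action_probs) / math.log(n)
```

A uniform policy over four actions has entropy log(4), and a deterministic policy has entropy 0, so the normalized value is convenient for comparing environments with different action counts.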
RL Studio uses GraphQL as the primary API for all core operations. The GraphQL API provides type-safe, efficient queries and mutations with a single endpoint.
Status: ✅ Operational - All core operations (assets, scenes, rollouts, training, research) use GraphQL in both the frontend and backend. A small set of auxiliary services still use REST or WebSocket endpoints.
GraphQL Endpoint: http://localhost:8000/graphql
GraphQL Playground: Visit http://localhost:8000/graphql in your browser for an interactive query explorer with auto-completion and documentation.
- ✅ Type-Safe: Schema types are declared with Python type hints via Strawberry GraphQL
- ✅ Efficient: Fetch only the fields you need, reducing payload size
- ✅ Single Endpoint: All operations through one endpoint
- ✅ Introspection: Auto-generated documentation and schema exploration
- ✅ Flexible: Combine multiple queries in a single request
List Assets:
query {
  assets(filter: { mode: "grid" }) {
    id
    name
    assetType
    geometry
    visualProfile
  }
}
Get Single Asset:
query {
  asset(id: "asset_id_here") {
    id
    name
    geometry
    visualProfile
    physicsProfile
    behaviorProfile
  }
}
Create Asset:
mutation {
  createAsset(input: {
    name: "My Asset"
    assetTypeKey: "tile"
    visualProfile: "{\"color\": \"#FF0000\"}"
  }) {
    id
    name
  }
}
List Scenes:
query {
  scenes(filter: { search: "gridworld" }) {
    id
    name
    createdAt
    createdBy
  }
}
Run Rollout:
mutation {
  runRollout(input: {
    envSpec: "{\"world\": {\"width\": 10, \"height\": 10}}"
    policy: "random"
    maxSteps: 100
  }) {
    success
    totalReward
    episodeLength
    steps {
      stepNumber
      reward
      done
    }
  }
}
Launch Training:
mutation {
  launchTraining(input: {
    runId: "run_123"
    envSpec: "{\"world\": {\"width\": 10}}"
    config: "{\"algorithm\": \"ppo\"}"
    useManagedJobs: true
  }) {
    id
    status {
      status
      jobId
    }
  }
}
Training Status:
query {
  trainingStatus(jobId: "job_123") {
    status
    jobId
    progress
    logs
  }
}
Research Operations:
# Hyperparameter Sweep
mutation {
  generateHyperparameterSweep(input: {
    algorithm: "ppo"
    envSpec: "{...}"
    baseConfig: "{...}"
    searchSpace: "{...}"
    searchType: "grid"
    nTrials: 10
  }) {
    success
    nTrials
    trials {
      trialNumber
      hyperparameters
    }
  }
}
# Compare Runs
mutation {
  compareRuns(input: {
    runResults: "[{...}]"
    metric: "mean_reward"
    alpha: 0.05
  }) {
    success
    comparison {
      metric
      bestRun
      worstRun
      statisticalTest
    }
  }
}
# Model Versioning
query {
  listCheckpoints(runId: "run_123") {
    checkpointName
    path
    createdAt
  }
  listModelVersions(runId: "run_123") {
    versionName
    checkpointName
    tags
    modelPath
  }
}
W&B Integration:
# Test Connection
mutation {
  testWandbConnection(input: {
    apiKey: "your-api-key"
  }) {
    success
    message
    wandbAuthenticated
  }
}
# List Runs
query {
  listWandbRuns(project: "rl-studio", apiKey: "your-key") {
    id
    name
    state
    url
  }
}
# Get Run Details
query {
  getWandbRun(runId: "run_id", project: "rl-studio", apiKey: "your-key") {
    runId
    runName
    metrics
    summary
    url
  }
}
Some operations still use REST endpoints (not yet migrated to GraphQL):
- /ws/rollout - WebSocket endpoint for real-time rollout streaming (kept as a plain WebSocket endpoint rather than a GraphQL subscription)
- /api/codegen/* - Code generation utilities
- /api/templates/* - Template management
- /api/analysis/* - Analysis operations
- /api/verification/* - Verification utilities
- /api/benchmarks/* - Benchmark operations
- /api/infrastructure/* - Infrastructure management
- /api/admin/* - Admin operations
- /api/ingestion/* - Data ingestion
API Documentation:
- GraphQL Playground: http://localhost:8000/graphql (interactive query explorer with auto-completion)
- Swagger UI: http://localhost:8000/docs (for REST endpoints)
- ReDoc: http://localhost:8000/redoc
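Outside the playground, the GraphQL endpoint can be called from any HTTP client. The sketch below uses only the Python standard library and the conventional GraphQL-over-HTTP JSON body (`{"query": ...}`). The helper names are hypothetical, and the mutation text is modeled on the Run Rollout example in this README, including its convention of passing `envSpec` as a JSON-encoded string.

```python
import json
import urllib.request
from typing import Any, Dict, Optional

GRAPHQL_URL = "http://localhost:8000/graphql"


def build_graphql_request(query: str,
                          variables: Optional[Dict[str, Any]] = None,
                          url: str = GRAPHQL_URL) -> urllib.request.Request:
    """Wrap a GraphQL document in a JSON POST request (not yet sent)."""
    payload: Dict[str, Any] = {"query": query}
    if variables:
        payload["variables"] = variables
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def rollout_mutation(env_spec: Dict[str, Any], policy: str = "random",
                     max_steps: int = 100) -> str:
    """Build a runRollout mutation; envSpec is sent as a JSON string."""
    # Serialize the spec once, then quote the result as a string literal.
    env_spec_literal = json.dumps(json.dumps(env_spec))
    return (
        "mutation { runRollout(input: { "
        f"envSpec: {env_spec_literal} policy: {json.dumps(policy)} maxSteps: {int(max_steps)}"
        " }) { success totalReward episodeLength } }"
    )


def execute(request: urllib.request.Request, timeout: float = 30.0) -> Dict[str, Any]:
    """Send the request and decode the JSON response (needs a running backend)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the backend running locally, `execute(build_graphql_request(rollout_mutation({"world": {"width": 10, "height": 10}})))` would send the same mutation shown in the Run Rollout example.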
rl-studio/
├── app/ # Frontend (TanStack Start)
│ ├── components/ # React components
│ ├── lib/ # Utilities and clients
│ └── routes/ # Page routes
├── backend/ # Python backend
│ ├── rl_studio/ # Core RL logic
│ │ ├── api/ # API layer
│ │ │ ├── graphql/ # GraphQL API (primary)
│ │ │ └── ... # Business logic functions (used by GraphQL)
│ │ ├── rollout/ # Rollout simulator
│ │ ├── training/ # Training orchestration
│ │ ├── analysis/ # RL analysis (NumPy/SciPy)
│ │ └── codegen/ # Code generation
│ └── main.py # Backend entry point
├── convex/ # Convex backend
│ ├── environments.ts # Environment schema
│ ├── runs.ts # Training runs
│ └── metrics.ts # Training metrics
└── training/ # Training scripts
└── train.py # Main training script
# Frontend
npm run dev # Start dev server
npm run build # Build for production
npm run lint # Run linter
npm run test # Run tests
npm run format # Format code
# Backend
cd backend
source venv/bin/activate
python main.py # Start FastAPI server
python -m pytest # Run tests (if configured)
# Convex
npx convex dev # Start Convex dev server
npx convex deploy # Deploy to production
- Frontend: ESLint + Prettier
- Backend: Black (Python formatter)
- TypeScript: Strict mode enabled
- Python: Type hints required
RL Studio now includes a powerful, generic world builder inspired by Figma's component system. This allows you to create any type of RL environment using reusable building blocks.
- Assets: Reusable definitions (characters, vehicles, props, tiles, etc.)
- Prefabs: Composed assets made from other assets (e.g., "road intersection")
- Scenes: A graph of entities (instances of assets/prefabs) with transforms and components
- Templates: Saved scenes that users can clone (e.g., "10×10 Grid World")
- Components: Attach semantics to entities (physics, render info, RL agent, reward zone)
- RL Config: How the scene is interpreted as an RL environment (observation, actions, rewards, episode config)
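One way to picture how these concepts relate is as plain data structures: assets are reusable definitions, entities are placed instances that carry components, and a scene collects entities together with an RL config. The sketch below is illustrative only. All class and field names are hypothetical and do not reflect the actual Convex schema.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class Asset:
    """Reusable definition, e.g. a wall tile or an agent character."""
    id: str
    name: str
    asset_type: str
    visual_profile: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Entity:
    """An instance of an asset placed in a scene, with attached components."""
    id: str
    asset_id: str
    position: Tuple[float, float, float] = (0.0, 0.0, 0.0)
    components: Dict[str, Dict[str, Any]] = field(default_factory=dict)


@dataclass
class Scene:
    """A collection of entities plus the RL interpretation of the scene."""
    name: str
    entities: List[Entity] = field(default_factory=list)
    rl_config: Dict[str, Any] = field(default_factory=dict)

    def place(self, asset: Asset, entity_id: str,
              position: Tuple[float, float, float],
              components: Optional[Dict[str, Dict[str, Any]]] = None) -> Entity:
        entity = Entity(entity_id, asset.id, position, dict(components or {}))
        self.entities.append(entity)
        return entity

    def with_component(self, name: str) -> List[Entity]:
        """Return all entities that carry the named component."""
        return [e for e in self.entities if name in e.components]
```

Because entities reference assets by id rather than copying them, editing an asset can update every scene that uses it, which is the main appeal of the component-style design.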
The system uses a flexible schema stored in Convex:
- assetTypes: Categories of assets (tile, character, vehicle, prop, prefab)
- assets: Reusable asset definitions with visual, physics, and behavior profiles
- scenes: Scene metadata (name, mode, environment settings)
- sceneVersions: Versioned snapshots of scene graphs and RL configs
- templates: Pre-built scenes ready to instantiate
GraphQL (Primary API):
- Assets: assets, asset, createAsset, updateAsset, deleteAsset mutations/queries
- Scenes: scenes, scene, createScene, updateScene, createSceneVersion mutations/queries
- Training: launchTraining, stopTraining, trainingStatus mutations/queries
- Rollout: runRollout mutation
- Research: generateHyperparameterSweep, compareRuns, calculateConfidenceInterval, calculateEffectSize, model versioning, W&B/MLflow operations
Additional Services (REST):
- Template Service (/api/templates): Template management
- Compile Service (/api/compile): Scene compilation
- WebSocket (/ws/rollout): Real-time rollout streaming
- Caching: In-memory TTL cache for assets and templates (5-10 minute TTL)
- Cache Invalidation: Automatic cache invalidation on create/update/delete operations
- Pagination: Support for limiting and offsetting results in list queries
- Asset Safety: Validation prevents deleting assets that are referenced in scenes
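The caching and invalidation behavior described above can be sketched as a small in-memory TTL cache. This is a generic illustration, not the contents of `backend/rl_studio/api/cache.py`: the class name, key format, and prefix-based invalidation are assumptions. The clock is injectable so expiry can be tested without sleeping.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TTLCache:
    def __init__(self, ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        # key -> (expiry timestamp, cached value)
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Optional[Any]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            # Entry is stale: drop it and report a miss.
            del self._entries[key]
            return None
        return value

    def set(self, key: str, value: Any) -> None:
        self._entries[key] = (self._clock() + self._ttl, value)

    def invalidate_prefix(self, prefix: str) -> int:
        """Drop every key starting with prefix, e.g. after a create/update/delete."""
        stale = [k for k in self._entries if k.startswith(prefix)]
        for k in stale:
            del self._entries[k]
        return len(stale)
```

A resolver would call `cache.get("assets:list:grid")` before querying the database and `cache.invalidate_prefix("assets:")` after any asset mutation, so list queries never serve deleted or outdated assets for longer than one request.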
The system comes with pre-built assets and templates:
Assets:
- Grid tiles: Wall, Agent, Goal, Key, Door, Trap, Checkpoint, Moving Obstacle, Floor, Spawn Point
- All assets include visual profiles (colors, sizes), physics profiles (colliders, mass), and behavior profiles
Templates:
- Basic Gridworld: Simple 10×10 grid with agent and goal
- Cliff Walking: Agent must avoid cliff cells
- Key & Door Grid: Sequential task requiring key collection
- Maze Generator: Randomly generated mazes
- Multi-Agent Grid (Cooperative): Multiple agents working together
- 2D Continuous Navigation: Continuous 2D space with obstacles
- 3D Navigation: 3D environment with spatial reasoning
- Driving Simulator: Vehicle control with steering and acceleration
# List available templates
templates = await list_templates(mode="grid", category="grid")
# Instantiate a template into your project
scene = await instantiate_template(
    template_id="template_id",
    project_id="project_id",
    name="My Gridworld"
)
# Get scene with active version
scene_data = await get_scene(scene_id)
scene_graph = scene_data["activeVersion"]["sceneGraph"]
rl_config = scene_data["activeVersion"]["rlConfig"]
# Compile to runtime spec
runtime_spec = await compile_scene_version(scene_data["activeVersion"]["_id"])
# Clone an asset to a project scope
cloned_asset = await clone_asset(asset_id, project_id="project_id")
# Create a template from a scene version
template = await create_template(
    name="My Template",
    sceneVersionId=version_id,
    category="grid",
    tags=["custom"],
    isPublic=False
)
Assets can be:
- Created: Define new reusable assets with visual, physics, and behavior profiles
- Cloned: Copy global assets to project scope or duplicate within same scope
- Updated: Modify asset properties while maintaining references
- Deleted: Safe deletion with validation (prevents deletion if referenced in scenes)
- Searched: Filter by type, mode, tags, or project
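The safe-deletion rule above amounts to a reference check before removal. The sketch below shows the idea with in-memory dictionaries; the function names, the exception type, and the data layout (scene id mapped to a list of asset ids) are all hypothetical stand-ins for whatever the backend actually queries.

```python
from typing import Dict, List


class AssetInUseError(Exception):
    """Raised when an asset is still referenced by at least one scene."""


def scenes_referencing(asset_id: str, scene_assets: Dict[str, List[str]]) -> List[str]:
    """Return the ids of scenes whose entity list uses the given asset."""
    return sorted(sid for sid, ids in scene_assets.items() if asset_id in ids)


def delete_asset(asset_id: str, assets: Dict[str, dict],
                 scene_assets: Dict[str, List[str]]) -> None:
    """Delete an asset only if no scene references it."""
    if asset_id not in assets:
        raise KeyError(asset_id)
    in_use = scenes_referencing(asset_id, scene_assets)
    if in_use:
        raise AssetInUseError(
            f"asset {asset_id!r} is used by scenes: {', '.join(in_use)}"
        )
    del assets[asset_id]
```

Reporting which scenes block the deletion, rather than failing silently, lets the user decide whether to edit those scenes or keep the asset.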
Templates can be:
- Created: Save any scene version as a reusable template
- Listed: Filter by mode, category, or public/private status
- Instantiated: Clone templates into new scenes with copied scene graphs and RL configs
- Shared: Mark templates as public for team-wide access
For more details, see the API documentation and code in:
- GraphQL API (Primary): backend/rl_studio/api/graphql/ - Complete GraphQL implementation
  - Types: backend/rl_studio/api/graphql/types/ - GraphQL type definitions
  - Resolvers: backend/rl_studio/api/graphql/resolvers/ - Query/mutation resolvers
  - Schema: backend/rl_studio/api/graphql/schema.py - Main schema definition
- Templates: backend/rl_studio/api/templates.py - Template management (REST)
- Compile: backend/rl_studio/api/compile.py - Scene compilation (REST)
- Caching: backend/rl_studio/api/cache.py - Shared caching layer
npm run build
# Deploy dist/ to Netlify
# Set environment variable: VITE_CONVEX_URL
cd backend
./deploy-gcp.sh
Or manually:
# Cloud Run injects PORT itself and rejects it in --set-env-vars, so the port is set with --port
gcloud run deploy rl-studio-backend \
  --source . \
  --platform managed \
  --region us-central1 \
  --allow-unauthenticated \
  --port 8080 \
  --set-env-vars "CONVEX_URL=$CONVEX_URL"
cd backend
./aws-deploy.sh
- Go to Railway
- New Project → Deploy from GitHub
- Select your repo
- Add Service → Empty Service
- Root Directory: backend
- Build Command: pip install -r requirements.txt
- Start Command: python main.py
- Add environment variable: CONVEX_URL
npx convex deploy
Training jobs are automatically deployed via SkyPilot when launched from the UI. No manual deployment needed!
- A10:1 GPU: ~$1.00/hour (on-demand) or ~$0.30/hour (spot)
- 1M training steps: ~2-4 hours = ~$2-4 (on-demand) or ~$0.60-1.20 (spot)
💡 Recommendation: Enable spot instances for roughly 70% cost savings at the rates above!
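The figures above follow from simple arithmetic, which the sketch below reproduces so you can plug in your own rates. The hourly prices are the approximate A10 estimates quoted in this section, not live cloud pricing, and the function names are hypothetical.

```python
# Approximate A10:1 rates from this section, in USD per hour.
ON_DEMAND_RATE = 1.00
SPOT_RATE = 0.30


def job_cost(hours: float, hourly_rate: float) -> float:
    """Cost of a single-GPU job, rounded to cents."""
    return round(hours * hourly_rate, 2)


def spot_savings(on_demand_rate: float, spot_rate: float) -> float:
    """Fraction saved by using spot instead of on-demand instances."""
    return 1.0 - spot_rate / on_demand_rate
```

For a 1M-step run lasting 2 to 4 hours, `job_cost` gives $2.00 to $4.00 on-demand and $0.60 to $1.20 on spot, and `spot_savings(ON_DEMAND_RATE, SPOT_RATE)` comes to about 0.70.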
- Convex: Free tier available, then usage-based
- Frontend Hosting: Free on Netlify (hobby tier)
- Backend Hosting: ~$5-20/month (depending on traffic)
We welcome contributions! Please see our Contributing Guide for details.
- Fork the repository
- Create a feature branch
git checkout -b feature/amazing-feature
- Make your changes
- Commit your changes
git commit -m 'Add some amazing feature'
- Push to the branch
git push origin feature/amazing-feature
- Open a Pull Request
See Development section for setup instructions.
Please read our Code of Conduct before contributing.
- Multi-Agent Training: Enhanced support for multi-agent RL
- Custom Algorithms: Plugin system for custom RL algorithms
- Distributed Training: Support for distributed training across multiple GPUs
- Environment Templates: Pre-built templates for common RL tasks
- Collaboration Features: Real-time collaboration on environments
- Model Zoo: Pre-trained models and benchmarks
- Mobile App: Mobile app for monitoring training jobs
- API SDK: Official SDKs for Python, JavaScript, and Rust
See our Issues for more planned features.
Q: Is RL Studio free to use?
A: Yes! RL Studio is open source and free. You only pay for cloud infrastructure (GPU training jobs, hosting, etc.).
Q: What cloud providers are supported?
A: AWS, Google Cloud Platform, and Microsoft Azure are all supported via SkyPilot.
Q: Can I use RL Studio without cloud training?
A: Yes! You can use RL Studio for environment design and local rollouts without any cloud setup.
Q: What Python version is required?
A: Python 3.9 or higher.
Q: Can I deploy on my own infrastructure?
A: Yes! RL Studio is fully open source and can be deployed anywhere.
Q: How do I add a custom RL algorithm?
A: Currently, you can modify the training scripts. We're working on a plugin system for easier extensibility.
Q: How much do GPU training jobs cost?
A: See Cost Estimates section. Costs vary by provider and instance type.
Q: Can I use spot instances?
A: Yes! SkyPilot automatically manages spot instances for cost savings.
Q: How long do training jobs take?
A: Depends on your environment and hyperparameters. Typical jobs take about 2-4 hours for 1M steps on an A10 GPU, in line with the Cost Estimates section.
This project is licensed under the MIT License - see the LICENSE file for details.
RL Studio is built with amazing open-source tools and libraries:
- Stable-Baselines3 - RL algorithms
- Gymnasium - RL environments
- SkyPilot - GPU orchestration
- TanStack Start - React framework
- Convex - Realtime database
- FastAPI - Python API framework
- Three.js - 3D graphics
Special thanks to all contributors who have helped make RL Studio better!
Made with ❤️ for the RL community



