🧪 Training Farm Wave 4+5 — Evolution Tracker #357

## 🧠 HSLM Training Farm — Live Status

### 📊 Farm Overview (2026-03-13)

| Account | Slots | Training | Building | Queued | Crashed | Fixed |
|---------|-------|----------|----------|--------|---------|-------|
| Primary | 2/25 | 2 ✅ | 0 | 0 | 0 | - |
| Farm-2 | 8/25 | 0→8 🔧 | 8 (rebuilding) | 0 | 0 | 8 just fixed |
| Farm-3 | 16/25 | 1 (R20) | 6 | 7 | 0 | 2 just fixed |
| **Total** | **26/75** | **3** | **14** | **7** | **0** | **10 fixed** |
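The totals row can be re-derived from the per-account rows. The sketch below is only a sanity check: the counts are copied from the table above, and the dictionary layout and variable names are illustrative assumptions, not part of any farm tooling.

```python
# Per-account counts copied from the Farm Overview table.
# The 25-slot limit per account comes from the "x/25" Slots column.
SLOTS_PER_ACCOUNT = 25

farm = {
    "Primary": {"used": 2,  "training": 2, "building": 0, "queued": 0, "crashed": 0, "fixed": 0},
    "Farm-2":  {"used": 8,  "training": 0, "building": 8, "queued": 0, "crashed": 0, "fixed": 8},
    "Farm-3":  {"used": 16, "training": 1, "building": 6, "queued": 7, "crashed": 0, "fixed": 2},
}

# Sum every column across the three accounts.
columns = ["used", "training", "building", "queued", "crashed", "fixed"]
totals = {col: sum(acct[col] for acct in farm.values()) for col in columns}

capacity = SLOTS_PER_ACCOUNT * len(farm)   # 3 accounts x 25 slots
utilization = totals["used"] / capacity    # fraction of the 75-slot goal in use

print(totals)
print(f"{totals['used']}/{capacity} slots used ({utilization:.1%})")
```

Keeping the counts in one structure makes it easy to spot a totals row that drifts out of sync between update cycles.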

### 🔧 Actions Taken This Cycle
1. ✅ Reset `startCommand` to `null` on ALL services (it was referencing the deleted `entrypoint-train.sh`)
2. ✅ Set `HSLM_*` env vars on all 8 farm-2 services
3. ✅ Redeployed all 10 crashed services (8 farm-2 + 2 farm-3)
4. ✅ Fixed the primary services' `startCommand` for future redeploys

### 📈 Active Training Metrics
| Service | Account | Step | AvgLoss | PPL | LR | Tok/s |
|---------|---------|------|---------|-----|-----|-------|
| hslm-v11 | primary | 21K | 6.12 | ~455 | 1.41e-4 | 12,113 |
| hslm-train | primary | 4.7K | 6.18 | ~483 | 2.82e-4 | 12,830 |
| hslm-r20 | farm-3 | ? | ? | ? | ? | ? |

### 🏆 Best Known Result
**v4R: PPL=125 (Loss=4.83)** — Adam, LR=3e-4, cosine, 100K steps
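The reported PPL values are consistent with perplexity being the exponential of the average loss. The sketch below checks that relationship, assuming the loss is a per-token cross-entropy in nats; the source does not state this, and the `perplexity` helper is a hypothetical name used only for illustration.

```python
import math


def perplexity(avg_loss: float) -> float:
    """Perplexity as exp(loss), assuming loss is cross-entropy in nats."""
    return math.exp(avg_loss)


# (average loss, reported PPL) pairs copied from the tables above.
reported = {
    "hslm-v11": (6.12, 455),
    "hslm-train": (6.18, 483),
    "v4R": (4.83, 125),
}

for name, (loss, ppl) in reported.items():
    print(f"{name}: loss={loss:.2f} -> PPL~{perplexity(loss):.0f} (reported {ppl})")
```

Under that assumption, the gap between the active runs and v4R is easier to read in loss terms: roughly 1.3 nats separate the current primary runs from the best known result.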

### 🎯 Goal: 75/75 slots utilized (3 accounts × 25)

### Wave 4+5 Experiment Matrix
| Run | Optimizer | LR | Schedule | Special | Account | Status |
|-----|-----------|-----|----------|---------|---------|--------|
| R5 | adam | 3e-4 | cosine | baseline | farm-2 | 🔄 rebuilding |
| R6 | adamw | 1e-3 | cosine | — | farm-2 | 🔄 rebuilding |
| R10 | lamb | 3e-4 | cosine | ga=2 | farm-2 | 🔄 rebuilding |
| R11 | adam | 3e-4 | cosine | restarts | farm-2 | 🔄 rebuilding |
| R12 | adam | 3e-4 | cosine | ga=4 | farm-2 | 🔄 rebuilding |
| R13 | lamb | 1e-3 | cosine | ga=4 | farm-2 | 🔄 rebuilding |
| R14 | adam | 5e-4 | cosine | higher LR | farm-3 | 🔄 redeployed |
| R15 | adamw | 5e-4 | cosine | WD=0.05 | farm-3 | ⏳ queued |
| R16 | adam | 3e-4 | sacred | φ-scale | farm-3 | ⏳ queued |
| R17 | adam | 3e-4 | cosine | adaptive-sparsity | farm-3 | ⏳ queued |
| R18 | adam | 3e-4 | cosine | ternary-schedule | farm-2 | 🔄 rebuilding |
| R19 | lamb | 3e-3 | cosine | ga=8 | farm-2 | 🔄 rebuilding |
| R20 | adam | 3e-4 | cosine | full-ternary | farm-3 | ✅ SUCCESS |
| R21 | lamb | 3e-4 | cosine | batch=128 | farm-3 | 🔄 redeployed |
| R22 | lamb | 5e-4 | cosine | batch=128 | farm-3 | 🔨 building |
| R23 | adam | 3e-4 | cosine | batch=128 | farm-3 | 🔨 building |
| R24 | lamb | 1e-3 | cosine | batch=66 | farm-3 | ⏳ queued |
| R25 | adam | 1e-3 | cosine | batch=128 | farm-3 | 🔨 building |
| R26 | adam | 3e-4 | cosine | ga=2 | farm-3 | ⏳ queued |
| R27 | lamb | 3e-4 | cosine | ga=4 | farm-3 | 🔨 building |
| R28 | adam | 5e-4 | cosine | batch=128 | farm-3 | 🔨 building |
| R29 | adamw | 3e-4 | cosine | WD=0.01 | farm-3 | 🔨 building |
| R30 | lamb | 3e-4 | cosine | warmup=5K | farm-3 | ⏳ queued |
| T1 | adam | 3e-4 | cosine | ctx=27, 30K | farm-3 | ⏳ queued |
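The matrix can be cross-checked against the Farm Overview counts by tallying runs per account and status. This is a minimal sketch: the rows are copied from the table above with the emoji dropped from the status strings, and the `RUNS` list and tuple layout are assumptions made for illustration.

```python
from collections import Counter

# (run, account, status) copied from the Wave 4+5 Experiment Matrix.
RUNS = [
    ("R5", "farm-2", "rebuilding"), ("R6", "farm-2", "rebuilding"),
    ("R10", "farm-2", "rebuilding"), ("R11", "farm-2", "rebuilding"),
    ("R12", "farm-2", "rebuilding"), ("R13", "farm-2", "rebuilding"),
    ("R14", "farm-3", "redeployed"), ("R15", "farm-3", "queued"),
    ("R16", "farm-3", "queued"), ("R17", "farm-3", "queued"),
    ("R18", "farm-2", "rebuilding"), ("R19", "farm-2", "rebuilding"),
    ("R20", "farm-3", "success"), ("R21", "farm-3", "redeployed"),
    ("R22", "farm-3", "building"), ("R23", "farm-3", "building"),
    ("R24", "farm-3", "queued"), ("R25", "farm-3", "building"),
    ("R26", "farm-3", "queued"), ("R27", "farm-3", "building"),
    ("R28", "farm-3", "building"), ("R29", "farm-3", "building"),
    ("R30", "farm-3", "queued"), ("T1", "farm-3", "queued"),
]

# Count runs per account and per (account, status) pair.
by_account = Counter(account for _, account, _ in RUNS)
by_status = Counter((account, status) for _, account, status in RUNS)

for (account, status), count in sorted(by_status.items()):
    print(f"{account:7s} {status:11s} {count}")
print(dict(by_account))
```

The tally gives 8 farm-2 runs and 16 farm-3 runs, which matches the Slots column of the overview once the 2 primary services are added to reach 26.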

### 💰 Cost
~$0.67/run × 26 runs = ~$17.42 (spread across 3 accounts, covered by PRO credits)
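The cost figure is a straight multiplication, reproduced below with `Decimal` so the cents stay exact. The projection to 75 runs is an assumption added here for illustration: it supposes the same approximate per-run cost holds if every slot in the 75-slot goal is filled, which the source does not claim.

```python
from decimal import Decimal

COST_PER_RUN = Decimal("0.67")  # approximate per-run cost quoted above
NUM_RUNS = 26                   # services currently occupying slots
GOAL_RUNS = 75                  # 3 accounts x 25 slots (the stated goal)

current_cost = COST_PER_RUN * NUM_RUNS
# Hypothetical projection: assumes the per-run cost does not change at full utilization.
projected_cost = COST_PER_RUN * GOAL_RUNS

print(f"current:   ~${current_cost}")
print(f"projected: ~${projected_cost} (assumption: same cost per run)")
```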

---
*Auto-updated by /iterate cycle. Next update in ~15 min.*
