🧪 Training Farm Wave 4+5 — Evolution Tracker #357

@gHashTag

🧠 HSLM Training Farm — Live Status

📊 Farm Overview (2026-03-13)

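| Account | Slots | Training | Building | Queued | Crashed | Fixed |
|---|---|---|---|---|---|---|
| Primary | 2/25 | 2 ✅ | 0 | 0 | 0 | - |
| Farm-2 | 8/25 | 0→8 🔧 | 8 (rebuilding) | 0 | 0 | 8 just fixed |
| Farm-3 | 16/25 | 1 (R20) | 6 | 7 | 0 | 2 just fixed |
| Total | 26/75 | 3 | 14 | 7 | 0 | 10 fixed |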

🔧 Actions Taken This Cycle

  1. ✅ Set startCommand=null on all services (it previously referenced the deleted entrypoint-train.sh)
  2. ✅ Set HSLM_* env vars on all 8 farm-2 services
  3. ✅ Redeployed all 10 crashed services (8 farm-2 + 2 farm-3)
  4. ✅ Fixed startCommand on the primary services so that future redeploys use it

📈 Active Training Metrics

| Service | Account | Step | AvgLoss | PPL | LR | Tok/s |
|---|---|---|---|---|---|---|
| hslm-v11 | primary | 21K | 6.12 | ~455 | 1.41e-4 | 12,113 |
| hslm-train | primary | 4.7K | 6.18 | ~483 | 2.82e-4 | 12,830 |
| hslm-r20 | farm-3 | ? | ? | ? | ? | ? |

🏆 Best Known Result

v4R: PPL=125 (Loss=4.83) — Adam, LR=3e-4, cosine, 100K steps
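
The PPL figures in this issue match exp(AvgLoss), which suggests the loss is a mean cross-entropy in nats. That unit is an assumption inferred from the numbers, not something the training logs here state. A minimal sketch to reproduce the reported values (the `perplexity` helper is illustrative, not part of the HSLM code):

```python
import math

def perplexity(avg_loss: float) -> float:
    """Perplexity for a mean cross-entropy loss, assuming the loss is in nats."""
    return math.exp(avg_loss)

# Loss values copied from the metrics table and the best known result.
reported = [("hslm-v11", 6.12), ("hslm-train", 6.18), ("v4R", 4.83)]

for name, loss in reported:
    print(f"{name}: loss={loss:.2f} -> PPL~{perplexity(loss):.0f}")
```

If the loss were logged in bits instead, the base would be 2 rather than e and the PPL column would not line up with these values.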

🎯 Goal: 75/75 slots utilized (3 accounts × 25)
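
As a quick check of how far the farm is from this goal, the slot counts from the overview can be turned into a utilization figure. The dictionary below only restates the numbers in this issue; the variable names are illustrative.

```python
SLOTS_PER_ACCOUNT = 25

# Slots in use per account, copied from the Farm Overview.
slots_used = {"primary": 2, "farm-2": 8, "farm-3": 16}

capacity = SLOTS_PER_ACCOUNT * len(slots_used)
total_used = sum(slots_used.values())
remaining = capacity - total_used
utilization = total_used / capacity

print(f"{total_used}/{capacity} slots used ({utilization:.1%}), {remaining} free")
for account, used in slots_used.items():
    print(f"  {account}: {SLOTS_PER_ACCOUNT - used} free")
```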

Wave 4+5 Experiment Matrix

| Run | Optimizer | LR | Schedule | Special | Account | Status |
|---|---|---|---|---|---|---|
| R5 | adam | 3e-4 | cosine | baseline | farm-2 | 🔄 rebuilding |
| R6 | adamw | 1e-3 | cosine | | farm-2 | 🔄 rebuilding |
| R10 | lamb | 3e-4 | cosine | ga=2 | farm-2 | 🔄 rebuilding |
| R11 | adam | 3e-4 | cosine | restarts | farm-2 | 🔄 rebuilding |
| R12 | adam | 3e-4 | cosine | ga=4 | farm-2 | 🔄 rebuilding |
| R13 | lamb | 1e-3 | cosine | ga=4 | farm-2 | 🔄 rebuilding |
| R14 | adam | 5e-4 | cosine | higher LR | farm-3 | 🔄 redeployed |
| R15 | adamw | 5e-4 | cosine | WD=0.05 | farm-3 | ⏳ queued |
| R16 | adam | 3e-4 | sacred | φ-scale | farm-3 | ⏳ queued |
| R17 | adam | 3e-4 | cosine | adaptive-sparsity | farm-3 | ⏳ queued |
| R18 | adam | 3e-4 | cosine | ternary-schedule | farm-2 | 🔄 rebuilding |
| R19 | lamb | 3e-3 | cosine | ga=8 | farm-2 | 🔄 rebuilding |
| R20 | adam | 3e-4 | cosine | full-ternary | farm-3 | ✅ SUCCESS |
| R21 | lamb | 3e-4 | cosine | batch=128 | farm-3 | 🔄 redeployed |
| R22 | lamb | 5e-4 | cosine | batch=128 | farm-3 | 🔨 building |
| R23 | adam | 3e-4 | cosine | batch=128 | farm-3 | 🔨 building |
| R24 | lamb | 1e-3 | cosine | batch=66 | farm-3 | ⏳ queued |
| R25 | adam | 1e-3 | cosine | batch=128 | farm-3 | 🔨 building |
| R26 | adam | 3e-4 | cosine | ga=2 | farm-3 | ⏳ queued |
| R27 | lamb | 3e-4 | cosine | ga=4 | farm-3 | 🔨 building |
| R28 | adam | 5e-4 | cosine | batch=128 | farm-3 | 🔨 building |
| R29 | adamw | 3e-4 | cosine | WD=0.01 | farm-3 | 🔨 building |
| R30 | lamb | 3e-4 | cosine | warmup=5K | farm-3 | ⏳ queued |
| T1 | adam | 3e-4 | cosine | ctx=27, 30K | farm-3 | ⏳ queued |
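
The matrix can be cross-checked against the Farm Overview by tallying runs per account and status. The sketch below is a hand-transcribed copy of the Run, Account, and Status columns; it is not generated from the deployment platform, so treat it as a consistency check on this issue only.

```python
from collections import Counter

# Run, account, status: transcribed from the experiment matrix above.
MATRIX = """
R5 farm-2 rebuilding
R6 farm-2 rebuilding
R10 farm-2 rebuilding
R11 farm-2 rebuilding
R12 farm-2 rebuilding
R13 farm-2 rebuilding
R14 farm-3 redeployed
R15 farm-3 queued
R16 farm-3 queued
R17 farm-3 queued
R18 farm-2 rebuilding
R19 farm-2 rebuilding
R20 farm-3 success
R21 farm-3 redeployed
R22 farm-3 building
R23 farm-3 building
R24 farm-3 queued
R25 farm-3 building
R26 farm-3 queued
R27 farm-3 building
R28 farm-3 building
R29 farm-3 building
R30 farm-3 queued
T1 farm-3 queued
"""

rows = [line.split() for line in MATRIX.strip().splitlines()]
by_account = Counter(account for _, account, _ in rows)
by_status = Counter((account, status) for _, account, status in rows)

for (account, status), count in sorted(by_status.items()):
    print(f"{account:7s} {status:11s} {count}")
print(dict(by_account))
```

The tallies agree with the overview: 8 rebuilding on farm-2, and on farm-3 1 successful run, 2 redeployed, 6 building, and 7 queued. Together with the 2 primary services this gives the 26 occupied slots.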

💰 Cost

~$0.67/run × 26 runs = ~$17.42 (spread across 3 accounts, covered by PRO credits)
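
The arithmetic behind this estimate, plus a projection to the 75-slot goal, can be sketched as follows. The projection assumes the ~$0.67 per-run cost stays the same at full utilization, which this issue does not confirm.

```python
COST_PER_RUN_CENTS = 67   # ~$0.67 per run, from the estimate above
ACTIVE_RUNS = 26          # slots currently occupied
TARGET_RUNS = 75          # 3 accounts x 25 slots

def total_cost_usd(runs: int, cents_per_run: int = COST_PER_RUN_CENTS) -> float:
    """Total cost in dollars; multiplies in integer cents to avoid float drift."""
    return runs * cents_per_run / 100

current = total_cost_usd(ACTIVE_RUNS)
projected = total_cost_usd(TARGET_RUNS)  # assumption: same per-run cost at 75 runs

print(f"current:   ${current:.2f}")
print(f"projected: ${projected:.2f}")
```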


Auto-updated by /iterate cycle. Next update in ~15 min.
