Hive teacher backfill, Perceptron integration, sorter incident hardening, model publish polish #136

Merged
spencerhhubert merged 22 commits into main from sorthive on May 21, 2026

@mneuhaus
Collaborator

Summary

16 commits accumulated on sorthive since #129 merged. Most of the surface area
is the new Hive teacher backfill stack (queueable Gemini/Perceptron
re-detection on stored samples) plus a handful of Hive UX + sorter reliability
fixes.

Hive

Teacher backfill (the big one)

  • Admin-only flow to re-run vision models on existing samples. Adapter pattern
    with isolated request/response handling per provider — adding a model = one
    registry entry.
  • Adapters: OpenRouterChat (Gemini 3/3.1/3.5, Qwen 3.6, Kimi, MiMo,
    Nemotron), Grok (XYXY bbox parse), Perceptron Mk1 (calls
    api.perceptron.inc/v1/chat/completions directly — OpenRouter's shim
    couldn't reliably trigger grounded mode).
  • Worker: ThreadPoolExecutor with per-adapter max_concurrent +
    min_interval_s caps, 429 → Retry-After + jitter backoff, atomic item claim
    via SELECT … FOR UPDATE SKIP LOCKED. Sessions held only for the short
    load + write transactions so the connection pool doesn't get monopolized by
    multi-second adapter calls.
  • Per-user secrets: openrouter_api_key (existing) + new
    perceptron_api_key, both encrypted. secret_kind on each adapter routes
    to the right key.
  • Endpoints: POST/GET/cancel jobs, sync single-sample rerun, non-destructive
    preview (for compare page), model registry, default prompt fetch.
  • Migrations 0xa9–0xd2: teacher_jobs + teacher_job_items + cost columns +
    user perceptron_api_key + preferred_teacher_model.
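The retry behaviour named above (429 → Retry-After + jitter backoff) could look roughly like the sketch below. It is not the worker's actual code: `parse_retry_after`, `backoff_delay`, and the `base`, `cap`, and 25% jitter values are assumptions made for illustration, and only the delta-seconds form of Retry-After is handled.

```python
import random


def parse_retry_after(value):
    """Parse a delta-seconds Retry-After header value.

    Returns None when the header is missing or not a plain number
    (the HTTP-date form is deliberately not handled in this sketch).
    """
    if value is None:
        return None
    try:
        seconds = float(value.strip())
    except ValueError:
        return None
    return max(0.0, seconds)


def backoff_delay(attempt, retry_after=None, base=1.0, cap=60.0, rng=random.random):
    """Seconds to sleep before retry number `attempt` (0-based).

    Exponential schedule capped at `cap`; a server-provided Retry-After wins
    when it is longer. Up to 25% jitter is added on top so parallel workers
    do not all retry at the same instant.
    """
    exponential = min(cap, base * (2 ** attempt))
    wait = max(exponential, retry_after or 0.0)
    return wait + rng() * 0.25 * wait
```

A worker would call `backoff_delay(attempt, parse_retry_after(headers.get("Retry-After")))` and sleep for the result outside any database session.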

Teacher UI

  • /samples admin "Re-run teacher" button + live-polling status banner that
    auto-restores on reload by fetching the latest in-flight job.
  • /samples/[id] header gets "Compare models" + "Re-run teacher" (real Button
    primitives, not text links).
  • /admin/teacher-jobs splits Active (big cards with cost + tokens) from
    History (compact rows). Detail view paginates items 50/page server-side with
    status filter chips (queued / running / done / error / skipped).
  • /samples/[id]/compare runs every supported model on the same sample with
    per-tile image + bboxes, metrics, cost, latency, raw-response inspector, and
    an editable prompt textarea.
  • /settings admin section adds Perceptron API Key + Default Teacher Model
    selectors (separate from the AI Assistant chat model).

Samples list

  • Public-default visibility: any signed-in user reads all samples; writes still
    owner-or-reviewer/admin gated. ?scope=mine opts back into the private view.
  • Sample-detail sidebar shows owning machine + owner avatar/display name, links
    to that machine's sample list.
  • Source labels renamed (c_channel_2 → C2, classification_channel → C-Channel
    4 (Classification)), Capture Reason filter dropped, new Age filter
    (24h/7d/30d/all) wired through to max_age_hours.
  • Diversity page gains a machine_factor: coverage × min(machines, 3)/3 so a
    single-rig reason can't read as "done".
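As a worked example of the machine_factor formula above, here is a minimal sketch. The function name and the assumption that coverage is a fraction between 0 and 1 are illustrative, not taken from the Hive code.

```python
def machine_adjusted_coverage(coverage, distinct_machines, target_machines=3):
    """Scale coverage by min(machines, target) / target.

    A reason captured on a single rig keeps only one third of its raw
    coverage; three or more rigs keep all of it.
    """
    factor = min(distinct_machines, target_machines) / target_machines
    return coverage * factor
```

With 90% raw coverage, one machine yields about 30%, two machines about 60%, and three or more the full 90%.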

Model publishing + browsing

  • Compose Hive model metadata on publish and rework detail page.
  • Self-describing filename in variant download dropdown.
  • Variant download fixes.
  • Machines can see their owner's private models.
  • "Show source Hive" + link model titles to source Hive detail page.
  • Drop Hive selector in Browse Hive; aggregate across all targets.

Sorter

  • Handle channel-stuck incidents and C4 exit deadlocks (incident state machine
    + media pipeline improvements).
  • Approve-passthrough when clearing no-bin incident.

Training

  • Thread machine_id end-to-end through pull → build with --balance-machine
    CLI flag, so dataset composition can balance across rigs (and report
    per-machine counts in build.json regardless).
  • RUNBOOK.md captures the full pull → train → publish recipe.

Test plan

  • Backend: pytest software/hive/backend/tests — 112/112 green locally
  • Frontend: pnpm --dir software/hive/frontend check — 0 errors
  • Already deployed to hive.basically.website (sorthive branch) — backend
    + worker stable under 4k-item parallel backfill after the connection-pool
    restructure
  • Manually verified: re-run teacher on a single sample, batch backfill with
    live status banner, compare page across all 10 registered models, jobs
    pagination + status filter

mneuhaus added 16 commits May 16, 2026 23:27
- train publish --dataset-dir auto-fills the structured training_metadata
  the Hive frontend expects (model.best_metrics, dataset.selection,
  benchmarks, variant_sizes_bytes) from build.json + track_*_results.json
- Add train compose-metadata as a standalone preview command
- train build --max-empty-fraction caps empties as a share of the final
  dataset so --keep-empty doesn't pull in all negatives
- ModelTrainingReport adapts to available data: hide HOLDOUT F1 / Decision
  Match / audit / precheck / count-spectrum sections when those fields are
  missing, and add an Inference Performance section + training-setup chips
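One way to read "caps empties as a share of the final dataset" is as the inequality empties / (empties + labeled) ≤ f, which rearranges to empties ≤ f · labeled / (1 − f). The sketch below implements that reading; `cap_empties` and its parameters are hypothetical, and the real flag may round or select differently.

```python
import math


def cap_empties(n_labeled, n_empty_available, max_empty_fraction):
    """Number of empty samples to keep so they stay within the given share.

    Solves empties / (empties + n_labeled) <= max_empty_fraction for the
    largest whole number of empties, then clamps to what is available.
    """
    if not 0.0 <= max_empty_fraction < 1.0:
        raise ValueError("max_empty_fraction must be in [0, 1)")
    limit = math.floor(max_empty_fraction * n_labeled / (1.0 - max_empty_fraction))
    return min(n_empty_available, limit)
```

For 300 labeled samples and a 0.25 cap, at most 100 empties are kept, giving 100 / 400 = 25% of the final dataset.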
serve_model_variant set Content-Length=<file_size> on the response, which
is wrong when the storage backend returns a 307 redirect to a presigned
S3 URL: the redirect body is empty and starlette aborts the connection
with 'Response content shorter than Content-Length'. Move the file size
out to an X-Model-Size header and let the framework set Content-Length.

Also drop the download={file_name} attribute on the variant link so the
server's Content-Disposition (built by build_download_filename, format
{slug}_v{version}_{date}_{runtime}.ext) is what the browser saves as.

Mirror build_download_filename in the SvelteKit page so the dropdown
shows the same {slug}_v{version}_{date}_{runtime}{ext} the browser will
save, instead of the raw on-disk name like best.onnx. Also fix the
backend to preserve .tar.gz as a compound suffix (Path.suffix alone
returned just .gz for ncnn bundles).
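The compound-suffix problem is easy to reproduce with `pathlib`. The sketch below shows one way to keep `.tar.gz` intact; `file_extension`, `sketch_download_filename`, and the date format are assumptions for illustration and not the backend's actual build_download_filename.

```python
from pathlib import Path

# Suffixes that must be kept whole; Path.suffix would return only the last part.
COMPOUND_SUFFIXES = (".tar.gz",)


def file_extension(name):
    """Return the extension of `name`, preserving known compound suffixes."""
    lowered = name.lower()
    for suffix in COMPOUND_SUFFIXES:
        if lowered.endswith(suffix):
            return suffix
    return Path(name).suffix


def sketch_download_filename(slug, version, date, runtime, source_name):
    """Build {slug}_v{version}_{date}_{runtime}{ext} from the stored file name."""
    return f"{slug}_v{version}_{date}_{runtime}{file_extension(source_name)}"
```

`Path("bundle.tar.gz").suffix` yields `.gz`, whereas `file_extension("bundle.tar.gz")` yields `.tar.gz`.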
- /api/machine/models filtered out is_public=False, so a freshly published
  private model couldn't appear in the Sorter's 'Browse Hive' tab. Widen
  the filter to also include models owned by the same account the machine
  is registered under (public for everyone else, private for the owner).
- Apply the same filter to the per-model GET and variant download routes
  so the detail/download paths stay consistent with the list.
- Sorter UI: relabel the Browse Hive selector from 'Target / <machine
  name>' to 'Hive / <url>'. The dropdown identifies which Hive instance
  we're pulling from, not the machine name we registered as.
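The widened visibility rule (public for everyone, private only for the owning account) reduces to a single predicate. The sketch below uses a hypothetical `Model` dataclass and field names; the real route presumably expresses the same condition as a database filter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    id: str
    owner_id: str
    is_public: bool


def visible_to_machine(model, machine_owner_id):
    """A machine sees public models plus private ones owned by its own account."""
    return model.is_public or model.owner_id == machine_owner_id
```

Applying the same predicate to the list, detail, and download routes keeps them consistent, which is the point of the second bullet above.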
The Sorter previously forced you to pick one configured Hive at a time
and then browsed that one in isolation. Merge the view: when no
target_id is supplied the backend now fans out across every enabled
target, tags each item with its source (target_id / target_url /
target_name), and returns one combined list sorted by published_at.

Per-target failures land in a non-fatal `errors` array so one
unreachable Hive doesn't blank the catalog. Each model row in the UI
now shows the source host (e.g. `hive.basically.website`) instead of
hiding it behind a selector, and the per-row Download / detail calls
use the model's own target_id rather than a global selectedTargetId.
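The fan-out with non-fatal per-target errors can be sketched as follows. `browse_all_targets`, the `fetch` callback, and the dictionary shapes are hypothetical, and newest-first ordering is an assumption since the commit only says the list is sorted by published_at.

```python
from concurrent.futures import ThreadPoolExecutor


def browse_all_targets(targets, fetch):
    """Query every enabled target and merge the results into one list.

    `fetch(target)` returns a list of model dicts or raises. A failing
    target is recorded in `errors` instead of aborting the whole request.
    """
    enabled = [t for t in targets if t.get("enabled", True)]
    items, errors = [], []
    if not enabled:
        return {"items": items, "errors": errors}
    with ThreadPoolExecutor(max_workers=len(enabled)) as pool:
        futures = [(t, pool.submit(fetch, t)) for t in enabled]
        for target, future in futures:
            try:
                models = future.result()
            except Exception as exc:  # one unreachable Hive must not blank the catalog
                errors.append({"target_id": target["id"], "error": str(exc)})
                continue
            for model in models:
                # Tag each item with its source so the UI can show the host per row.
                items.append({
                    **model,
                    "target_id": target["id"],
                    "target_url": target["url"],
                    "target_name": target["name"],
                })
    # ISO-8601 timestamps in the same timezone sort correctly as strings.
    items.sort(key=lambda m: m["published_at"], reverse=True)
    return {"items": items, "errors": errors}
```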
Mirror the Browse Hive view: each installed row now carries its source
Hive's host inline in the meta line (e.g. `hive.basically.website`,
or `bundled` for repo-shipped models), and the expanded details panel
labels the value 'Hive' with the full URL instead of the old machine
name. Same helper renders both spots so they stay in sync.

In both Browse Hive and Installed, the model name is now a link to
<target_url>/models/<model_id> on the source Hive (new tab,
rel=noopener). Bundled models keep the plain text label since they
don't live on any Hive.

Documents the concrete workflow we ran for the 2026-05-17
c-channel-combined yolo26s-320 model so the next training run can be
reproduced from a single document instead of reconstructing it from the
README's high-level diagram. Includes the new --max-empty-fraction,
--dataset-dir auto-compose, --benchmark-json, and Vast.ai destroy step,
plus the prod deploy command and a note on the still-TBD Hailo preset.

- Reads (list/detail/diversity/assets) drop the owner-restricted query —
  any signed-in member can browse all samples; writes still gated to owner
  or reviewer/admin. Add ?scope=mine opt-in for the legacy view.
- Samples list URL gains scope + max_age_hours filter keys via
  sampleListContext.
- Sample detail sidebar shows machine name + owner avatar/display name
  and links to that machine's sample list across users.
- Machine list endpoint exposes nested owner block (auto-coerced from
  the SQLAlchemy relationship via a pydantic before-validator).
- Diversity overview gains a machine_factor: coverage is multiplied by
  min(distinct_machines, 3) / 3 so a reason captured from one rig only
  can't read as "done". Trends + ETA fold the same factor in.
- test_samples.py updated to assert the new public-default behaviour
  and the per-scope filter.

- train pull writes machine_id into each manifest entry (taken from the
  Hive sample detail response).
- _LabeledSample carries machine_id; build.py records per-machine counts
  in build.json regardless of flags so the diversity audit always shows
  how skewed the selection is.
- New --balance-machine flag adds machine to the equal-quota balance
  group key alongside source_role and piece_count, so one rig can't
  dominate the FPS-sampled selection.

…mples

Admin-only flow to re-detect bounding boxes on existing samples through a
pluggable provider adapter. Built around a typed adapter Protocol so each
model gets isolated request/response handling instead of one detector
branching on model id.
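A minimal sketch of the adapter idea: a typed Protocol, a registry keyed by model id, and one subclass that overrides only the bounding-box order. All class names, model ids, and the `parse_box` method are hypothetical; the real adapters also handle requests, prompts, and cost, which this omits.

```python
from typing import Protocol

# After parsing, every box is (x1, y1, x2, y2) regardless of provider.
Box = tuple[float, float, float, float]


class TeacherAdapter(Protocol):
    model_id: str
    secret_kind: str

    def parse_box(self, raw: list[float]) -> Box: ...


class YxyxChatAdapter:
    """Default: the provider returns [y1, x1, y2, x2]."""

    secret_kind = "openrouter_api_key"

    def __init__(self, model_id: str) -> None:
        self.model_id = model_id

    def parse_box(self, raw: list[float]) -> Box:
        y1, x1, y2, x2 = raw
        return (x1, y1, x2, y2)


class XyxyChatAdapter(YxyxChatAdapter):
    """Same request handling, but the provider returns [x1, y1, x2, y2]."""

    def parse_box(self, raw: list[float]) -> Box:
        x1, y1, x2, y2 = raw
        return (x1, y1, x2, y2)


REGISTRY: dict[str, TeacherAdapter] = {}


def register(adapter: TeacherAdapter) -> TeacherAdapter:
    """Adding a model is one registry entry."""
    REGISTRY[adapter.model_id] = adapter
    return adapter


register(YxyxChatAdapter("example/yxyx-model"))
register(XyxyChatAdapter("example/xyxy-model"))
```

Callers look up `REGISTRY[model_id]` and never branch on the model id themselves, so a provider quirk stays inside its adapter.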

Backend:
- TeacherJob + TeacherJobItem models with status_counts aggregation, cost
  tracking (real usage from provider + projected total), and a parallel
  worker that claims items via SELECT...FOR UPDATE SKIP LOCKED.
- Worker uses a ThreadPoolExecutor (TEACHER_WORKER_PARALLELISM=6) with
  per-adapter max_concurrent + min_interval_s caps so a single provider
  can't monopolize the pool. 429 responses raise TeacherRateLimitError
  with Retry-After parsed, triggering exponential-backoff retry +
  jitter.
- Adapter registry: OpenRouterChatAdapter (Gemini 3/3.1/3.5, Qwen 3.6,
  Kimi, MiMo, Nemotron), GrokAdapter (overrides bbox parse to XYXY
  instead of YXYX), PerceptronAdapter (calls Perceptron's native API
  directly at api.perceptron.inc/v1/chat/completions instead of via
  OpenRouter, which couldn't reliably trigger grounded mode).
- Per-user secrets via secret_kind: openrouter_api_key and
  perceptron_api_key both encrypted; resolver picks the right one per
  adapter.
- preferred_teacher_model on the user separates teacher fallback from
  the AI chat assistant's preferred_ai_model.
- Endpoints: POST jobs (filter-driven), GET jobs (list + detail),
  POST cancel, POST samples/{id}/rerun (sync single-sample bypassing the
  worker), POST samples/{id}/preview (non-destructive for the compare
  page), GET samples/{id}/prompt, GET models (registry).
- Migrations 0xa9-0xd2 add teacher_jobs + teacher_job_items + cost
  columns + the two new user secret/preference columns.
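The per-adapter max_concurrent + min_interval_s caps can be sketched as a small context manager wrapped around each provider call. `AdapterThrottle` and its injectable `clock`/`sleep` parameters are assumptions for illustration, not the worker's actual class.

```python
import threading
import time


class AdapterThrottle:
    """Limit one adapter to `max_concurrent` calls, spaced `min_interval_s` apart."""

    def __init__(self, max_concurrent, min_interval_s,
                 clock=time.monotonic, sleep=time.sleep):
        self._slots = threading.BoundedSemaphore(max_concurrent)
        self._lock = threading.Lock()
        self._min_interval_s = min_interval_s
        self._next_start = 0.0
        self._clock = clock
        self._sleep = sleep

    def __enter__(self):
        self._slots.acquire()  # blocks while max_concurrent calls are in flight
        with self._lock:
            # Reserve the next start time under the lock, then sleep outside it
            # so other threads can reserve their own slots meanwhile.
            now = self._clock()
            start_at = max(now, self._next_start)
            self._next_start = start_at + self._min_interval_s
        wait = start_at - now
        if wait > 0:
            self._sleep(wait)
        return self

    def __exit__(self, exc_type, exc, tb):
        self._slots.release()
        return False
```

Each pool thread would run `with throttle: adapter.detect(...)`, with one throttle instance per adapter so a slow provider only blocks its own calls.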

Frontend:
- /samples gains an admin "Re-run teacher" button + Jobs link + a
  live-polling status banner that auto-restores on page reload by
  fetching the latest in-flight job.
- /samples/[id] gains "Compare models" + "Re-run teacher" buttons in
  the header toolbar (real Button primitives, not text links). Sidebar
  Machine row links to the owning machine's sample list.
- /admin/teacher-jobs splits Active (big cards with cost + tokens) from
  History (compact rows). Each job links to a detail view with the
  remaining items section, status counts, and a finished-items tail.
- /samples/[id]/compare runs every supported model on the same sample
  side-by-side. Per-tile image + bboxes, per-tile metrics (boxes, score,
  cost, latency), "Show raw response" expander for debugging coord
  formats, and an editable prompt textarea that overrides the default
  for chat-style adapters (Perceptron ignores override since its native
  short instruction is what triggers grounded XML output).
- /settings gains a Perceptron API Key card + a Default Teacher Model
  select populated from /api/admin/teacher/models.
- Samples list filters: Capture Reason dropped (rarely useful), source
  labels renamed (c_channel_2 → C2, classification_channel → C-Channel
  4 (Classification)), new Age filter (24h / 7d / 30d / All) wired
  through to the backend max_age_hours param.
- New warning-strong + warning-bg color tokens fix the unreadable yellow
  text in error banners.

Sorter:
- gemini_sam_detector.py gets the same compact classification_channel
  prompt as the Hive copy (lock-step per project_teacher_zones).

The new parallel worker held one SQLAlchemy session per in-flight item for
the entire adapter call — including the multi-second Perceptron/Gemini
round-trip plus 429 backoff sleeps. With 6 workers each pinning a
connection for 5-30s the default pool (5+10) timed out, taking the whole
backend with it ("QueuePool limit of size 5 overflow 10 reached").

Restructure _run_item into three short-lived transactions:
1. _load_item_context: open session, fetch item+job+sample+owner, decrypt
   the API key, read the image bytes, close session
2. (no session) throttle + adapter.detect() with backoff retry
3. _write_result: reopen session, re-fetch by id, apply mutations, commit

Plus bump the engine pool to 20+20 with pool_recycle=1800 so a noisy
backfill can't elbow user-facing API requests out of the way.
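The three-phase shape can be illustrated with the standard library alone. In the sketch below `sqlite3` stands in for the SQLAlchemy session, and `load_item_context`, `write_result`, `run_item`, and the `items` table are hypothetical; the point is only that no connection is open while the slow `detect` call runs. If the engine is built with SQLAlchemy's `create_engine`, the pool change would presumably map to `pool_size=20, max_overflow=20, pool_recycle=1800`.

```python
import sqlite3
from contextlib import closing


def load_item_context(db_path, item_id):
    """Phase 1: short read; the connection is closed before returning."""
    with closing(sqlite3.connect(db_path)) as conn:
        row = conn.execute(
            "SELECT payload FROM items WHERE id = ?", (item_id,)
        ).fetchone()
    return {"item_id": item_id, "payload": row[0]}


def write_result(db_path, item_id, result):
    """Phase 3: reopen, apply the mutation, commit, close."""
    with closing(sqlite3.connect(db_path)) as conn:
        with conn:  # commits on success, rolls back on exception
            conn.execute(
                "UPDATE items SET status = 'done', result = ? WHERE id = ?",
                (result, item_id),
            )


def run_item(db_path, item_id, detect):
    context = load_item_context(db_path, item_id)
    # Phase 2: the slow provider call and any backoff sleeps happen here,
    # with no database connection held.
    result = detect(context["payload"])
    write_result(db_path, item_id, result)
```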
At 4k+ items the single-page item dump made the detail view unusable.
Backend gains items_status, items_page, items_page_size query params on
GET /api/admin/teacher/jobs/{id}. status_counts still scans the whole
job so the header badges stay accurate; the items list is paginated
50/page server-side with smart ordering (queued/running oldest-first,
finished states newest-first).

Frontend replaces the old Remaining + Recently-finished sections with a
single Items section: a filter chip strip (all / queued / running /
done / error / skipped, each with its live count) drives the query;
classic prev/next pagination with first/last/window page links. Filter
+ page are component-local state, not URL-persistent — a refresh always
lands you on page 1 of the default view.
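The status filter and "smart ordering" can be sketched in memory as below. The real endpoint presumably does this in SQL; `page_items`, the `updated_at` sort key, and placing active items ahead of finished ones in the unfiltered view are assumptions.

```python
ACTIVE_STATUSES = {"queued", "running"}


def page_items(items, status=None, page=1, page_size=50):
    """Filter by status, order active items oldest-first and finished items
    newest-first, then return one page plus the total for the pager."""
    selected = [i for i in items if status is None or i["status"] == status]
    active = sorted(
        (i for i in selected if i["status"] in ACTIVE_STATUSES),
        key=lambda i: i["updated_at"],
    )
    finished = sorted(
        (i for i in selected if i["status"] not in ACTIVE_STATUSES),
        key=lambda i: i["updated_at"],
        reverse=True,
    )
    ordered = active + finished
    start = (page - 1) * page_size
    return {
        "items": ordered[start:start + page_size],
        "total": len(ordered),
        "page": page,
        "page_size": page_size,
    }
```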
@vercel

vercel Bot commented May 20, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project          Deployment   Actions            Updated (UTC)
sorter-v2-docs   Ready        Preview, Comment   May 21, 2026 12:46am
sorteros-setup   Ready        Preview, Comment   May 21, 2026 12:46am

Browsing/filtering by individual machines exposes other users' rig names
and owner display names once samples are public-default. Gate the sidebar
section behind isAdmin until we've thought through what the per-user
view should look like. The detail-page Machine link + URL ?machine_id=
filter still work for anyone who already has the id.

"Review Samples" on /samples now forwards the active sidebar filter
(scope, machine_id, source_role, capture_reason, max_age_hours) into the
URL, and /review reads them back on every loadNext call so the reviewer
only sees samples from the slice they picked — same affordance the
Re-run teacher button already had.

- review.py /queue/next gains the same five filter params with identical
  semantics to /api/samples.
- /samples Review button label becomes "Review filtered" when any filter
  is active, with a tooltip explaining the scope.
- /review header shows the active filter chips with a Clear link that
  drops back to the unfiltered queue.

Plain /review (no query string) keeps the original behaviour: any
unreviewed/in-review sample, oldest-first.

Reviewers can already filter the queue by sidebar selection; admins now
get a small "Re-run teacher" card next to the action pad on /review with
a model dropdown + Run button. Clicking it calls the same sync
single-sample endpoint /samples/[id] already uses, swaps the sample
state with the fresh detection, and clears the local review history so
the action pad shows an unreviewed slate (the backend already resets
review_status). Compare → link drops to /samples/[id]/compare for the
full side-by-side view.

Defaults to the admin's preferred_teacher_model when set, otherwise the
first registered adapter. Members see no change.

The dropdown now resolves in order: localStorage > user.preferred_teacher_model
> first registered. Saved whenever the value changes after initial prefill,
so the reviewer's last-used model sticks across reloads and tabs without
needing to touch the global preference in Settings.

The annotator action buttons wired event handlers as bare property
references (`onclick={annotatorApi.save}`). The AnnotatorApi class
fields aren't $state, so when SampleAnnotator's $effect remounts a new
sample and reassigns `externalApi.save = saveAnnotations`, the buttons
keep firing the previous closure (or the default no-op stub) because
Svelte captured the reference at render time.

Affected the Review queue most visibly — every accept/reject re-fetches
a new sample so the annotator remounts every turn. Sample-detail page
worked by accident because it usually stays on one sample.

Indirect through arrow functions (`onclick={() => annotatorApi.save()}`)
so the binding resolves at click time. Applied to all action buttons on
both panels for consistency.
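The same capture-versus-lookup distinction exists outside Svelte. The Python analogy below is not the component code; the `AnnotatorApi` class here is a stand-in that only mirrors the shape of the bug: a reference taken before reassignment keeps pointing at the old function, while a wrapper looks the attribute up at call time.

```python
class AnnotatorApi:
    def __init__(self):
        # Default no-op stub, replaced once a sample is mounted.
        self.save = lambda: "noop"


api = AnnotatorApi()

# Analogous to onclick={annotatorApi.save}: the function object is captured now.
captured = api.save

# Analogous to onclick={() => annotatorApi.save()}: the attribute is read on each call.
late_bound = lambda: api.save()

# A new sample mounts and the save handler is reassigned.
api.save = lambda: "saved sample 2"
```

After the reassignment, `captured()` still runs the stub while `late_bound()` runs the new handler.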
@spencerhhubert spencerhhubert merged commit edeb9a4 into main May 21, 2026
3 checks passed