Releases: superlinked/sie
v0.6.6
v0.6.5
- chore(main): release 0.6.5
- fix(gateway): preserve ref sibling schema semantics
- fix(gateway): dereference structured output schema refs
- refactor(sidecar): inline applied bundle hash
- fix(config): fingerprint model pool ownership
- chore(worker): address SGLang load review nits
- fix(gateway): align provisioning contract docs
- fix(gateway): address provisioning review feedback
- fix(server): guard readiness for removed configs
- test(server): fix ipc registry mock snapshot
- docs: clarify pool capacity semantics
- fix: align GPU memory pressure defaults
- fix(server): honor pool-aware model configs
- fix(gateway): make provisioning non-2xx universally
- fix(worker): keep SGLang loads off event loop
- fix(gateway): return retryable OpenAI provisioning errors
- test(server): cover qwen3 vl reranker document image ordering
- fix(sdk): normalize score images for wire transport
- fix(server): render qwen3 vl reranker document images in user prompt
- fix(gateway): decode native media JSON bytes
- fix(deps): bump sidecar prometheus for protobuf advisory
- fix(config): harden replace snapshot IPC
- fix(config): replace drifted export snapshots
- fix(config): detect bundle config hash drift
- fix(sie-cluster): add spot toleration to AKS worker pool
v0.6.4
- chore(main): release 0.6.4
- fix(sidecar): preserve msgpack work item payloads
- fix(gateway): resubscribe stale NATS health stream
- feat(helm): ship values-aks.yaml AKS overlay with the Azure module
- fix(helm): render selectorLabels after podLabels in workload templates
- feat(helm): expose chart-level podLabels for Workload Identity wiring
- feat(helm): accept abfs:// and abfss:// payloadStore + clusterCache URLs
v0.6.3
- chore(main): release 0.6.3
- test: stabilize SDK timeout retry assertion
- fix: fall back to relay on S3/GCS server-side copy failure
- feat: add server-side copy fast path for cloud weight sync
- fix: address final cloud storage review issues
- fix: harden cloud cache sync paths
- fix: address azure cache review feedback
- feat: support azure blob cluster cache
- feat: add azure blob payload store support
- fix(generate): reject both-present image-bearing content layouts
- fix(generate): image-free content_parts field must not shadow layout
- feat(generate): preserve text/image content-part ordering
- fix(nemo_colembed): trim left-padding rows from v1 conformant doc embeddings (#1163)
- perf(nemo_colembed): engage conformant image preprocessing for v1 (#1163)
- fix: evict stale gateway workers on shutdown
- fix(generate): address huronat review on vision input (F2-F8)
- fix(generate): address CodeRabbit review on vision input
- feat(generate): add vision (image) input to generate()
- fix(deps): clear HIGH Dependabot alerts (docling, rustls-webpki)
- fix(bench): address CodeRabbit — accurate classification comment + quote-agnostic test
- fix: align extraction quality evals with source baselines
- feat: add m4 extraction model configs
v0.6.2
- chore(main): release 0.6.2
- fix(server): consolidate runtime ninja install
- fix(server): install ninja in cuda runtime
- fix(helm): scale single-profile bundles on gpu-agnostic demand
- fix(dev): address KEDA Tilt PR review
- feat(dev): refresh KEDA Tilt local dev branch
- fix: accept dense dim in qwen3 vl embedding adapter
- feat(models): add M4 dense encoders mxbai-embed-large-v1, arctic-embed-l-v2.0, modernbert-embed-base
v0.6.1
v0.6.0
- chore(main): release 0.6.0
- feat(gateway): route work by queue pool lanes
- feat(helm): default SIE_POOL to pool name (not "default")
- fix(deps): bump vitest 2.1.9 -> 4.1.0 (CVE-2026-47429)
- fix(gateway): harden queue lane admission
- fix(helm): align lane defaults and tilt e2e
- fix(helm): preserve worker-group queue defaults
v0.5.0
- chore(main): release 0.5.0
- docs(helm): clarify README pool example is illustrative, not tester-specific
- fix(helm): fail-fast on missing/invalid bundle replica bounds
- chore(helm): address PR #1205 review feedback
- fix(helm): preserve gateway metrics scrape labels
- feat(helm)!: split worker pools into pool × bundles schema
- fix(gateway): expose unauthenticated metrics scrape port
- chore(openapi): regenerate spec for code/sql/guard capability fields
- fix(models): drop unsupported ebnf advertisement + restore guardian a100 guard threshold
- feat(models): surface code/sql/guard capabilities; resolve job aliases in configs/resolve
- fix(guard): robust verdict thresholding, logprob hygiene, decoded-token logprobs
- feat(tester-cluster): add sglang worker pool for generative models
- fix(sie_server): honor params.instruction in Florence-2 extract (#1053)
- fix(helm): use sidecar binary for image pre-pull
- chore: remove agent-jobs runbook, ADR 0001, and m5-planning docs
- fix(guard): reject multi-candidate sampling + keep logprobs consistent on rewrite
- feat(guard): P(unsafe) logprob threshold for CHECK POLICY precision (#1187)
- docs(ops): agent-jobs prod-readiness runbook + opt-in model-alias deploy config
- fix(review): address #1184 review comments + restore dropped A/B fix
- bench(sql): grammar A/B + SQLCoder native-template measured on Spider
- test(gateway): end-to-end precision routing through resolve_model_and_bundle
- test(gateway): prove a model routes across two precision bundles
- feat(gateway): job aliases can carry a precision bundle (SQL->BF16 routing)
- docs(27b): flag FP8 SQL regression on sql cap + clarify targets are documentary
- feat(27b): measure Qwen3.6-27B on code/SQL/tools; advertise code+sql
- feat(guard): CHECK POLICY content-moderation model + ToxicChat F1 eval
- bench(sql): measured Spider execution accuracy + anchored floor; SQLCoder serve-validated
- feat(code): point model="code" at the measured model; xgrammar-validate SQL grammar
- feat(sql): onboard SQLCoder (Defog) config + starter SQL grammar artifact
- feat(bench): add Spider text-to-SQL execution-accuracy eval + model="sql" alias
- fix(review): address PR feedback on the code-eval
- feat(server,gateway): advertise code capability + model="code" alias
v0.4.2
- chore(main): release 0.4.2
- Scope SGLang CUDA toolkit runtime
- Enable CUDA toolkit in SGLang worker runtime
- perf(mineru_vl): O(L) incremental no-repeat-ngram for greedy decode
- feat(sie_server): add MinerU2.5-Pro-2604-1.2B doc OCR adapter
- build(deps): upgrade rust toolchain to 1.96
- feat(models): add Marqo/marqo-fashionSigLIP (SigLIP open_clip, fashion image-text)
- fix(ci): keep sidecar out of warm cache
- fix(review): 0.6B ctx test 1024->4096, loader except logs, README gaps resolved
- fix(ci): avoid nested mise in integration fixture
- docs: align sidecar naming in active docs
- fix(deploy): align server sidecar naming
- fix(deploy): document worker-sidecar metrics wiring
- fix(deploy): rename sidecar container to worker-sidecar
- fix(deploy): align server sidecar naming and kind preload smoke
- fix(deploy): publish server sidecar image
- feat(model): bump Qwen3-0.6B serving context 1024→4096 for prod simple-task use
- fix(bench): let via-SIE smoke serve a profile-variant model end-to-end
- fix(loader): wire profile runtime.default_sampling into the adapter
- feat(model+bench): RTX-PRO-6000 FP8 profile for Qwen3.6-27B + 6000 validation
- refactor(glm_ocr): select patch-embed conv strictly by structure (CodeRabbit)
- perf(glm_ocr): rebind vision Conv3d patch-embed to F.linear
- chore(adapters,ci): remove Vidore3 throughput diagnosis instrumentation
- fix(test): restore donut helper call contract
- fix(ci): address analyzer findings and stale queue test
- fix(adapters): replace Qwen3-VL vision Conv3d patch-embed with matmul
- chore(helm): clean sidecar chart observability
- feat(sidecar): add worker config and pool admission reconciliation
- TEMP(colqwen3): time vision sub-modules (patch_embed/block/merger)
- test(tilt): expand sidecar e2e coverage
- feat(sidecar): wire generation direct dispatch
- TEMP(colqwen3): split forward timing into vision vs text (revert before merge)
- fix(adapters): route Qwen3-VL VLMs through flash attention (Vidore3 throughput)
- TEMP(adapters,ci): timing instrumentation for Vidore3 throughput diag
- chore(openapi): regenerate gateway spec for min_tokens + chat_template_kwargs
- fix: address coderabbit + code-quality review on PR #1146
- feat(bench+model): via-sie 4-task n=300 sweep + NEXTN smaller-draft on 27B
- fix(worker): SGLang adapter accepts min_new_tokens kwarg + 27B via-sie validated
- fix: slow sidecar nats consumer reconcile
- fix: gate sidecar nats reconnect refresh
- fix: address pr review quality issues
- fix: harden sidecar config recovery
- chore: clean sidecar docs and packaging
- chore: prune sidecar compatibility paths
- docs(sidecar): clarify worker sidecar source naming
- fix(ci): refresh gateway openapi contract
- fix(quality): repair adapter eval harness regressions
- fix(worker-sidecar): harden queue carveout contracts
- fix: scope bundle config hash cache per registry
- chore: standardize worker sidecar packaging
- feat: reconcile live worker config in sidecar
- Add worker config reconciliation
- chore: clean worker sidecar deployment surfaces
- chore: enable rust sidecar in local tilt
- fix: require rust sidecar for queue workers
- docs: clarify worker config apply gap
- chore: consolidate inference sidecar package
- fix: preserve worker batch identity and publish image
- chore: finish rust worker rebase integration
- chore(helm): land S20+ rust-sidecar production tuning defaults
- chore: scrub stale sie_candle references and dead candle metric
- non-adapter carveout: ship sie_prep + wire passthrough end-to-end
- non-adapter carveout: retire sie_candle, carve out sie_prep, Python passthrough
- sie_worker_rust + sie_server: queue-depth metrics for IPC+Python loop
- worker-rust + sie_candle: NATS health heartbeats and BERT cross-encoder
- rust-worker: retire SIE_RUST_*_MODELS env vars, carve sie_candle out into its own crate (Stage 3 P1)
- rust-worker: land Stage 1 (tokenise + framing) + Stage 2 (scheduler) + Stage 3 design
- adding perf-tuning grafana dash fixes and extension
- obs(helm): perf-tuning Grafana dashboard + ConfigMap
- perf(rope_flash): vectorize CLS/mean pooling, eliminate per-item .item() sync
- perf(adaptive): anchor min_batch_cost floor at max_batch_tokens // 4
- revert: restore adaptive batching defaults to 15/50ms
- perf(batching): tighten adaptive wait ceiling + revert gte-multilingual 32k
- perf(gte-multilingual-base): raise max_batch_tokens 16k → 32k to stop IPC-batch shred
- perf(server): FP16 on GPU, coalesce sized for IPC bursts, starvation self-heal
- obs(worker+server): audit follow-ups for phase + fragmentation metrics
- obs(worker+server): surface GPU phase latency + IPC-batch fragmentation
- feat(worker/rust): IPC connection pool — lift the sidecar's last serialization bottleneck
- fix(gateway+server): queue is the only mode — kill direct-mode cruft
- fix(gateway): suppress H9 first-chunk-fallback on single-worker pools
- worker(rust)+sie_server: post-audit P0 fixes — drain min-deadline, fallback eviction tests, dispatcher outcome binding, encoder UnsupportedModel coverage, model-label cardinality tests
- worker(rust): finalise pre-Argo audit — native Candle, fallback breaker, full observability, Docker + Helm
- chore(sie-server): drop dead code left over from sidecar cutover
- fix(worker): harden payload store + error paths; surface silent success bugs
- sie-server: commit to sidecar-only queue path; remove Python NATS
- feat(sie_worker_rust): close parity gaps with Python pull loop + smoke test
- feat(sie_server): UDS msgpack IPC server for Rust worker sidecar
- feat(sie_server): carve out QueueExecutor + IPC types for Rust worker POC
- feat(bench): 0.6B via-sie validated; harness + 27B config gains
- fix(model): bump Qwen3.6-27B default/h100 mem_fraction_static 0.85 → 0.92
- feat(gateway+worker): chat surface accepts min_tokens + chat_template_kwargs
- fix: accept dense_dim in dense adapters
- test: align structured output metric constants
- fix(sie_server): clear CUDA cache on uncovered VLM paths + drop private sem _value access
- chore: remove internal design doc references
- docs: clean stale design references
- docs: minimize design architecture docs
- docs: update design document references
- docs: remove internal planning references
- docs: archive obsolete roadmap
- fix(security): cap vite at ^6 + add Node engines to website
- feat(gateway): strengthen generation isolation guardrails
- feat(docling): accept image input + run on OCR-bench quality path
- fix(security): bump sie_ts_sdk standalone pnpm transitives
- fix(security): bump root pnpm deps + add overrides for transitives
- fix(security): bump root Python deps to patched versions
- fix(security): bump gateway deps to patched versions
- chore: drop CodeQL rationale comments
- fix(quality): batch3 of CodeQL findings + bench KIE bug
- fix(quality): drop redundant inline imports in donut + registry
- fix(security): use Reflect.construct for WebSocket headers shim
- fix(quality): close CodeQL quality-tab findings
- Fix gateway queue trace isolation
- perf(bench,adapters): parallel VLM sub-batch dispatch + drop redundant empty_cache
- Keep generation machinery off default queue path
- docs(test): note :->__replacement in test_model_yaml_filenames docstring
- fix(models): rename ColQwen3 YAML to match sie_id casing (TomoroAI)
v0.4.1
- chore(main): release 0.4.1
- fix(security): resolve 18 open CodeQL alerts
- Revert "Fix pool queue batching coalescing"
- Fix pool queue batching coalescing
- refactor(release-docker): remove sie-deps prebake; build deps in-band
- fix: refresh generation pool fallback on hot add
- fix: isolate generation direct dispatch
- feat(server): add Qwen3.6-27B model + migrate to CUDA 12.9