Releases: superlinked/sie

v0.6.6

14 Jun 14:30

  • chore(main): release 0.6.6
  • fix(config): clarify missing profile inheritance
  • fix(config): avoid sticky missing bundle hashes
  • fix(config): fail closed on missing bundle metadata
  • fix(config): align pool-scoped bundle hashes

v0.6.5

13 Jun 22:04

  • chore(main): release 0.6.5
  • fix(gateway): preserve ref sibling schema semantics
  • fix(gateway): dereference structured output schema refs
  • refactor(sidecar): inline applied bundle hash
  • fix(config): fingerprint model pool ownership
  • chore(worker): address SGLang load review nits
  • fix(gateway): align provisioning contract docs
  • fix(gateway): address provisioning review feedback
  • fix(server): guard readiness for removed configs
  • test(server): fix ipc registry mock snapshot
  • docs: clarify pool capacity semantics
  • fix: align GPU memory pressure defaults
  • fix(server): honor pool-aware model configs
  • fix(gateway): make provisioning non-2xx universally
  • fix(worker): keep SGLang loads off event loop
  • fix(gateway): return retryable OpenAI provisioning errors
  • test(server): cover qwen3 vl reranker document image ordering
  • fix(sdk): normalize score images for wire transport
  • fix(server): render qwen3 vl reranker document images in user prompt
  • fix(gateway): decode native media JSON bytes
  • fix(deps): bump sidecar prometheus for protobuf advisory
  • fix(config): harden replace snapshot IPC
  • fix(config): replace drifted export snapshots
  • fix(config): detect bundle config hash drift
  • fix(sie-cluster): add spot toleration to AKS worker pool

v0.6.4

11 Jun 23:59

  • chore(main): release 0.6.4
  • fix(sidecar): preserve msgpack work item payloads
  • fix(gateway): resubscribe stale NATS health stream
  • feat(helm): ship values-aks.yaml AKS overlay with the Azure module
  • fix(helm): render selectorLabels after podLabels in workload templates
  • feat(helm): expose chart-level podLabels for Workload Identity wiring
  • feat(helm): accept abfs:// and abfss:// payloadStore + clusterCache URLs

v0.6.3

10 Jun 20:41

  • chore(main): release 0.6.3
  • test: stabilize SDK timeout retry assertion
  • fix: fall back to relay on S3/GCS server-side copy failure
  • feat: add server-side copy fast path for cloud weight sync
  • fix: address final cloud storage review issues
  • fix: harden cloud cache sync paths
  • fix: address azure cache review feedback
  • feat: support azure blob cluster cache
  • feat: add azure blob payload store support
  • fix(generate): reject both-present image-bearing content layouts
  • fix(generate): image-free content_parts field must not shadow layout
  • feat(generate): preserve text/image content-part ordering
  • fix(nemo_colembed): trim left-padding rows from v1 conformant doc embeddings (#1163)
  • perf(nemo_colembed): engage conformant image preprocessing for v1 (#1163)
  • fix: evict stale gateway workers on shutdown
  • fix(generate): address huronat review on vision input (F2-F8)
  • fix(generate): address CodeRabbit review on vision input
  • feat(generate): add vision (image) input to generate()
  • fix(deps): clear HIGH Dependabot alerts (docling, rustls-webpki)
  • fix(bench): address CodeRabbit — accurate classification comment + quote-agnostic test
  • fix: align extraction quality evals with source baselines
  • feat: add m4 extraction model configs

v0.6.2

08 Jun 18:32

  • chore(main): release 0.6.2
  • fix(server): consolidate runtime ninja install
  • fix(server): install ninja in cuda runtime
  • fix(helm): scale single-profile bundles on gpu-agnostic demand
  • fix(dev): address KEDA Tilt PR review
  • feat(dev): refresh KEDA Tilt local dev branch
  • fix: accept dense dim in qwen3 vl embedding adapter
  • feat(models): add M4 dense encoders mxbai-embed-large-v1, arctic-embed-l-v2.0, modernbert-embed-base

v0.6.1

07 Jun 12:40

  • chore(main): release 0.6.1
  • fix(gateway): fail fast on invalid static pool config
  • fix(gateway): canonicalize static queue pool names
  • feat(gateway): support static queue pools

v0.6.0

09 Jun 09:27

  • chore(main): release 0.6.0
  • feat(gateway): route work by queue pool lanes
  • feat(helm): default SIE_POOL to pool name (not "default")
  • fix(deps): bump vitest 2.1.9 -> 4.1.0 (CVE-2026-47429)
  • fix(gateway): harden queue lane admission
  • fix(helm): align lane defaults and tilt e2e
  • fix(helm): preserve worker-group queue defaults

v0.5.0

05 Jun 07:32

  • chore(main): release 0.5.0
  • docs(helm): clarify README pool example is illustrative, not tester-specific
  • fix(helm): fail-fast on missing/invalid bundle replica bounds
  • chore(helm): address PR #1205 review feedback
  • fix(helm): preserve gateway metrics scrape labels
  • feat(helm)!: split worker pools into pool × bundles schema
  • fix(gateway): expose unauthenticated metrics scrape port
  • chore(openapi): regenerate spec for code/sql/guard capability fields
  • fix(models): drop unsupported ebnf advertisement + restore guardian a100 guard threshold
  • feat(models): surface code/sql/guard capabilities; resolve job aliases in configs/resolve
  • fix(guard): robust verdict thresholding, logprob hygiene, decoded-token logprobs
  • feat(tester-cluster): add sglang worker pool for generative models
  • fix(sie_server): honor params.instruction in Florence-2 extract (#1053)
  • fix(helm): use sidecar binary for image pre-pull
  • chore: remove agent-jobs runbook, ADR 0001, and m5-planning docs
  • fix(guard): reject multi-candidate sampling + keep logprobs consistent on rewrite
  • feat(guard): P(unsafe) logprob threshold for CHECK POLICY precision (#1187)
  • docs(ops): agent-jobs prod-readiness runbook + opt-in model-alias deploy config
  • fix(review): address #1184 review comments + restore dropped A/B fix
  • bench(sql): grammar A/B + SQLCoder native-template measured on Spider
  • test(gateway): end-to-end precision routing through resolve_model_and_bundle
  • test(gateway): prove a model routes across two precision bundles
  • feat(gateway): job aliases can carry a precision bundle (SQL->BF16 routing)
  • docs(27b): flag FP8 SQL regression on sql cap + clarify targets are documentary
  • feat(27b): measure Qwen3.6-27B on code/SQL/tools; advertise code+sql
  • feat(guard): CHECK POLICY content-moderation model + ToxicChat F1 eval
  • bench(sql): measured Spider execution accuracy + anchored floor; SQLCoder serve-validated
  • feat(code): point model="code" at the measured model; xgrammar-validate SQL grammar
  • feat(sql): onboard SQLCoder (Defog) config + starter SQL grammar artifact
  • feat(bench): add Spider text-to-SQL execution-accuracy eval + model="sql" alias
  • fix(review): address PR feedback on the code-eval
  • feat(server,gateway): advertise code capability + model="code" alias

v0.4.2

03 Jun 14:22

  • chore(main): release 0.4.2
  • Scope SGLang CUDA toolkit runtime
  • Enable CUDA toolkit in SGLang worker runtime
  • perf(mineru_vl): O(L) incremental no-repeat-ngram for greedy decode
  • feat(sie_server): add MinerU2.5-Pro-2604-1.2B doc OCR adapter
  • build(deps): upgrade rust toolchain to 1.96
  • feat(models): add Marqo/marqo-fashionSigLIP (SigLIP open_clip, fashion image-text)
  • fix(ci): keep sidecar out of warm cache
  • fix(review): 0.6B ctx test 1024->4096, loader except logs, README gaps resolved
  • fix(ci): avoid nested mise in integration fixture
  • docs: align sidecar naming in active docs
  • fix(deploy): align server sidecar naming
  • fix(deploy): document worker-sidecar metrics wiring
  • fix(deploy): rename sidecar container to worker-sidecar
  • fix(deploy): align server sidecar naming and kind preload smoke
  • fix(deploy): publish server sidecar image
  • feat(model): bump Qwen3-0.6B serving context 1024→4096 for prod simple-task use
  • fix(bench): let via-SIE smoke serve a profile-variant model end-to-end
  • fix(loader): wire profile runtime.default_sampling into the adapter
  • feat(model+bench): RTX-PRO-6000 FP8 profile for Qwen3.6-27B + 6000 validation
  • refactor(glm_ocr): select patch-embed conv strictly by structure (CodeRabbit)
  • perf(glm_ocr): rebind vision Conv3d patch-embed to F.linear
  • chore(adapters,ci): remove Vidore3 throughput diagnosis instrumentation
  • fix(test): restore donut helper call contract
  • fix(ci): address analyzer findings and stale queue test
  • fix(adapters): replace Qwen3-VL vision Conv3d patch-embed with matmul
  • chore(helm): clean sidecar chart observability
  • feat(sidecar): add worker config and pool admission reconciliation
  • TEMP(colqwen3): time vision sub-modules (patch_embed/block/merger)
  • test(tilt): expand sidecar e2e coverage
  • feat(sidecar): wire generation direct dispatch
  • TEMP(colqwen3): split forward timing into vision vs text (revert before merge)
  • fix(adapters): route Qwen3-VL VLMs through flash attention (Vidore3 throughput)
  • TEMP(adapters,ci): timing instrumentation for Vidore3 throughput diag
  • chore(openapi): regenerate gateway spec for min_tokens + chat_template_kwargs
  • fix: address coderabbit + code-quality review on PR #1146
  • feat(bench+model): via-sie 4-task n=300 sweep + NEXTN smaller-draft on 27B
  • fix(worker): SGLang adapter accepts min_new_tokens kwarg + 27B via-sie validated
  • fix: slow sidecar nats consumer reconcile
  • fix: gate sidecar nats reconnect refresh
  • fix: address pr review quality issues
  • fix: harden sidecar config recovery
  • chore: clean sidecar docs and packaging
  • chore: prune sidecar compatibility paths
  • docs(sidecar): clarify worker sidecar source naming
  • fix(ci): refresh gateway openapi contract
  • fix(quality): repair adapter eval harness regressions
  • fix(worker-sidecar): harden queue carveout contracts
  • fix: scope bundle config hash cache per registry
  • chore: standardize worker sidecar packaging
  • feat: reconcile live worker config in sidecar
  • Add worker config reconciliation
  • chore: clean worker sidecar deployment surfaces
  • chore: enable rust sidecar in local tilt
  • fix: require rust sidecar for queue workers
  • docs: clarify worker config apply gap
  • chore: consolidate inference sidecar package
  • fix: preserve worker batch identity and publish image
  • chore: finish rust worker rebase integration
  • chore(helm): land S20+ rust-sidecar production tuning defaults
  • chore: scrub stale sie_candle references and dead candle metric
  • non-adapter carveout: ship sie_prep + wire passthrough end-to-end
  • non-adapter carveout: retire sie_candle, carve out sie_prep, Python passthrough
  • sie_worker_rust + sie_server: queue-depth metrics for IPC+Python loop
  • worker-rust + sie_candle: NATS health heartbeats and BERT cross-encoder
  • rust-worker: retire SIE_RUST_*_MODELS env vars, carve sie_candle out into its own crate (Stage 3 P1)
  • rust-worker: land Stage 1 (tokenise + framing) + Stage 2 (scheduler) + Stage 3 design
  • adding perf-tuning grafana dash fixes and extension
  • obs(helm): perf-tuning Grafana dashboard + ConfigMap
  • perf(rope_flash): vectorize CLS/mean pooling, eliminate per-item .item() sync
  • perf(adaptive): anchor min_batch_cost floor at max_batch_tokens // 4
  • revert: restore adaptive batching defaults to 15/50ms
  • perf(batching): tighten adaptive wait ceiling + revert gte-multilingual 32k
  • perf(gte-multilingual-base): raise max_batch_tokens 16k → 32k to stop IPC-batch shred
  • perf(server): FP16 on GPU, coalesce sized for IPC bursts, starvation self-heal
  • obs(worker+server): audit follow-ups for phase + fragmentation metrics
  • obs(worker+server): surface GPU phase latency + IPC-batch fragmentation
  • feat(worker/rust): IPC connection pool — lift the sidecar's last serialization bottleneck
  • fix(gateway+server): queue is the only mode — kill direct-mode cruft
  • fix(gateway): suppress H9 first-chunk-fallback on single-worker pools
  • worker(rust)+sie_server: post-audit P0 fixes — drain min-deadline, fallback eviction tests, dispatcher outcome binding, encoder UnsupportedModel coverage, model-label cardinality tests
  • worker(rust): finalise pre-Argo audit — native Candle, fallback breaker, full observability, Docker + Helm
  • chore(sie-server): drop dead code left over from sidecar cutover
  • fix(worker): harden payload store + error paths; surface silent success bugs
  • sie-server: commit to sidecar-only queue path; remove Python NATS
  • feat(sie_worker_rust): close parity gaps with Python pull loop + smoke test
  • feat(sie_server): UDS msgpack IPC server for Rust worker sidecar
  • feat(sie_server): carve out QueueExecutor + IPC types for Rust worker POC
  • feat(bench): 0.6B via-sie validated; harness + 27B config gains
  • fix(model): bump Qwen3.6-27B default/h100 mem_fraction_static 0.85 → 0.92
  • feat(gateway+worker): chat surface accepts min_tokens + chat_template_kwargs
  • fix: accept dense_dim in dense adapters
  • test: align structured output metric constants
  • fix(sie_server): clear CUDA cache on uncovered VLM paths + drop private sem _value access
  • chore: remove internal design doc references
  • docs: clean stale design references
  • docs: minimize design architecture docs
  • docs: update design document references
  • docs: remove internal planning references
  • docs: archive obsolete roadmap
  • fix(security): cap vite at ^6 + add Node engines to website
  • feat(gateway): strengthen generation isolation guardrails
  • feat(docling): accept image input + run on OCR-bench quality path
  • fix(security): bump sie_ts_sdk standalone pnpm transitives
  • fix(security): bump root pnpm deps + add overrides for transitives
  • fix(security): bump root Python deps to patched versions
  • fix(security): bump gateway deps to patched versions
  • chore: drop CodeQL rationale comments
  • fix(quality): batch3 of CodeQL findings + bench KIE bug
  • fix(quality): drop redundant inline imports in donut + registry
  • fix(security): use Reflect.construct for WebSocket headers shim
  • fix(quality): close CodeQL quality-tab findings
  • Fix gateway queue trace isolation
  • perf(bench,adapters): parallel VLM sub-batch dispatch + drop redundant empty_cache
  • Keep generation machinery off default queue path
  • docs(test): note : -> __ replacement in test_model_yaml_filenames docstring
  • fix(models): rename ColQwen3 YAML to match sie_id casing (TomoroAI)

v0.4.1

28 May 15:08

  • chore(main): release 0.4.1
  • fix(security): resolve 18 open CodeQL alerts
  • Revert "Fix pool queue batching coalescing"
  • Fix pool queue batching coalescing
  • refactor(release-docker): remove sie-deps prebake; build deps in-band
  • fix: refresh generation pool fallback on hot add
  • fix: isolate generation direct dispatch
  • feat(server): add Qwen3.6-27B model + migrate to CUDA 12.9