ModelCache v0.1 — PVC backend, multi-node #66

Description

@bassam

Design lives in design/modelcache/ (branch dennis/modelcache-design). This issue tracks v0.1 implementation.

v0.1 scope

PVC backend, multi-node ready, no dedup. From the design doc:

  • ModelCache CRD with artifact discriminator (Weights, Tokenizer, Bytes)
  • Sources: huggingFace, s3, http, inline, configMap
  • PVC (RWX) storage backend with one-shot prefetch Job (absorbs #61)
  • replication: AllMatchingClusters — one RWX PVC per matching cluster, shared across all LWS gang pods
  • clusterSelector.matchLabels for cluster filtering
  • Mount path intrinsic to the cache; deployments reference by name via caches: [{ name }]
  • Scheduling gated on per-cluster cache Ready condition
  • Fail fast when a target cluster has no RWX storage class set in InferenceCluster.spec.storage.storageClassName
  • Pluggable storage backend pattern shared with #72 KVOffloadTier
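The cluster selection, fail-fast, and readiness-gating bullets above can be sketched roughly as follows. This is a minimal illustration, not the operator's implementation: the `Cluster` class, the function names, the `<cache>-cache` PVC naming, and the treatment of an empty selector as "match everything" are all assumptions made for the example. Only `clusterSelector.matchLabels`, the RWX access mode, the one-PVC-per-matching-cluster rule, and the Ready gate come from the scope list.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Cluster:
    # Hypothetical stand-in for an InferenceCluster; only the fields used here.
    name: str
    labels: dict = field(default_factory=dict)
    # Mirrors spec.storage.storageClassName; assumed RWX-capable when set.
    rwx_storage_class: Optional[str] = None
    # Mirrors the per-cluster cache Ready condition.
    cache_ready: bool = False


class NoRWXStorageClass(Exception):
    """Raised when a selected cluster cannot host an RWX PVC."""


def matches(labels: dict, match_labels: dict) -> bool:
    # Every selector key must be present with an equal value.
    # An empty selector matches every cluster (assumption).
    return all(labels.get(k) == v for k, v in match_labels.items())


def plan_pvcs(cache_name: str, match_labels: dict, clusters: list) -> dict:
    """AllMatchingClusters: plan one RWX PVC per matching cluster."""
    targets = [c for c in clusters if matches(c.labels, match_labels)]
    missing = [c.name for c in targets if not c.rwx_storage_class]
    if missing:
        # Fail fast instead of leaving the cache pending on those clusters.
        raise NoRWXStorageClass(f"no RWX storage class on: {', '.join(missing)}")
    return {
        c.name: {
            "pvc": f"{cache_name}-cache",  # hypothetical naming scheme
            "storageClassName": c.rwx_storage_class,
            "accessModes": ["ReadWriteMany"],
        }
        for c in targets
    }


def schedulable_clusters(match_labels: dict, clusters: list) -> list:
    """Only clusters whose cache reports Ready may receive deployment pods."""
    return [
        c.name
        for c in clusters
        if matches(c.labels, match_labels) and c.cache_ready
    ]
```

In this sketch, a cluster that matches the selector but lacks a storage class aborts the whole plan, while a cluster whose cache is not yet Ready is simply skipped by scheduling until it becomes Ready.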

Out of scope (tracked separately)

  • LoraAdapter / Engine artifact kinds → v0.2
  • ContentAddressed backend (Modal-style tiered cache + lazy loading) → v0.2
  • Cross-deployment / cross-tenant dedup → v0.2
  • gcs / azure / oci / pvc-clone sources → v0.2
  • AllMatchingNodes replication mode → v0.2
  • Substrate unification #72 → v0.3

Roadmap detail in the design doc § v0.2 and § v0.3.
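One way to picture the v0.1 / v0.2 split is as an admission-style check that accepts the in-scope artifact kinds and sources and rejects the deferred ones with a pointer to the roadmap. The sketch below is illustrative only: `validate_cache` and the error wording are hypothetical, and the real validation may well live in the CRD schema rather than in code like this. The kind and source names are taken from the two lists above.

```python
# Artifact kinds and sources as listed in the v0.1 scope and out-of-scope lists.
V01_KINDS = {"Weights", "Tokenizer", "Bytes"}
V02_KINDS = {"LoraAdapter", "Engine"}
V01_SOURCES = {"huggingFace", "s3", "http", "inline", "configMap"}
V02_SOURCES = {"gcs", "azure", "oci", "pvc-clone"}


def _check(value: str, supported: set, deferred: set, what: str) -> list:
    # Return a list with zero or one error message for a single field.
    if value in supported:
        return []
    if value in deferred:
        return [f"{what} '{value}' is planned for v0.2, not available in v0.1"]
    return [f"unknown {what} '{value}'"]


def validate_cache(kind: str, source: str) -> list:
    """Hypothetical v0.1 check: an empty list means the pair is accepted."""
    errors = []
    errors += _check(kind, V01_KINDS, V02_KINDS, "artifact kind")
    errors += _check(source, V01_SOURCES, V02_SOURCES, "source")
    return errors
```

Distinguishing "deferred" from "unknown" is a design choice for the sketch: it lets a user who tries a v0.2 preview on a v0.1 install see that the field is recognized but not yet supported.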

Examples

Nine paired ModelCache + ModelDeployment examples live in design/modelcache/examples/: single-cluster basic, multi-node TensorPipeline gang, multi-cluster replication, separate tokenizer, private S3, and opaque Bytes kind, plus three v0.2 previews.

References

  • Design doc
  • Examples
  • #61 (closed) — RWX PVC mechanism
  • #72
  • PR #75 (engine.env + imagePullSecrets); ModelCache rides on those for credential-bearing sources

Metadata

Labels

enhancement: New feature or request
