Post-1.0 performance wish list #106

Description

@mxpv

These are intentionally deferred until after 1.0 — the priority pre-1.0 is correctness and spec compliance, not optimization. I'll accumulate the wish list for things to explore post-1.0:

  • sdf::Path representation. Path is the most heavily used primitive in the API and is currently backed by a String, which is highly inefficient — every path operation touches heap-allocated string data, and paths are cloned and compared constantly throughout composition. Worth exploring string interning and compile-time string hashing.
  • Token storage. Tokens are strings too and have the same problem. They want the same treatment — an interned token table so repeated names (prim types, property names, metadata keys) are deduplicated and compared cheaply, rather than living as independent Strings.
  • Small-object optimization. Reach for smallvec / smallbox / heapless to keep small, short-lived collections on the stack instead of the heap. Composition is full of tiny vecs that almost always hold a handful of elements — a node's layer-stack members, path component lists, a prim index's children, sibling/strength-order projections, listop item vectors — and each is a heap allocation today.
  • PCP allocations. Composition does an enormous number of temporary allocations — every prim index build allocates nodes, maps, path translations, and intermediate vecs. Some form of arena allocation (per-build or per-cache) would cut this dramatically and improve locality. This is likely the single biggest win.
  • Memory-mapped binary reads. The crate (.usdc) reader currently does a single bulk read_all() into a Vec<u8> and parses from a cursor. For large files it could instead memory-map the asset, letting the OS page data in on demand — a natural fit for the format's lazy value decoding (only the sections actually queried get faulted in) and a way to avoid copying whole multi-gigabyte files into the heap up front.
  • Async API surface. Need to decide on the async story — see Async-friendly layer loading (let the host provide the bytes) #105. The I/O chokepoint is small ("get me the bytes for this layer"), so the open question is whether to expose just that surface as async without forcing the whole API to become async, and how that interacts with the rayon decision below.
  • Parallelism (rayon). There are a bunch of TODO(rayon)/TODO(perf) markers across the codebase flagging independent work that could run in parallel. Composition is the obvious candidate, since per-prim index builds take only & references today specifically to stay Rayon-friendly. Need to measure the actual speedup before committing, and decide whether it's worth the dependency. Rayon would likely sit behind a feature gate, and note the tension with async.
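
To make the interning idea from the Path and Token items concrete, here is a minimal sketch of an interned token table using only the standard library. Everything in it (`Token`, `TokenTable`, `intern`, `resolve`) is a hypothetical name chosen for illustration, not the crate's existing API, and a real design would still need to settle thread-safety and table lifetime. The point is only that repeated names collapse to one allocation and that equality becomes an integer compare instead of a string compare.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical handle: a small `Copy` id standing in for an interned string.
/// Comparing or hashing two tokens never touches string data.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct Token(u32);

/// Hypothetical table that owns each distinct name exactly once.
#[derive(Default)]
pub struct TokenTable {
    /// Lookup from name to id; the key shares its allocation with `names`.
    ids: HashMap<Arc<str>, u32>,
    /// Lookup from id back to name.
    names: Vec<Arc<str>>,
}

impl TokenTable {
    /// Returns the existing token for `name`, or allocates a new one.
    pub fn intern(&mut self, name: &str) -> Token {
        if let Some(&id) = self.ids.get(name) {
            return Token(id);
        }
        let id = self.names.len() as u32;
        let shared: Arc<str> = Arc::from(name);
        self.names.push(Arc::clone(&shared));
        self.ids.insert(shared, id);
        Token(id)
    }

    /// Resolves a token back to its string.
    /// Panics if the token came from a different table.
    pub fn resolve(&self, token: Token) -> &str {
        &self.names[token.0 as usize]
    }

    /// Number of distinct names stored.
    pub fn len(&self) -> usize {
        self.names.len()
    }
}
```

Interning the same property name twice yields equal tokens and leaves a single entry in the table. A path could then plausibly be stored as a short sequence of such tokens rather than a `String`, though whether that beats the current representation is something to measure rather than assume.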
