Bundle-manifest gate is a CI-GPU-render measurement, not a build artifact — not reproducible locally (blocks engine PRs) #348

Description

@RaananW

Summary

The per-scene bundle-size manifests (lab/public/bundle/manifest/<scene>.json) that the Bundle Size CI job validates as "committed and in sync" are not deterministic build artifacts. Their runtimeChunks list is produced by rendering each scene in a headless browser and recording which JS chunks get fetched over the network. That measurement only works in the CI Linux environment (which enables SwiftShader software-WebGPU), so the manifests cannot be regenerated correctly on a developer machine (macOS confirmed; likely any non-CI environment).

This makes the mandatory "regenerate and commit the manifest on every PR" workflow (GUIDANCE.md) effectively impossible to satisfy locally for any PR that changes which chunks a GPU scene fetches. It is a significant, recurring source of red CI on engine PRs (surfaced while working on #302).

Important: this is NOT a build-determinism bug

The compiled bundles themselves are byte-for-byte deterministic across OSes (pinned Rollup/esbuild; tree-shaking is pure JS). Verified: a local build emitted all 86 scene5-*.js chunk files, identical to what CI builds. Only the measured runtimeChunks subset differs. Developers' shipped output is safe — the problem is isolated to the manifest measurement step.

Mechanism (root cause)

  1. The Bundle Size job (azure-pipelines.yml, job BundleSize, vmImage: ubuntu-latest) runs pnpm build:bundle-scenes then pnpm validate:bundle-manifest, which compares committed manifests against a fresh rebuild.
  2. runtimeChunks is written by measurePage() in scripts/bundle-scenes-core.ts (~line 1247): it launches headless Chrome, loads bundle-<scene>.html, waits up to 50s for canvas.dataset.ready === "true", and records every /bundle/*.js response as a fetched chunk. If the scene never reaches ready (no working WebGPU), it silently measures whatever partial set was fetched (catch block ~line 1279: "BJS pages may not reach ready state without GPU — just measure fetched JS").
  3. measurementBrowserArgs() (~line 944) only adds the SwiftShader software-WebGPU flags (--enable-features=Vulkan, --use-vulkan=swiftshader, --use-angle=swiftshader, --ignore-gpu-blocklist) when process.env.CI is set. Locally you get only --enable-unsafe-webgpu.
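
The flag selection in step 3 can be sketched as below. This is a hypothetical reconstruction from the description above, not a copy of `scripts/bundle-scenes-core.ts`: the `env` parameter, the `SWIFTSHADER_FLAGS` constant, and the assumption that `--enable-unsafe-webgpu` is also passed on CI are illustrative choices.

```typescript
// Hypothetical reconstruction of the behaviour described in step 3.
// The real measurementBrowserArgs() may be structured differently.
type Env = Record<string, string | undefined>;

// The four SwiftShader software-WebGPU flags named in the issue.
const SWIFTSHADER_FLAGS: readonly string[] = [
    "--enable-features=Vulkan",
    "--use-vulkan=swiftshader",
    "--use-angle=swiftshader",
    "--ignore-gpu-blocklist",
];

function measurementBrowserArgs(env: Env): string[] {
    // Assumption: the base flag is always present, on CI and locally.
    const args: string[] = ["--enable-unsafe-webgpu"];
    // Assumption: "CI is set" means any non-empty value.
    if (env["CI"]) {
        args.push(...SWIFTSHADER_FLAGS);
    }
    return args;
}

const localArgs = measurementBrowserArgs({});
const ciArgs = measurementBrowserArgs({ CI: "true" });
console.log("local:", localArgs.join(" "));
console.log("ci:   ", ciArgs.join(" "));
```

Taking the environment as a parameter (the script itself would pass `process.env`) makes the divergence easy to see: the same commit launches Chrome with one flag locally and five on CI.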

Net effect: whether a scene's skeleton/morph/PBR dynamic-import chunks are counted depends entirely on how completely the measuring browser renders the scene.
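
A simplified model of the measurement shows why a partial render shrinks the manifest without raising any error. It is only a sketch: the browser is replaced by a plain list of response paths, and `PageRun`, `collectRuntimeChunks`, and the chunk names are hypothetical.

```typescript
// Simplified model of step 2; not the real measurePage().
interface PageRun {
    // Whether canvas.dataset.ready became "true" before the timeout.
    reachedReady: boolean;
    // Path of every response seen while the page was loading.
    responsePaths: string[];
}

function collectRuntimeChunks(run: PageRun): string[] {
    const chunks = new Set<string>();
    for (const path of run.responsePaths) {
        if (path.startsWith("/bundle/") && path.endsWith(".js")) {
            chunks.add(path.slice("/bundle/".length));
        }
    }
    // run.reachedReady is deliberately ignored here, mirroring the silent
    // fallback: whatever was fetched before the timeout is what gets recorded.
    return Array.from(chunks).sort();
}

// Hypothetical chunk names, for illustration only.
const fullRender: PageRun = {
    reachedReady: true,
    responsePaths: ["/bundle/scene5-a.js", "/bundle/scene5-b.js", "/bundle/scene5-c.js"],
};
const partialRender: PageRun = {
    reachedReady: false,
    responsePaths: ["/bundle/scene5-a.js"],
};

console.log(collectRuntimeChunks(fullRender).length);
console.log(collectRuntimeChunks(partialRender).length);
```

Both calls succeed, so nothing in the measurement itself distinguishes a complete chunk list from a truncated one.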

Evidence

Same commit, same scene5 (Alien skinned model), same bundles on disk — three different manifests purely from the render environment:

| Environment | canvas.ready | scene5 runtimeChunks | rawKB |
| --- | --- | --- | --- |
| CI (ubuntu + SwiftShader) | reaches ready | 16 | 91.6 |
| Local macOS, no CI env | partial | 7 | 53.8 |
| Local macOS, CI=1 (SwiftShader) | fails (Vulkan/SwiftShader unsupported on macOS) | 1 | 47.4 |

The CI value of 16 is canonical and reproducible (confirmed: PR #343's build 55557 passed its own validate:bundle-manifest against the committed 16-chunk manifest). The local values are wrong because headless Chrome on macOS can't drive the same WebGPU path.
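
To see exactly which chunks a local run dropped, the two manifests can be diffed on `runtimeChunks`. The sketch below assumes that `runtimeChunks` is an array of chunk file names; the `SceneManifest` type, the `diffRuntimeChunks` helper, and the sample names are hypothetical.

```typescript
// Assumed manifest shape: only the field discussed in this issue.
interface SceneManifest {
    runtimeChunks: string[];
}

interface ChunkDiff {
    missingLocally: string[]; // present in the CI manifest, absent locally
    extraLocally: string[];   // present locally, absent in the CI manifest
}

function diffRuntimeChunks(ci: SceneManifest, local: SceneManifest): ChunkDiff {
    const ciSet = new Set(ci.runtimeChunks);
    const localSet = new Set(local.runtimeChunks);
    return {
        missingLocally: ci.runtimeChunks.filter((c) => !localSet.has(c)),
        extraLocally: local.runtimeChunks.filter((c) => !ciSet.has(c)),
    };
}

// Hypothetical sample data; real manifests live under
// lab/public/bundle/manifest/<scene>.json.
const ciManifest: SceneManifest = {
    runtimeChunks: ["scene5-core.js", "scene5-skeleton.js", "scene5-morph.js"],
};
const localManifest: SceneManifest = {
    runtimeChunks: ["scene5-core.js"],
};

console.log(diffRuntimeChunks(ciManifest, localManifest));
```

A diff with a non-empty `missingLocally` and an empty `extraLocally` is the signature described above: the local browser fetched a strict subset of what CI fetched.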

Impact

  • Any engine PR that changes a GPU scene's fetched-chunk set cannot produce a correct manifest locally -> Bundle Size drift failure until regenerated in the CI env.
  • Developers "fight" the gate: local pnpm build:bundle-scenes produces divergent manifests that must not be committed, but committing nothing also fails the drift check.
  • GUIDANCE.md instructs devs to run pnpm build:bundle-scenes and commit the regenerated manifests as part of every PR — which is not achievable off the CI Linux+SwiftShader environment.
  • Concretely blocking now: the tree-shaking fix in #302 ("perf(ci): cut cloud parity time — public static site + sharded BrowserStack (CDP)", commit fb5011fb) legitimately adds the primitive-resolver chunk to scene257/scene260 (+ related primitive/sampler scenes); their manifests can only be regenerated on CI.

Options to fix the gate (for discussion)

  1. CI regenerates + commits manifests. Add a pipeline step (or bot) that rebuilds manifests on the ubuntu agent and commits them back to the PR branch, removing the local-regen burden entirely.
  2. Make local regen reproducible. Provide a pinned linux/amd64 container (or documented flags) with working SwiftShader-WebGPU so any dev on any OS can regenerate byte-identical manifests. (Note: SwiftShader-Vulkan does not work directly on a macOS host; a Linux container is the realistic path.)
  3. Gate on stable/static data instead of a render measurement. Validate the statically-derived built chunk graph and/or the existing size ceilings (scene-config.json maxRawKB) rather than the GPU-render-measured runtimeChunks fetch set. Keeps the size signal without the environment dependency.
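
As a rough illustration of option 3, the gate could walk the built chunk graph from a scene's entry chunk and compare the total against that scene's `maxRawKB` ceiling. Everything in this sketch is an assumption made for discussion: the `ChunkGraph` shape, the helper names, and the sample sizes are hypothetical, and the real build output may expose its chunk graph differently.

```typescript
// Hypothetical static chunk graph, as a bundler could emit at build time.
interface ChunkInfo {
    rawKB: number;
    imports: string[];        // static imports
    dynamicImports: string[]; // import() targets
}
type ChunkGraph = Record<string, ChunkInfo>;

// Every chunk reachable from the entry, following static and dynamic edges.
function reachableChunks(graph: ChunkGraph, entry: string): string[] {
    const seen = new Set<string>();
    const stack: string[] = [entry];
    while (stack.length > 0) {
        const name = stack.pop() as string;
        if (seen.has(name)) continue;
        const info = graph[name];
        if (!info) continue; // chunk not in the graph: skip it
        seen.add(name);
        stack.push(...info.imports, ...info.dynamicImports);
    }
    return Array.from(seen).sort();
}

function checkCeiling(graph: ChunkGraph, entry: string, maxRawKB: number) {
    const chunks = reachableChunks(graph, entry);
    let totalRawKB = 0;
    for (const name of chunks) {
        const info = graph[name];
        if (info) totalRawKB += info.rawKB;
    }
    return { chunks, totalRawKB, ok: totalRawKB <= maxRawKB };
}

// Hypothetical sample graph and ceiling.
const graph: ChunkGraph = {
    "entry.js": { rawKB: 40, imports: ["core.js"], dynamicImports: ["skeleton.js"] },
    "core.js": { rawKB: 10, imports: [], dynamicImports: [] },
    "skeleton.js": { rawKB: 5, imports: [], dynamicImports: [] },
    "unused.js": { rawKB: 100, imports: [], dynamicImports: [] },
};
console.log(checkCeiling(graph, "entry.js", 60));
```

This counts every dynamic import the scene could trigger, not only the ones a given render does trigger, so it is an upper bound on the measured fetch set. The trade-off is a looser size signal in exchange for a result that depends only on build output, which is deterministic across OSes.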

Suggested acceptance criteria

  • A developer on macOS/Windows/Linux can produce a manifest that matches CI's validate:bundle-manifest byte-for-byte, OR the gate no longer depends on a GPU-render measurement, OR CI owns manifest regeneration so contributors never hand-generate them.
  • The chosen approach is documented in GUIDANCE.md.
