Skip to content

Fix Area Lights PBR crash: split sampler2D function parameters#1718

Merged
bkaradzic-microsoft merged 2 commits into
BabylonJS:masterfrom
bkaradzic-microsoft:tpc-1584-sampler-fn-params
May 28, 2026
Merged

Fix Area Lights PBR crash: split sampler2D function parameters#1718
bkaradzic-microsoft merged 2 commits into
BabylonJS:masterfrom
bkaradzic-microsoft:tpc-1584-sampler-fn-params

Conversation

@bkaradzic-microsoft

Copy link
Copy Markdown
Member

Summary

Fixes a long-standing crash on shaders that use sampler2D as a function parameter — most visibly the Babylon.js "Area Lights - PBR" Playground visual test (idx 194), which uses sampler2D parameters in the LTC code path of PBRMaterial. Before this PR the crash was a deep accessChain.isRValue() assert in glslang's SpvBuilder; with debug builds disabled, the assert fired through BX_ASSERT and exited with code 3.

Two things had to be fixed to make idx 194 render end-to-end on the default Windows/D3D11 build:

1. New traverser: SamplerFunctionParameterSplitter

The existing SplitSamplersIntoSamplersAndTextures pass rewrites global sampler2D uniforms into a (texture2D, sampler) pair. It does not rewrite user-defined functions that take sampler2D by value. Such helpers — common in Babylon.js — were therefore left with a parameter type the rest of the pipeline can no longer satisfy.

SamplerFunctionParameterSplitter does the missing two halves:

  1. Function-definition rewrite — every sampler2D parameter is replaced with a (texture2D t, sampler s) pair, and uses of the original parameter inside the body are rewritten to sampler2D(t, s) constructor calls.
  2. Call-site rewrite — at each EOpFunctionCall that resolves to a rewritten function, every sampler2D argument is replaced by the (texture, sampler) pair, sourced from the global rewrite ids produced by SplitSamplers….

The subtle bit — and the reason this had to be a dedicated traverser rather than an extension of SplitSamplers… — is that glslang's call nodes carry a parallel qualifier list alongside the argument list. GlslangToSpv.cpp::handleUserFunctionCall reads qualifiers[i] for arg i. If the qualifier list isn't extended in lockstep with the inserted texture/sampler args, the SPIR-V emitter walks off the end of the qualifier list, the trailing arg never gets the OpStore that materializes its value, and the access chain into the uninitialized texture variable trips the accessChain.isRValue() assert deep in SpvBuilder.

Concretely:

args.insert(begin + i + 1, samplerOperand);
qualifiers.insert(begin + i + 1, qualifiers[i]); // <-- this line is the actual fix

Wired into all four backends that consume SplitSamplers… (DXBC, DXIL, Metal, Vulkan), with the new pass running immediately after.

2. SPIRV-Cross pin bump (BabylonJS/SPIRV-Cross#12)

Once glslang stopped asserting, the same Area Lights test reached SPIRV-Cross HLSL emission and threw Invalid call. from the WEBMIN stub at spirv_hlsl.cpp:20234 (CompilerHLSL::image_type_hlsl). BabylonJS/SPIRV-Cross#12 un-guards image_type_hlsl and image_type_hlsl_legacy in WEBMIN builds (same un-guarding pattern as the earlier #11). This PR bumps the SPIRV-Cross pin in CMakeLists.txt to pick up that change.

Testing

  • Built RelWithDebInfo against the new SPIRV-Cross pin.
  • Playground visual test idx 194 "Area Lights - PBR" now renders and validates against the reference image (Test 'Area Lights - PBR' validated, exit 0).
  • Regression sweep on the default config (no --include-excluded) at idx 0/10/50: all validated. Idx 100/200/300/400/500 are pre-existing exclusions in config.json and remain skipped=1 — no behavior change.

Files

File Change
Plugins/ShaderCompiler/Source/ShaderCompilerTraversers.h +26 (public decl + rationale)
Plugins/ShaderCompiler/Source/ShaderCompilerTraversers.cpp +394 (the new traverser)
Plugins/ShaderCompiler/Source/ShaderCompiler{DXBC,DXIL,Metal,Vulkan}.cpp +1 each (one call site)
CMakeLists.txt SPIRV-Cross GIT_TAG bump

Adds a new SamplerFunctionParameterSplitter traverser that complements
the existing SplitSamplersIntoSamplersAndTextures pass. SplitSamplers
rewrites global sampler2D uniforms into separate texture2D + sampler
declarations, but it does not rewrite functions that take sampler2D
*by value*. Any user-defined helper that took a sampler2D parameter
(common in Babylon.js shaders, notably the LTC code path used by
PBRMaterial area lights) would therefore be left with a parameter
type the rest of the pipeline can no longer satisfy.

The new traverser does the two missing things:

  1. Rewrites function definitions: every sampler2D parameter is
     replaced with a pair (texture2D t, sampler s), and uses of the
     original parameter inside the body are rewritten to
     sampler2D(t, s) constructor calls.

  2. Rewrites every call site: at each EOpFunctionCall that resolves
     to a rewritten function, every sampler2D argument is replaced
     by the (texture, sampler) pair sourced from the global rewrite
     ids produced by SplitSamplersIntoSamplersAndTextures.

The subtle bit -- and the reason this had to be a dedicated traverser
rather than an extension of SplitSamplers -- is that glslang's call
nodes carry a parallel qualifier list alongside the argument list,
and GlslangToSpv reads qualifiers[i] for arg i. If the qualifier list
is not extended in lockstep with the inserted texture/sampler args,
the SPIR-V emitter walks off the end of the qualifier list, the
trailing arg never gets the OpStore that materializes its value, and
the eventual access chain into the uninitialized texture variable
hits a "node is rvalue, expected lvalue" assert deep in SpvBuilder.

Wired into all four backends that consume SplitSamplers (DXBC, DXIL,
Metal, Vulkan), with the new pass running immediately after
SplitSamplersIntoSamplersAndTextures so the global rewrite ids are
available.

Also bumps the SPIRV-Cross pin to pick up BabylonJS/SPIRV-Cross#12,
which exposes the HLSL image_type_hlsl / image_type_hlsl_legacy
implementations in WEBMIN builds. Without that companion change, the
same Area Lights test that previously crashed in glslang now reaches
SPIRV-Cross HLSL emission and throws "Invalid call." from the WEBMIN
stub at spirv_hlsl.cpp:20234.

Net effect: Playground visual test idx 194 "Area Lights - PBR" now
renders and validates against the reference image; no regressions in
the sampled idx 0/10/50/100 sweep.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings May 28, 2026 14:20

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes a crash in the Babylon Native shader compiler when Babylon.js shaders contain user-defined functions that take combined sampler2D (etc.) parameters by value — notably the "Area Lights – PBR" playground (idx 194). The existing SplitSamplersIntoSamplersAndTextures pass rewrites global uniform samplers into (texture, sampler) pairs but leaves user-function signatures with the now-unsupported combined-sampler parameter type, which tripped a deep accessChain.isRValue() assert in glslang's SPIR-V emitter.

Changes:

  • Adds a new SamplerFunctionParameterSplitter traverser that rewrites each sampler2D function parameter into a (texture, sampler) pair, replaces uses inside the body with EOpConstructTextureSampler(t, s) constructions, and re-mangles the function name; then unpacks each matching call-site argument into two args while keeping glslang's parallel per-arg qualifier list in lockstep.
  • Wires the new pass in after SplitSamplersIntoSamplersAndTextures in all four backends (DXBC, DXIL, Metal, Vulkan).
  • Bumps the SPIRV-Cross Git pin to pick up un-guarding of image_type_hlsl[_legacy] in WEBMIN builds, so HLSL emission no longer throws after glslang stops asserting.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
Plugins/ShaderCompiler/Source/ShaderCompilerTraversers.h Declares SplitSamplerFunctionParameters with rationale doc comment.
Plugins/ShaderCompiler/Source/ShaderCompilerTraversers.cpp Implements the new traverser (function-def rewrite + body-symbol rewrite + call-site rewrite with synchronized qualifier list).
Plugins/ShaderCompiler/Source/ShaderCompilerDXBC.cpp Invokes the new pass after sampler splitting.
Plugins/ShaderCompiler/Source/ShaderCompilerDXIL.cpp Invokes the new pass after sampler splitting.
Plugins/ShaderCompiler/Source/ShaderCompilerMetal.cpp Invokes the new pass after sampler splitting.
Plugins/ShaderCompiler/Source/ShaderCompilerVulkan.cpp Invokes the new pass after sampler splitting.
CMakeLists.txt Bumps SPIRV-Cross pin to the WEBMIN-unguarded HLSL image-type build.

Comment thread Plugins/ShaderCompiler/Source/ShaderCompilerTraversers.cpp Outdated
Per Copilot review on BabylonJS#1718: the previous conditional
  if (idx < qualifiers.size()) qualifiers.insert(...);
silently fell back to the buggy path -- if the qualifier list was ever
shorter than args, the qualifier insert was skipped, and we re-introduced
exactly the OOB-read -> elided OpStore -> uninitialized texture access ->
accessChain.isRValue() assert chain this pass was created to fix.

glslang's GlslangToSpv::handleUserFunctionCall indexes qualifiers[i] for
every arg i with no bounds check, so the invariant qualifiers.size() ==
args.size() must hold by construction for any EOpFunctionCall reaching
the SPIR-V emitter. Check that invariant up front and throw on violation,
matching the other defensive throws in this rewriter. The per-iteration
qualifier insert is now unconditional.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@bkaradzic-microsoft bkaradzic-microsoft merged commit 6b5b9f8 into BabylonJS:master May 28, 2026
28 checks passed
bkaradzic-microsoft added a commit that referenced this pull request May 28, 2026
…hts, OpenPBR Analytic Lights) (#1721)

## Summary

Babylon.js v6+ shaders for Area Lights (`computeAreaLighting`) and a
number of other PBR/OpenPBR helpers declare a struct local without an
initializer, then immediately do read-modify-write on its fields:

```glsl
lightingInfo result;
result.specular += value;
result.diffuse  += value;
return result;
```

This is undefined behavior under the GLSL spec (locals are not
zero-initialized), but every WebGL driver in practice tolerates it.
SPIRV-Cross faithfully reproduces the pattern in its HLSL output, and
`D3DCompile` correctly rejects it with `error X4000: variable used
without having been completely initialized`. Result: `Area Lights -
Standard Material` and any other test that exercises this code path
fails to compile its fragment shader on D3D11 / D3D12 today.

## Fix

Add a new AST traverser, `ZeroInitializeStructLocals`, that walks every
function body and prepends `EOpAssign(symbol, EOpConstructStruct(0, 0,
...))` for each unique struct-typed `EvqTemporary` symbol it references.
The SPIR-V emitter then sees a normal store; SPIRV-Cross emits HLSL with
the struct local definitely initialized; `D3DCompile` accepts it. Any
subsequent real assignment in the body is dead-code eliminated by DXC's
optimizer, so functional behavior is unchanged.

Wired into the DXBC, DXIL, Metal, and Vulkan compile pipelines (same as
the existing sampler-handling and dFdy traversers). Arrays of structs
are intentionally skipped — they are uncommon in BJS shaders and would
require constructing a synthetic array constructor.

## Verification

End-to-end on `build/win32-default` (D3D11, Chakra, RelWithDebInfo) and
`build/win32-gl` (OpenGL/ANGLE, V8, Release):

| Test | D3D11 | OpenGL |
|---|---|---|
| idx 191 Area Lights - Standard Material | PASS (1027 px diff) | PASS
(1131 px diff) |
| idx 194 Area Lights - PBR (regression check for #1718) | PASS | PASS |
| idx 558 OpenPBR Fuzz Weight vs Fuzz Roughness - Analytic Lights | PASS
| PASS |
| idx 559 OpenPBR Fuzz Weight vs Coat Weight w/ Normal Maps - Analytic
Lights | PASS | PASS |
| idx 574 OpenPBR Transmission Dispersion VS IOR - Analytic Lights |
PASS | PASS |
| idx 0/39/47/165 (baseline regression checks) | PASS | PASS |

Wide regression sweep:
- D3D11: 135 indices (1-30, 100-115, 160-180, 285-305, 380-405, 540-560)
→ 134 PASS / 1 FAIL. The single failure is idx 22 `Outline` with `Error:
Unsupported alpha mode: -1` — pre-existing, unrelated to shader
pipeline.
- OpenGL: 113 indices (1-50, 100-120, 250-270, 400-420) → 52 PASS / 60
SKIP / 1 FAIL. Same idx 22 failure.

No new regressions on either backend.

## Why a BN traverser (not a SPIRV-Cross fix)

Both the SPIRV-Cross fork-extension route and the BN traverser route
would solve this. We chose the traverser because:

1. It avoids extending the BabylonJS/SPIRV-Cross fork further.
2. The fix lives next to the existing sampler/dFdy traversers, which use
the exact same pattern.
3. It applies uniformly to all four backends (DXBC, DXIL, Metal, Vulkan)
with one change.

## Out of scope

- idx 192/193 (`Textured - Area Lights - *`) still fail with
`Unsupported texture format or type: format 5, type 3553`. This is a
`NativeEngine::PrepareImage` whitelist gap and orthogonal to this PR.
- A separate cluster of OpenPBR-IBL tests (idx 557/561/575) fails with a
JS-side `TypeError: 'FLOAT' undefined` before any shader compile. Will
be filed as a separate issue.

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
bkaradzic-microsoft added a commit that referenced this pull request Jun 5, 2026
…ctim tests (#1715)

Rebased onto latest `master`. The original "combined CI smoke" branch
carried the bgfx MSAA+mipmaps fix, the bgfx.cmake bumps, the bx-API/bimg
`imageParse` adaptations, and a large cascade-victim test
re-enable/re-exclude experiment. **All of that is now superseded by
`master`:** - `master`'s bgfx.cmake pin (`8711456e`) is a descendant of
this branch's old pin (`5a00e8df`), so the MSAA+mipmaps fix and the
bx/bimg API updates are already in. - `master` already passes
`bx::ErrorIgnore` to `bimg::imageParse` (NativeEngine.cpp), already
fixed Area Lights / OpenPBR (#1718/#1721), and already re-enabled the
relevant Playground tests (#1716/#1719/#1698). What remains genuinely
unique is the **Canvas framebuffer-pool survivability** change, which
`master` still lacks (it has the bare `assert(handle.idx !=
bgfx::kInvalidHandle)` at `Canvas.cpp:179` and no
`BGFX_CONFIG_MAX_FRAME_BUFFERS` propagation). ### This PR (single
commit) The Canvas polyfill creates a fresh bgfx framebuffer per JS
`Canvas` instance and per text-rendering op. Under V8 GC pacing,
finalizers run in batches, so a long Playground sweep can outpace the
default 128-slot bgfx framebuffer pool. The first allocation past the
cliff returned `BGFX_INVALID_HANDLE`; the bare `assert` tripped in
Debug, or the invalid handle bled into bgfx-internal arrays in Release
(later AV at `bgfx_p.h`). The whole run aborted regardless of which test
was unlucky. - `Dependencies/CMakeLists.txt`: propagate
`BGFX_CONFIG_MAX_FRAME_BUFFERS` to the bgfx target (default 512 = 4x
bgfx default; honors the `-D` override CI already passes — previously a
no-op). - `Canvas.cpp` `UpdateRenderTarget()`: on `kInvalidHandle`, free
the two just-allocated textures and `throw std::runtime_error`;
`Context::Flush` rethrows as `Napi::Error` so the offending test fails
cleanly without taking the rest of the sweep down. -
`FrameBufferPool::Add`: same treatment for symmetry. The pool bump alone
is expected to remove the cascade in practice; the throw paths are
defense-in-depth so future regressions surface as a catchable JS error
instead of a silent process kill. _Dropped from the previous revision
(now redundant): bgfx.cmake bumps, bx-API/SIMD adaptation, nanovg
`bx/math.h` include, bimg `imageParse` ErrorIgnore source changes, and
the cascade-victim `config.json` re-enable/re-exclude churn +
`validation_native.js` continue-past-failures scaffolding. The
continue-past-failures runner improvement can be revived as its own
focused PR if still wanted._

---

### Plus: 25 crash-cascade victim test re-enables

The framebuffer-pool abort above was the direct cause of a large family
of Playground tests being excluded as Win32 D3D11 `ACCESS_VIOLATION` /
`STATUS_BREAKPOINT` "cascade victims" — they were never broken
individually; the suite just crashed near them when the FB pool
overflowed, and whichever test was running got blamed.

With the abort fixed, this PR re-enables **25** of those tests. Each was
verified to introduce **zero** downstream regressions in a full headless
sweep on **both** the D3D11 and ANGLE/OpenGL backends (185 ran / 185
passed / 0 failed on each).

**Deferred to a follow-up PR (not re-enabled here):** the remaining
cascade-victim tests are blocked by a single distinct bug —
`cube-with-holes-using-stencil-buffer` (`#CW5PRI#11`) is a render-state
corruptor that leaves the stencil buffer dirty and produces a ~38-test
downstream pixel cascade on both backends (it is the same corruptor
behind the previously-observed `sphere-…-glow-layer` D3D11 failure).
That needs its own fix before its neighbours can be safely re-enabled.

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
bkaradzic-microsoft added a commit to bkaradzic-microsoft/BabylonNative that referenced this pull request Jun 5, 2026
Follow-up to BabylonJS#1715. Now that the Canvas framebuffer-pool fix (BabylonJS#1715) and the
Area Lights / OpenPBR shader fixes (BabylonJS#1718 sampler2D split, BabylonJS#1721 zero-init
struct locals) have landed, a further 52 previously-excluded Playground tests
pass cleanly. Each was verified with a full headless sweep on BOTH Win32 D3D11
and ANGLE OpenGL locally (0 failures, 237 ran / 237 passed).

- 23 Area Lights / OpenPBR Analytic Lights tests fixed by BabylonJS#1718 + BabylonJS#1721.
- 29 framebuffer-pool cascade victims fixed by BabylonJS#1715.

Still excluded (deferred): 6 OpenPBR pixel-diffs, 2 OpenGL-only failures
(UV2 Morphing, Vertex Pulling Morph Targets and Bones), the Linux pixel-diff
set, and two state corruptors that need their own fixes:
cube-with-holes-using-stencil-buffer (dirty stencil) and GLTF Node visibility
test (dirties OpenGL state, breaks downstream MeshDebugPluginMaterial).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
bkaradzic-microsoft added a commit that referenced this pull request Jun 5, 2026
## Summary

Follow-up to #1715. Re-enables **52 more** previously-excluded
Playground tests that now pass after recent fixes landed on `master`:

- **23** Area Lights / OpenPBR Analytic Lights tests — fixed by #1718
(sampler2D function-parameter split) and #1721 (zero-init struct-typed
locals).
- **29** Canvas framebuffer-pool cascade victims — fixed by #1715
(FB-pool survivability).

## Verification

Each test was verified with a full headless sweep on **both** Win32
D3D11 and ANGLE OpenGL locally:

```
Run complete. ran=237 passed=237 failed=0 missingRef=0 skipped=483
```

0 failures on either backend, and no regressions on the
previously-active set (the sweep also confirms no downstream cascade
onto already-enabled tests).

## Deferred (still excluded)

| Group | Count | Why |
|---|---|---|
| OpenPBR pixel-diffs | 6 | render correctly now but differ from
reference; need rebake/triage |
| OpenGL-only failures | 2 | UV2 Morphing, Vertex Pulling Morph Targets
and Bones |
| Linux pixel-diffs | ~22 | real pixel comparison failures, separate
triage |
| State corruptors | 2 | `cube-with-holes-using-stencil-buffer` (dirty
stencil) and `GLTF Node visibility test` (dirties OpenGL state -> breaks
downstream `MeshDebugPluginMaterial`). Each needs its own fix before
re-enabling. |

The two corruptors were pinned down by bisection — re-enabled alone,
each produces a downstream cascade — so they are intentionally held
back.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants