Fixed race condition when passing data to bgfx. Issue #1398#1628
Conversation
There was a problem hiding this comment.
Pull request overview
Addresses issue #1398 by reducing the chance of a use-after-free when uploading texture data to bgfx via bgfx::makeRef, ensuring CPU-side code doesn’t keep reading bimg::ImageContainer fields after bgfx may have triggered the release callback.
Changes:
- Cache
m_numMipsbefore mip iteration in 2D texture uploads to avoid reading from a potentially freedbimg::ImageContainer. - Cache
m_numMipsbefore mip iteration in cube texture uploads (single-image-per-face path). - Modify the cube-texture release callback condition to only free on the last face/last mip.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
You can also share your feedback on Copilot code review. Take the survey.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated no new comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
You can also share your feedback on Copilot code review. Take the survey.
LoadTextureFromImage (and the single-mip-per-face branch of LoadCubeTextureFromImages) drove its mip loop with the condition `mip < image->m_numMips`, re-reading image->m_numMips every iteration. On the last iteration the mip's bgfx::makeRef hands ownership of `image` to bgfx's release queue (releaseFn calls bimg::imageFree(image)). bgfx may run that release on the render thread during bgfx::frame() as soon as the Update2D/UpdateCube command is submitted. The loop then evaluates its exit condition by dereferencing image->m_numMips on memory that has just been freed; if the stale read happens to exceed mip, the loop re-enters imageGetRawData() on the freed container, whose garbage m_format indexes s_imageBlockInfo out of bounds -> intermittent access violation (bimg/image.cpp). This is a use-after-free (CWE-416), reproduced as a rare (~2-4%) crash on texture-loading validation tests. Note: bgfx resource creation/update are themselves mutex-guarded (m_resourceApiLock, also held by bgfx::frame()) and are safe to call from a worker thread; the bug is BabylonNative dereferencing `image` after transferring its ownership to bgfx's async release queue, not a bgfx race. Fix: cache image->m_numMips in the loop initializer (`for (uint8_t mip = 0, numMips = image->m_numMips; mip < numMips; ++mip)`) so the loop bound is not re-read from `image` after the freeing submit. This restores verbatim the change made by commit e8ee5f7 (BabylonJS#1628) that was reverted by BabylonJS#1652. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
) ## Problem `NativeEngine::LoadTextureFromImage` (and the single-mip-per-face branch of `LoadCubeTextureFromImages`) drove the mip-upload loop with `for (uint8_t mip = 0; mip < image->m_numMips; ++mip)`, re-reading `image->m_numMips` for the exit condition on every iteration. On the **last** iteration the mip is submitted to bgfx via `bgfx::makeRef(imageMip.m_data, imageMip.m_size, releaseFn, image)`, where `releaseFn` calls `bimg::imageFree(image)`. bgfx invokes that release once it has consumed the memory — which happens on the render thread inside `bgfx::frame()`, potentially **immediately** after `Update2D`/`UpdateCube` submits the command. The loop then evaluates `mip < image->m_numMips` by dereferencing `image` *after it may already have been freed*. If the stale `m_numMips` read happens to be greater than `mip`, the loop re-enters `bimg::imageGetRawData(*image, ...)` on the freed container; its garbage `m_format` indexes `s_imageBlockInfo` out of bounds → an intermittent access violation. This is a **use-after-free (CWE-416)**, reproduced as a rare (~2–4%) crash on texture-loading validation tests (e.g. an NME particle test that loads a ramp texture while the render loop is active). ### Not a bgfx threading bug bgfx resource creation/update **are** mutex-guarded (`m_resourceApiLock`, which `bgfx::frame()` also holds), so calling them from the decode worker thread is safe and supported. The defect is purely that BabylonNative keeps dereferencing `image` after handing its ownership to bgfx's asynchronous release queue. ## Fix Cache `image->m_numMips` in the loop initializer (`for (uint8_t mip = 0, numMips = image->m_numMips; mip < numMips; ++mip)`) so the loop bound is no longer re-read from `image` after the freeing submit. This restores **verbatim** the change made in [`e8ee5f7`](e8ee5f7) (#1628) — see Regression history below. ## Regression history This is a **re-fix of a previously-fixed bug**. The exact same race was fixed in April by commit [`e8ee5f7`](e8ee5f7) ("Fixed race condition when passing data to bgfx. Issue #1398", #1628), which changed both loops to `for (uint8_t mip = 0, numMips = image->m_numMips; ...)`. That fix was inadvertently reverted by commit [`6bb8014`](6bb8014) ("Reworked threading model.", #1652), which restored the original `for (uint8_t mip = 0; mip < image->m_numMips; ++mip)` form in both `LoadTextureFromImage` and `LoadCubeTextureFromImages`, reintroducing the use-after-free. This PR re-applies the original fix unchanged. ## Validation - Stress-tested an affected texture-loading validation test **3000× headless** (30 × 100, 5-way concurrent for extra scheduler pressure): **0 crashes** (baseline crashes ~8% / ~1-in-12). At that baseline rate P(0 in 3000) is effectively zero, so the use-after-free is eliminated. - Particle validation subset: no regressions. Co-authored-by: Branimir Karadzic <branimirkaradzic@gmail.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Change makes
imagepointer not being touched after passing data to bgfx.Alternative solution is to use
bgfx::copyinstead ofbgfx::makeRef.