Skip to content

Conversation

daniel-wer
Copy link
Member

@daniel-wer daniel-wer commented Sep 23, 2025

Use loops for bi- and trilinear filtering in shader to speed up shader compilation and avoid WebGL crashes for datasets with many layers.

Turns out, WebGL drivers usually can optimize loops much better than unrolled ones. In this case, the getColorForCoords function is very complex which exacerbates the issue.

There's still a lot of room for improvement, but this is a low-hanging fruit we should definitely pick.

URL of deployed dev instance (used for testing):

Steps to test:

  • Open a dataset with more than 3 color layers or a dataset that crashes WebGL on master on your machine (for example, the data_types dataset, at least for my machine). WebGL should no longer crash. Also other datasets should load faster and with less lag than on master.
  • Measurements for my machine obtained using the Chrome performance tab:
    • 1-color layer 1.7s GPU freeze (now no freeze)
    • 2-color layer 7s GPU freeze (now 0.9s)
    • 3-color layer 14s GPU freeze (now 1.8s)
    • 4-color layer crash (now 2.4s)
    • 5-color layer crash (now 4.5s)
    • 8-color layer crash (now 10.8s)
    • 1-color 1-segmentation layer 3s GPU freeze (now 0.2s)

Issues:

  • fixes WebGL crashes for datasets with many layers

(Please delete unneeded items, merge only when none are left open)

  • Added changelog entry (create a $PR_NUMBER.md file in unreleased_changes or use ./tools/create-changelog-entry.py)
  • Removed dev-only changes like prints and application.conf edits
  • Considered common edge cases

… compilation and avoid WebGL crashes when many layers are present
@daniel-wer daniel-wer self-assigned this Sep 23, 2025
Copy link
Contributor

coderabbitai bot commented Sep 23, 2025

📝 Walkthrough

Walkthrough

Replaced unrolled bilinear/trilinear sampling with loop-collected arrays and stricter negative-alpha checks; removed outputCompressedEntry interface variables from main_data_shaders; eliminated non-fragment side-effect writes and initialized return vectors in texture access; added an unreleased note about a WebGL crash fix and shader compile optimization.

Changes

Cohort / File(s) Summary
Shader filtering refactor
frontend/javascripts/viewer/shaders/filtering.glsl.ts
Replaced unrolled explicit 4/8 texture samples with loop-based collection into arrays for bilinear/trilinear filtering; updated interpolation/mix to use array elements; added negative-alpha checks covering all samples with an early fallback; added comments discouraging unrolling to avoid WebGL compile issues.
Removed shader interface variables
frontend/javascripts/viewer/shaders/main_data_shaders.glsl.ts
Removed public shader interface declarations for flat in uvec4 outputCompressedEntry[<%= globalLayerCount %>] and flat out uvec4 outputCompressedEntry[<%= globalLayerCount %>].
Texture access behavior & init
frontend/javascripts/viewer/shaders/texture_access.glsl.ts
Removed non-fragment-path writes that previously set outputSeed and outputCompressedEntry; added explicit initialization of returnValue[0] and returnValue[1] to zero vectors at getColorForCoords64 start to ensure defined defaults.
Unreleased notes
unreleased_changes/8950.md
Added release note documenting a fix for WebGL crashes on datasets with many layers and an optimization that speeds up shader compilation time.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

I nibble pixels, loop and store,
Four by eight, not unrolled anymore.
If alpha turns sour, I bound away—
Safe defaults keep crashes at bay.
— Your friendly rabbit, compiling hooray 🐇✨

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Title Check ✅ Passed The title "Speed up shader compilation and avoid WebGL crashes" succinctly and accurately captures the primary intent and outcome of the changeset, which replaces unrolled shader sampling with loops to reduce shader complexity and avoid WebGL crashes; it is concise, specific, and relevant to the main change.
Description Check ✅ Passed The PR description clearly describes the change (using loops for bi-/trilinear filtering), the rationale (reduce shader complexity and avoid WebGL crashes), includes testing steps, a dev deployment URL, and performance observations, so it is directly related to and informative about the changeset.
Docstring Coverage ✅ Passed No functions found in the changes. Docstring coverage check skipped.
✨ Finishing touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch avoid-webgl-crashes

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f552a57 and bd57d4b.

📒 Files selected for processing (2)
  • frontend/javascripts/viewer/shaders/main_data_shaders.glsl.ts (0 hunks)
  • frontend/javascripts/viewer/shaders/texture_access.glsl.ts (1 hunks)
💤 Files with no reviewable changes (1)
  • frontend/javascripts/viewer/shaders/main_data_shaders.glsl.ts
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2024-11-22T17:18:43.411Z
Learnt from: dieknolle3333
PR: scalableminds/webknossos#8168
File: frontend/javascripts/oxalis/geometries/materials/plane_material_factory.ts:568-585
Timestamp: 2024-11-22T17:18:43.411Z
Learning: In the file `frontend/javascripts/oxalis/geometries/materials/plane_material_factory.ts`, the uniform name `allResolutions` should remain unchanged to maintain consistency with the shader code.

Applied to files:

  • frontend/javascripts/viewer/shaders/texture_access.glsl.ts
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build-smoketest-push
  • GitHub Check: backend-tests
  • GitHub Check: frontend-tests
🔇 Additional comments (2)
frontend/javascripts/viewer/shaders/texture_access.glsl.ts (2)

188-189: LGTM! Explicit initialization prevents undefined behavior.

The explicit initialization of return container elements to zero vectors ensures predictable behavior and addresses potential issues with undefined values in edge cases.


114-116: Review incorrect — non-fragment side-effect writes still present

frontend/javascripts/viewer/shaders/texture_access.glsl.ts still contains the guarded write:

      <% if (!isFragment) { %>
        outputSeed[globalLayerIndex] = seed;
      <% } %>

No references to outputCompressedEntry were found in the repository.

Likely an incorrect or invalid review comment.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (4)
frontend/javascripts/viewer/shaders/filtering.glsl.ts (4)

17-33: Looped sampling looks good; avoid hardcoding the bucket-address flag and add int→float casts.

  • Consider not dropping the precomputed-bucket-address optimization in filtered paths. Either thread the flag through, or at least make it a compile-time constant.
  • Cast loop indices to float in the offset to avoid strict GLSL ES 3.0 drivers complaining.

Apply this diff to add explicit casts (safe across GLSL ES versions):

-              vec3 offset = vec3(x, y, 0);
+              vec3 offset = vec3(float(x), float(y), 0.0);

Outside this range, prefer making the flag a constant:

// before loops
const bool supportsPrecomputedBucketAddress = false;

Alternatively (recommended), thread the flag through the signatures and call sites:

// function signatures
vec4 getBilinearColorFor(float layerIndex, float d_texture_width, float packingDegree, vec3 coordsUVW, bool supportsPrecomputedBucketAddress);
vec4 getTrilinearColorFor(float layerIndex, float d_texture_width, float packingDegree, vec3 coordsUVW, bool supportsPrecomputedBucketAddress);

// call sites in getMaybeFilteredColor(...)
color = getBilinearColorFor(layerIndex, d_texture_width, packingDegree, worldPositionUVW, supportsPrecomputedBucketAddress);
color = getTrilinearColorFor(layerIndex, d_texture_width, packingDegree, worldPositionUVW, supportsPrecomputedBucketAddress);

Please verify that forcing supportsPrecomputedBucketAddress = false inside bilinear/trilinear paths does not regress performance on datasets where precomputed addressing is available. Also sanity-check on Safari/WebGL1 and Android GPUs for dynamic array writes in loops.


21-33: Optional: compute “anyNegative” during sampling to avoid long OR chain and enable short-circuiting.

Keeps code simple while reducing a large conditional and lets compilers short-circuit.

Apply this diff:

-      int idx = 0;
+      int idx = 0;
+      bool anyNegative = false;
       for (int y = 0; y <= 1; y++) {
           for (int x = 0; x <= 1; x++) {
               vec3 offset = vec3(float(x), float(y), 0.0);
               samples[idx] = getColorForCoords(
                   layerIndex, d_texture_width, packingDegree,
                   coordsUVW + offset,
                   supportsPrecomputedBucketAddress
               );
+              anyNegative = anyNegative || (samples[idx].a < 0.0);
               idx++;
           }
       }
 
-      if (samples[0].a < 0.0 || samples[1].a < 0.0 || samples[2].a < 0.0 || samples[3].a < 0.0) {
+      if (anyNegative) {
         // We need to check all four colors for a negative parts, because there will be black
         // lines at the borders otherwise (black gets mixed with data)
         return vec4(0.0, 0.0, 0.0, -1.0);
       }

Also applies to: 34-34


61-77: Trilinear looped sampling: same nits—don’t hardcode the flag; add int→float casts.

  • Prefer not dropping the precomputed-bucket optimization on filtered paths; thread the flag through (see prior comment).
  • Add explicit casts for portability.

Apply this diff to add explicit casts:

-                  vec3 offset = vec3(x, y, z);
+                  vec3 offset = vec3(float(x), float(y), float(z));

Please verify WebGL1 (GLSL ES 1.00) and WebGL2 (GLSL ES 3.00) drivers, especially Safari/ANGLE variants, for dynamic indexing into local arrays in nested loops.


79-81: Optional: replace 8-way OR with a loop-aggregated flag.

Reduces condition size and can short-circuit.

Apply this diff:

-      if (samples[0].a < 0.0 || samples[1].a < 0.0 || samples[2].a < 0.0 || samples[3].a < 0.0 ||
-        samples[4].a < 0.0 || samples[5].a < 0.0 || samples[6].a < 0.0 || samples[7].a < 0.0) {
+      bool anyNegative = false;
+      for (int i = 0; i < 8; ++i) {
+        anyNegative = anyNegative || (samples[i].a < 0.0);
+      }
+      if (anyNegative) {
         // We need to check all eight colors for a negative parts, because there will be black
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3a9cd8a and c06c566.

📒 Files selected for processing (1)
  • frontend/javascripts/viewer/shaders/filtering.glsl.ts (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build-smoketest-push
  • GitHub Check: frontend-tests
  • GitHub Check: backend-tests
🔇 Additional comments (2)
frontend/javascripts/viewer/shaders/filtering.glsl.ts (2)

40-41: LGTM: correct bilinear interpolation order.

Indexing order matches ab (y=0), cd (y=1) and mixing along x then y.


86-87: LGTM: correct trilinear interpolation sequence.

ab/cd at z=0, ab2/cd2 at z=1, then mix along x→y→z in the right order.

Also applies to: 90-91

@hotzenklotz
Copy link
Member

FYI: Safari on Mac and iOS work :-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants