speed optimisations, fix for restoreColorLossy, code cleanup #4895

DedeHai · 2025-08-31T17:00:20Z

A few tweaks to increase rendering speed just a bit more. On ESP32 this comes at a cost of ~500bytes of flash use, on C3 and ESP8266 it saves ~300bytes. The flash and speed numbers in the text below are for ESP32.

Edit: with the latest commit, code now also shrinks for ESP32 by 400 bytes.

Created WLED_O2_ATTR alias for `attribute((optimize("O2")))
Moved fastColorScale() function from PS to colors.h, changed it to use 32bit math and put it to good use wherever scaling accuracy does not matter too much, i.e. in Segment::fadeToBlackBy() and Segment::blur(), making these function much faster. I also made it an inline function as the function call overhead is using more flash than the function itself. Since no byte access is used in the function, this is safe for all color operations, even if in 32bit access RAM or at least I'd exptect so, did not test that but it will show soon enogh ;)
Added new _NPBbri variable to bus: need to track total brightness scaling (new ABL code) for NPB buffers or BusDigital::getPixelColor() will return incorrect color if ABL is engaged. Function is currently not used but if we remove the global _pixels[] buffer, the Copy FX will need it. I did not test this particular code change, someone please check if the logic holds (the math should be correct, I checked that).
color_blend():adding WLED_O2_ATTR gives a rendering speed improvement of about 1% at the cost of 100 bytes of flash.
Segment::setPixelColor(): removed IRAM_ATTR, using WLED_O2_ATTR instead: significant FPS improvement of 4% for the cost of 350bytes of flash.
Segment::getPixelColor() removed IRAM_ATTR, using WLED_O2_ATTR instead: faster in tests and even saves a bit of flash.
isPixelClipped() 1D & 2D version: removing IRAM_ATTR will highly likely just inline the function as it is called only once. A small flash use reduction seems to confirm that. This is faster than forcing a function call to IRAM. Since its only used during transitions I did not measure the speed impact and I would not expect it to be huge.
Get/setPixelColorX: instead of casting vWidth/vHeight to int, cast X and Y to unsigned and save the negative check. Makes code size smaller so I assume it is faster too. I also thought about making X/Y coordinated unsigned in general to get more consistency throughout the code but decided I will leave that to a futuer endeavour.

Code cleanup:

Changed parameter order in settings_leds.htm to match cpp file as it was confusing, no functional change.
Bugfix in adjust_color() and fixed indentation
Removed IRAM_ATTR_YN from unused gamma functions (just in case that attribute forces the compiler to put them there)

In summary: this PR and in combination with #4889 there was a significant improvement in FPS in my test. From 68FPS to 78FPS (4 Layers, 32x32 on ESP32), each PR contributes roughly half of that improvement. On a more general test on the C3 the gain was less significant but still visible (+2FPS at 50FPS just this PR alone).

Summary by CodeRabbit

Bug Fixes
- More accurate colors when brightness limiting (ABL) is active.
- Corrected color adjustment to avoid unintended shifts.
- Improved bounds checking to prevent artifacts from out-of-range pixels.
Refactor
- Faster, more robust blur and color-scaling for smoother visuals and improved rendering performance.
- Reduced overhead in particle and rendering paths for more consistent effects.
Style
- Minor formatting cleanup on the LED settings page (no functional changes).

coderabbitai · 2025-08-31T17:00:27Z

Walkthrough

Refactors color-scaling/fade helpers and function attributes, introduces WLED_O2_ATTR, hardens 2D XY bounds checks, rewrites 1D/2D blur logic to use fast scaling and carry/part schemes, adds per-bus applied-brightness tracking for ABL, and makes a formatting-only HTML edit.

Changes

Cohort / File(s)	Summary of changes
Optimization attribute unification `wled00/const.h`, `wled00/FX_fcn.cpp`, `wled00/FXparticleSystem.cpp`, `wled00/colors.cpp`	Added `WLED_O2_ATTR` macro and applied it to selected functions; removed or replaced prior `IRAM_ATTR_YN`/attribute annotations and standardized per-function optimization attributes.
Color scaling / fade refactor `wled00/colors.h`, `wled00/colors.cpp`, `wled00/FX_fcn.cpp`, `wled00/FX_2Dfcn.cpp`, `wled00/FXparticleSystem.cpp`	Introduced `fast_color_scale(...)` (static inline) and moved `color_fade` into `NeoGammaWLEDMethod`; replaced many `color_fade`/scale/add call sites with `fast_color_scale` or new scale-add variants and adjusted related color math.
FX 1D/2D blur and pixel access updates `wled00/FX_fcn.cpp`, `wled00/FX_2Dfcn.cpp`	Rewrote blur loops to use fast scaling and a carry/part scheme; removed `IRAM_ATTR_YN` from several `Segment` pixel methods; hardened XY bounds checks via unsigned casts to treat negative coords as OOB.
Particle system color ops & attributes `wled00/FXparticleSystem.cpp`	Added `fast_color_scaleAdd(...)` (saturating 32-bit scale+add), removed older helpers, migrated render/blur/collision call sites to new helpers; switched rendering/collision methods to `WLED_O2_ATTR`.
Bus manager ABL tracking `wled00/bus_manager.h`, `wled00/bus_manager.cpp`	Added protected `_NPBbri` member and moved `_autoWhiteMode` out of bitfield; track/update `_NPBbri` in `applyBriLimit` and `show`, and use it in `getPixelColor` for color restoration under ABL.
Colors / gamma adjustments `wled00/colors.cpp`, `wled00/colors.h`	Added `WLED_O2_ATTR` to `color_blend`/`color_add`; removed `IRAM_ATTR_YN` wrappers from `NeoGammaWLEDMethod` methods; adjusted `adjust_color` logic and added overflow-aware path in `color_add`.
HTML formatting only `wled00/data/settings_leds.htm`	Reordered local variable declarations in `bLimits`; no behavioral change.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Bugfixes in PS, improvements to PS Fireworks 1D #4673 — Modifies FXparticleSystem.cpp render and particle init paths (overlaps particle rendering changes).
use video scaling instead of NPB luminance & new ABL #4798 — Changes color-scaling helpers and blur/color paths; closely related to the fast_color_scale/fade refactor.
Segment layers and better effect transitions (blending) #4658 — Overlaps FX 1D/2D pixel/blur changes and Segment pixel getter/setter edits.

Suggested reviewers

willmmiles
blazoncek
netmindz

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 66.67% which is insufficient. The required threshold is 80.00%.	You can run `@coderabbitai generate docstrings` to improve docstring coverage.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check	✅ Passed	The title succinctly captures the PR's main goals—performance optimizations, a specific bugfix for restoreColorLossy, and general code cleanup—which match the changes and objectives in the PR description and file summaries; it is concise and relevant for a reviewer scanning history. It avoids vague wording and directly references the primary impact areas of the changeset.

✨ Finishing touches

📝 Generate Docstrings

🧪 Generate unit tests

Create PR with unit tests
Post copyable unit tests in a comment

Tip

👮 Agentic pre-merge checks are now available in preview!

Pro plan users can now enable pre-merge checks in their settings to enforce checklists before merging PRs.

Built-in checks – Quickly apply ready-made checks to enforce title conventions, require pull request descriptions that follow templates, validate linked issues for compliance, and more.
Custom agentic checks – Define your own rules using CodeRabbit’s advanced agentic capabilities to enforce organization-specific policies and workflows. For example, you can instruct CodeRabbit’s agent to verify that API documentation is updated whenever API schema files are modified in a PR. Note: Upto 5 custom checks are currently allowed during the preview period. Pricing for this feature will be announced in a few weeks.

Please see the documentation for more information.

Example:

reviews:
  pre_merge_checks:
    custom_checks:
      - name: "Undocumented Breaking Changes"
        mode: "warning"
        instructions: |
          Pass/fail criteria: All breaking changes to public APIs, CLI flags, environment variables, configuration keys, database schemas, or HTTP/GraphQL endpoints must be documented in the "Breaking Change" section of the PR description and in CHANGELOG.md. Exclude purely internal or private changes (e.g., code not exported from package entry points or explicitly marked as internal).

Please share your feedback with us on this Discord post.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

@blazoncek

applying more bit and shift manipulation tricks to squeeze out just a bit more speed on color manipulation functions. - Optimization on blur is based on work by @blazoncek - Renamed PS fast_color_add() to fast_color_scaleAdd() In my arbitrary speed tests thse changes resulted in 2-3% higher FPS.

DedeHai · 2025-09-13T22:00:49Z

I am guessing it's hard to review this properly as there are many details I worked hours on an they are a bit hard to grasp without spending much time studying the code in detail. I tested each single change for speed and corect operations. While I cannot guarantee there are no bugs introduced this is generally safe to merge.

blazoncek

Any speedup is welcome, however code readabilty suffers without comments what the code does.

blazoncek · 2025-09-14T07:01:36Z

wled00/FX_2Dfcn.cpp

 {
  if (!isActive()) return; // not active
-  if (x >= (int)vWidth() || y >= (int)vHeight() || x < 0 || y < 0) return;  // if pixel would fall out of virtual segment just exit
+  if ((unsigned)x >= vWidth() || (unsigned)y >= vHeight()) return;  // if pixel would fall out of virtual segment just exit


Why don't you go all the way and change every int into unsigned?

It would save you a few manual conversions like this and the code would be cleaner.

However, do check if any of the effects produce negative index/coordinates to avoid other errors.

"Get/setPixelColorX: instead of casting vWidth/vHeight to int, cast X and Y to unsigned and save the negative check. Makes code size smaller so I assume it is faster too. I also thought about making X/Y coordinated unsigned in general to get more consistency throughout the code but decided I will leave that to a futuer endeavour."

int was introduced to combat issues @softhack007 noticed with a few SR effects (possibly others too) that produced negative indexes.
Originally indexes were uint16_t.

1D uses int for "vitual strips" IIRC, I wanted to convert 2D to unsigned before but struggled with "ambiguous" function calls I was not sure how to resolve so I abandoned it.

1D uses int for "vitual strips"

not int actually. it encodes virtual strip number into upper 16 bits. which means it can produce incorrect results.

blazoncek · 2025-09-14T07:02:05Z

wled00/colors.cpp

-    w = w > 255 ? 255 : w;
-    return RGBW32(r,g,b,w);
+    // branchless per-channel saturation to 255 (extract 9th bit, subtract 1 if it is set, mask with 0xFF)
+    rb |= ((rb & 0x01000100) - ((rb >> 8) & 0x00010001)) & 0x00FF00FF;


Can't take all the credit, I had a little AI help ;)

This was the place I spent couple of seconds to decode. It would be helpful to either have an example or description that 256-1 [(1<<8)-1 or 256-(256>>8)] is 255 which saturates (|=) the output.

DedeHai · 2025-09-14T07:12:46Z

@blazoncek thanks for the review. can you please point out where the code is unclear so I can add comments?

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (2)

wled00/colors.cpp (2)
37-49: Overflow path math looks correct; note rounding behavior

The overflow detection via 0x01000100 and per‑lane rescale is sound. Due to integer division, the “preserveCR” path can yield a max of 254 for some inputs (not always 255). If that’s intended, add a brief note to avoid future “off‑by‑one” questions.

93-101: Early‑exit guard: prefer boolean check to avoid theoretical wrap

Sum works given 0–255 inputs, but boolean OR is clearer and avoids theoretical wrap if larger values ever slip through.
-  if (rgb == 0 || hueShift + lighten + brighten == 0) return rgb; // black or no change
+  if (rgb == 0 || (!hueShift && !lighten && !brighten)) return rgb; // black or no change

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fa06732 and 2f5f76b.

📒 Files selected for processing (1)

wled00/colors.cpp (4 hunks)

🧰 Additional context used

📓 Path-based instructions (1)

wled00/**/*.{cpp,h}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Use spaces (2 per level) for all C++ source and header files

Files:

wled00/colors.cpp

🧠 Learnings (2)

📓 Common learnings

Learnt from: DedeHai
PR: wled/WLED#4798
File: wled00/FX.cpp:7531-7533
Timestamp: 2025-08-26T11:51:21.817Z
Learning: In WLED PR #4798, DedeHai confirmed that certain gamma-related calls in FX.cpp/FX_fcn.cpp/particle systems are intentional for effect-level shaping (e.g., brightness curves, TV sim, Pride 2015 pre-mix), distinct from final output gamma. Do not flag or remove these in future reviews; add comments when feasible to clarify intent.

Learnt from: mval-sg
PR: wled/WLED#4876
File: wled00/xml.cpp:0-0
Timestamp: 2025-08-28T08:09:20.630Z
Learning: The WLED codebase has opportunities for refactoring hardcoded array bounds (like the "15" used for DMX channels) to use sizeof(array)/sizeof(array[0]) for more maintainable code, but such changes should be done consistently across the entire codebase in a dedicated refactoring effort.

Learnt from: DedeHai
PR: wled/WLED#4939
File: wled00/FX_fcn.cpp:1176-1187
Timestamp: 2025-09-16T18:08:42.848Z
Learning: In WLED finalizeInit() bus creation (wled00/FX_fcn.cpp), intentionally allowing memory overruns when bus configurations exceed MAX_LED_MEMORY is a deliberate design choice. The trade-off prioritizes creating buses with reduced LED counts over completely failing to create buses, which would cause no LED output and UI failures. This approach forces users to update configurations after migrating to version 0.16 while maintaining basic functionality.

Learnt from: DedeHai
PR: wled/WLED#4923
File: wled00/FX.cpp:4883-4901
Timestamp: 2025-09-12T17:29:43.826Z
Learning: In WLED’s web UI, only one slider value (e.g., SEGMENT.intensity or SEGMENT.custom1) changes at a time; code relying on this may use simplified change guards, though presets/JSON can still update multiple fields atomically.

Learnt from: CR
PR: wled/WLED#0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-09-20T09:00:08.933Z
Learning: Applies to wled00/html_*.h : DO NOT edit generated headers wled00/html_*.h

📚 Learning: 2025-08-26T11:51:21.817Z

Learnt from: DedeHai
PR: wled/WLED#4798
File: wled00/FX.cpp:7531-7533
Timestamp: 2025-08-26T11:51:21.817Z
Learning: In WLED PR #4798, DedeHai confirmed that certain gamma-related calls in FX.cpp/FX_fcn.cpp/particle systems are intentional for effect-level shaping (e.g., brightness curves, TV sim, Pride 2015 pre-mix), distinct from final output gamma. Do not flag or remove these in future reviews; add comments when feasible to clarify intent.

Applied to files:

wled00/colors.cpp

🧬 Code graph analysis (1)

wled00/colors.cpp (1)

wled00/colors.h (1)

rgb2hsv (132-132)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (15)

GitHub Check: wled_build / Build Environments (esp32_wrover)
GitHub Check: wled_build / Build Environments (esp32s3dev_8MB_opi)
GitHub Check: wled_build / Build Environments (esp32s3_4M_qspi)
GitHub Check: wled_build / Build Environments (esp32s3dev_16MB_opi)
GitHub Check: wled_build / Build Environments (esp32_eth)
GitHub Check: wled_build / Build Environments (usermods)
GitHub Check: wled_build / Build Environments (esp32c3dev)
GitHub Check: wled_build / Build Environments (nodemcuv2_compat)
GitHub Check: wled_build / Build Environments (lolin_s2_mini)
GitHub Check: wled_build / Build Environments (esp01_1m_full_160)
GitHub Check: wled_build / Build Environments (esp8266_2m_compat)
GitHub Check: wled_build / Build Environments (nodemcuv2_160)
GitHub Check: wled_build / Build Environments (esp32dev)
GitHub Check: wled_build / Build Environments (nodemcuv2)
GitHub Check: wled_build / Build Environments (esp8266_2m)

🔇 Additional comments (6)

wled00/colors.cpp (6)

11-21: color_blend: attribute placement LGTM; confirm IRAM footprint

Using WLED_O2_ATTR before IRAM_ATTR is fine. Keeping this hot path in IRAM seems intentional here. Please confirm IRAM headroom on ESP8266/ESP32 after other IRAM removals in this PR.

28-35: color_add O2 attribute: good placement

Signature/attrs look consistent with the stated goals. No functional concerns in this hunk.

51-56: Branchless per‑channel saturation: clear and efficient

Nice. The added examples make the trick self‑explanatory.

58-58: Return assembly RG|WG: LGTM

Composition matches lane positions; no issues.

598-603: NeoGammaWLEDMethod::Correct: IRAM removal OK

Consistent with PR’s IRAM reductions. Behavior unchanged; aligns with the learning that gamma shaping is intentional.

604-617: NeoGammaWLEDMethod::inverseGamma32: IRAM removal OK

Same as above. No functional changes.

wled00/colors.cpp

@blazoncek

- speed optimization in color_add, PS fast_color_add and blur functions - applying more bit and shift manipulation tricks to squeeze out just a bit more speed on color manipulation functions. - Optimization on blur is based on work by @blazoncek - Renamed PS fast_color_add() to fast_color_scaleAdd()

speed optimizations, fix for restoreColorLossy, code cleanup

3641db2

DedeHai requested a review from willmmiles August 31, 2025 17:00

This comment was marked as resolved.

Sign in to view

DedeHai requested a review from blazoncek August 31, 2025 20:30

This comment was marked as off-topic.

Sign in to view

blazoncek reviewed Sep 14, 2025

View reviewed changes

add branchless saturation example comments

2f5f76b

coderabbitai bot reviewed Sep 23, 2025

View reviewed changes

wled00/colors.cpp Show resolved Hide resolved

DedeHai merged commit 8b39757 into wled:main Sep 23, 2025
20 checks passed

coderabbitai bot mentioned this pull request Sep 28, 2025

Safety Checks for UI, fix for cfg exceeding LED limit #4939

Open

Uh oh!

speed optimisations, fix for restoreColorLossy, code cleanup #4895

speed optimisations, fix for restoreColorLossy, code cleanup #4895

Uh oh!

Conversation

DedeHai commented Aug 31, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Aug 31, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested reviewers

Pre-merge checks and finishing touches

Uh oh!

This comment was marked as resolved.

Uh oh!

This comment was marked as resolved.

This comment was marked as resolved.

This comment was marked as off-topic.

Uh oh!

DedeHai commented Sep 13, 2025

Uh oh!

blazoncek left a comment

Choose a reason for hiding this comment

Uh oh!

blazoncek Sep 14, 2025

Choose a reason for hiding this comment

Uh oh!

DedeHai Sep 14, 2025

Choose a reason for hiding this comment

Uh oh!

blazoncek Sep 14, 2025

Choose a reason for hiding this comment

Uh oh!

DedeHai Sep 14, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

blazoncek Sep 14, 2025

Choose a reason for hiding this comment

Uh oh!

blazoncek Sep 14, 2025

Choose a reason for hiding this comment

Uh oh!

DedeHai Sep 14, 2025

Choose a reason for hiding this comment

Uh oh!

blazoncek Sep 14, 2025

Choose a reason for hiding this comment

Uh oh!

DedeHai commented Sep 14, 2025

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

DedeHai commented Aug 31, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Aug 31, 2025 •

edited

Loading

DedeHai Sep 14, 2025 •

edited

Loading