Skip to content

Conversation

DedeHai
Copy link
Collaborator

@DedeHai DedeHai commented Aug 31, 2025

A few tweaks to increase rendering speed just a bit more. On ESP32 this comes at a cost of ~500bytes of flash use, on C3 and ESP8266 it saves ~300bytes. The flash and speed numbers in the text below are for ESP32.

Edit: with the latest commit, code now also shrinks for ESP32 by 400 bytes.

  • Created WLED_O2_ATTR alias for `attribute((optimize("O2")))

  • Moved fastColorScale() function from PS to colors.h, changed it to use 32bit math and put it to good use wherever scaling accuracy does not matter too much, i.e. in Segment::fadeToBlackBy() and Segment::blur(), making these function much faster. I also made it an inline function as the function call overhead is using more flash than the function itself. Since no byte access is used in the function, this is safe for all color operations, even if in 32bit access RAM or at least I'd exptect so, did not test that but it will show soon enogh ;)

  • Added new _NPBbri variable to bus: need to track total brightness scaling (new ABL code) for NPB buffers or BusDigital::getPixelColor() will return incorrect color if ABL is engaged. Function is currently not used but if we remove the global _pixels[] buffer, the Copy FX will need it. I did not test this particular code change, someone please check if the logic holds (the math should be correct, I checked that).

  • color_blend():adding WLED_O2_ATTR gives a rendering speed improvement of about 1% at the cost of 100 bytes of flash.

  • Segment::setPixelColor(): removed IRAM_ATTR, using WLED_O2_ATTR instead: significant FPS improvement of 4% for the cost of 350bytes of flash.

  • Segment::getPixelColor() removed IRAM_ATTR, using WLED_O2_ATTR instead: faster in tests and even saves a bit of flash.

  • isPixelClipped() 1D & 2D version: removing IRAM_ATTR will highly likely just inline the function as it is called only once. A small flash use reduction seems to confirm that. This is faster than forcing a function call to IRAM. Since its only used during transitions I did not measure the speed impact and I would not expect it to be huge.

  • Get/setPixelColorX: instead of casting vWidth/vHeight to int, cast X and Y to unsigned and save the negative check. Makes code size smaller so I assume it is faster too. I also thought about making X/Y coordinated unsigned in general to get more consistency throughout the code but decided I will leave that to a futuer endeavour.

Code cleanup:

  • Changed parameter order in settings_leds.htm to match cpp file as it was confusing, no functional change.
  • Bugfix in adjust_color() and fixed indentation
  • Removed IRAM_ATTR_YN from unused gamma functions (just in case that attribute forces the compiler to put them there)

In summary: this PR and in combination with #4889 there was a significant improvement in FPS in my test. From 68FPS to 78FPS (4 Layers, 32x32 on ESP32), each PR contributes roughly half of that improvement. On a more general test on the C3 the gain was less significant but still visible (+2FPS at 50FPS just this PR alone).

Summary by CodeRabbit

  • Bug Fixes
    • More accurate colors when brightness limiting (ABL) is active.
    • Corrected color adjustment to avoid unintended shifts.
    • Improved bounds checking to prevent artifacts from out-of-range pixels.
  • Refactor
    • Faster, more robust blur and color-scaling for smoother visuals and improved rendering performance.
    • Reduced overhead in particle and rendering paths for more consistent effects.
  • Style
    • Minor formatting cleanup on the LED settings page (no functional changes).

@DedeHai DedeHai requested a review from willmmiles August 31, 2025 17:00
Copy link
Contributor

coderabbitai bot commented Aug 31, 2025

Walkthrough

Refactors color-scaling/fade helpers and function attributes, introduces WLED_O2_ATTR, hardens 2D XY bounds checks, rewrites 1D/2D blur logic to use fast scaling and carry/part schemes, adds per-bus applied-brightness tracking for ABL, and makes a formatting-only HTML edit.

Changes

Cohort / File(s) Summary of changes
Optimization attribute unification
wled00/const.h, wled00/FX_fcn.cpp, wled00/FXparticleSystem.cpp, wled00/colors.cpp
Added WLED_O2_ATTR macro and applied it to selected functions; removed or replaced prior IRAM_ATTR_YN/attribute annotations and standardized per-function optimization attributes.
Color scaling / fade refactor
wled00/colors.h, wled00/colors.cpp, wled00/FX_fcn.cpp, wled00/FX_2Dfcn.cpp, wled00/FXparticleSystem.cpp
Introduced fast_color_scale(...) (static inline) and moved color_fade into NeoGammaWLEDMethod; replaced many color_fade/scale/add call sites with fast_color_scale or new scale-add variants and adjusted related color math.
FX 1D/2D blur and pixel access updates
wled00/FX_fcn.cpp, wled00/FX_2Dfcn.cpp
Rewrote blur loops to use fast scaling and a carry/part scheme; removed IRAM_ATTR_YN from several Segment pixel methods; hardened XY bounds checks via unsigned casts to treat negative coords as OOB.
Particle system color ops & attributes
wled00/FXparticleSystem.cpp
Added fast_color_scaleAdd(...) (saturating 32-bit scale+add), removed older helpers, migrated render/blur/collision call sites to new helpers; switched rendering/collision methods to WLED_O2_ATTR.
Bus manager ABL tracking
wled00/bus_manager.h, wled00/bus_manager.cpp
Added protected _NPBbri member and moved _autoWhiteMode out of bitfield; track/update _NPBbri in applyBriLimit and show, and use it in getPixelColor for color restoration under ABL.
Colors / gamma adjustments
wled00/colors.cpp, wled00/colors.h
Added WLED_O2_ATTR to color_blend/color_add; removed IRAM_ATTR_YN wrappers from NeoGammaWLEDMethod methods; adjusted adjust_color logic and added overflow-aware path in color_add.
HTML formatting only
wled00/data/settings_leds.htm
Reordered local variable declarations in bLimits; no behavioral change.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Suggested reviewers

  • willmmiles
  • blazoncek
  • netmindz

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 66.67% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The title succinctly captures the PR's main goals—performance optimizations, a specific bugfix for restoreColorLossy, and general code cleanup—which match the changes and objectives in the PR description and file summaries; it is concise and relevant for a reviewer scanning history. It avoids vague wording and directly references the primary impact areas of the changeset.
✨ Finishing touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Tip

👮 Agentic pre-merge checks are now available in preview!

Pro plan users can now enable pre-merge checks in their settings to enforce checklists before merging PRs.

  • Built-in checks – Quickly apply ready-made checks to enforce title conventions, require pull request descriptions that follow templates, validate linked issues for compliance, and more.
  • Custom agentic checks – Define your own rules using CodeRabbit’s advanced agentic capabilities to enforce organization-specific policies and workflows. For example, you can instruct CodeRabbit’s agent to verify that API documentation is updated whenever API schema files are modified in a PR. Note: Upto 5 custom checks are currently allowed during the preview period. Pricing for this feature will be announced in a few weeks.

Please see the documentation for more information.

Example:

reviews:
  pre_merge_checks:
    custom_checks:
      - name: "Undocumented Breaking Changes"
        mode: "warning"
        instructions: |
          Pass/fail criteria: All breaking changes to public APIs, CLI flags, environment variables, configuration keys, database schemas, or HTTP/GraphQL endpoints must be documented in the "Breaking Change" section of the PR description and in CHANGELOG.md. Exclude purely internal or private changes (e.g., code not exported from package entry points or explicitly marked as internal).

Please share your feedback with us on this Discord post.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

coderabbitai[bot]

This comment was marked as resolved.

@DedeHai

This comment was marked as resolved.

This comment was marked as resolved.

@DedeHai DedeHai requested a review from blazoncek August 31, 2025 20:30
applying more bit and shift manipulation tricks to squeeze out just a bit more speed on color manipulation functions.
- Optimization on blur is based on work by @blazoncek
- Renamed PS fast_color_add() to fast_color_scaleAdd()
In my arbitrary speed tests thse changes resulted in 2-3% higher FPS.
coderabbitai[bot]

This comment was marked as off-topic.

@DedeHai
Copy link
Collaborator Author

DedeHai commented Sep 13, 2025

I am guessing it's hard to review this properly as there are many details I worked hours on an they are a bit hard to grasp without spending much time studying the code in detail. I tested each single change for speed and corect operations. While I cannot guarantee there are no bugs introduced this is generally safe to merge.

Copy link
Collaborator

@blazoncek blazoncek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any speedup is welcome, however code readabilty suffers without comments what the code does.

{
if (!isActive()) return; // not active
if (x >= (int)vWidth() || y >= (int)vHeight() || x < 0 || y < 0) return; // if pixel would fall out of virtual segment just exit
if ((unsigned)x >= vWidth() || (unsigned)y >= vHeight()) return; // if pixel would fall out of virtual segment just exit
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why don't you go all the way and change every int into unsigned?

It would save you a few manual conversions like this and the code would be cleaner.

However, do check if any of the effects produce negative index/coordinates to avoid other errors.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Get/setPixelColorX: instead of casting vWidth/vHeight to int, cast X and Y to unsigned and save the negative check. Makes code size smaller so I assume it is faster too. I also thought about making X/Y coordinated unsigned in general to get more consistency throughout the code but decided I will leave that to a futuer endeavour."

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

int was introduced to combat issues @softhack007 noticed with a few SR effects (possibly others too) that produced negative indexes.
Originally indexes were uint16_t.

Copy link
Collaborator Author

@DedeHai DedeHai Sep 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1D uses int for "vitual strips" IIRC, I wanted to convert 2D to unsigned before but struggled with "ambiguous" function calls I was not sure how to resolve so I abandoned it.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1D uses int for "vitual strips"

not int actually. it encodes virtual strip number into upper 16 bits. which means it can produce incorrect results.

w = w > 255 ? 255 : w;
return RGBW32(r,g,b,w);
// branchless per-channel saturation to 255 (extract 9th bit, subtract 1 if it is set, mask with 0xFF)
rb |= ((rb & 0x01000100) - ((rb >> 8) & 0x00010001)) & 0x00FF00FF;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clever.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't take all the credit, I had a little AI help ;)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was the place I spent couple of seconds to decode. It would be helpful to either have an example or description that 256-1 [(1<<8)-1 or 256-(256>>8)] is 255 which saturates (|=) the output.

@DedeHai
Copy link
Collaborator Author

DedeHai commented Sep 14, 2025

@blazoncek thanks for the review. can you please point out where the code is unclear so I can add comments?

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
wled00/colors.cpp (2)

37-49: Overflow path math looks correct; note rounding behavior

The overflow detection via 0x01000100 and per‑lane rescale is sound. Due to integer division, the “preserveCR” path can yield a max of 254 for some inputs (not always 255). If that’s intended, add a brief note to avoid future “off‑by‑one” questions.


93-101: Early‑exit guard: prefer boolean check to avoid theoretical wrap

Sum works given 0–255 inputs, but boolean OR is clearer and avoids theoretical wrap if larger values ever slip through.

-  if (rgb == 0 || hueShift + lighten + brighten == 0) return rgb; // black or no change
+  if (rgb == 0 || (!hueShift && !lighten && !brighten)) return rgb; // black or no change
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fa06732 and 2f5f76b.

📒 Files selected for processing (1)
  • wled00/colors.cpp (4 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
wled00/**/*.{cpp,h}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Use spaces (2 per level) for all C++ source and header files

Files:

  • wled00/colors.cpp
🧠 Learnings (2)
📓 Common learnings
Learnt from: DedeHai
PR: wled/WLED#4798
File: wled00/FX.cpp:7531-7533
Timestamp: 2025-08-26T11:51:21.817Z
Learning: In WLED PR #4798, DedeHai confirmed that certain gamma-related calls in FX.cpp/FX_fcn.cpp/particle systems are intentional for effect-level shaping (e.g., brightness curves, TV sim, Pride 2015 pre-mix), distinct from final output gamma. Do not flag or remove these in future reviews; add comments when feasible to clarify intent.
Learnt from: mval-sg
PR: wled/WLED#4876
File: wled00/xml.cpp:0-0
Timestamp: 2025-08-28T08:09:20.630Z
Learning: The WLED codebase has opportunities for refactoring hardcoded array bounds (like the "15" used for DMX channels) to use sizeof(array)/sizeof(array[0]) for more maintainable code, but such changes should be done consistently across the entire codebase in a dedicated refactoring effort.
Learnt from: DedeHai
PR: wled/WLED#4939
File: wled00/FX_fcn.cpp:1176-1187
Timestamp: 2025-09-16T18:08:42.848Z
Learning: In WLED finalizeInit() bus creation (wled00/FX_fcn.cpp), intentionally allowing memory overruns when bus configurations exceed MAX_LED_MEMORY is a deliberate design choice. The trade-off prioritizes creating buses with reduced LED counts over completely failing to create buses, which would cause no LED output and UI failures. This approach forces users to update configurations after migrating to version 0.16 while maintaining basic functionality.
Learnt from: DedeHai
PR: wled/WLED#4923
File: wled00/FX.cpp:4883-4901
Timestamp: 2025-09-12T17:29:43.826Z
Learning: In WLED’s web UI, only one slider value (e.g., SEGMENT.intensity or SEGMENT.custom1) changes at a time; code relying on this may use simplified change guards, though presets/JSON can still update multiple fields atomically.
Learnt from: CR
PR: wled/WLED#0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-09-20T09:00:08.933Z
Learning: Applies to wled00/html_*.h : DO NOT edit generated headers wled00/html_*.h
📚 Learning: 2025-08-26T11:51:21.817Z
Learnt from: DedeHai
PR: wled/WLED#4798
File: wled00/FX.cpp:7531-7533
Timestamp: 2025-08-26T11:51:21.817Z
Learning: In WLED PR #4798, DedeHai confirmed that certain gamma-related calls in FX.cpp/FX_fcn.cpp/particle systems are intentional for effect-level shaping (e.g., brightness curves, TV sim, Pride 2015 pre-mix), distinct from final output gamma. Do not flag or remove these in future reviews; add comments when feasible to clarify intent.

Applied to files:

  • wled00/colors.cpp
🧬 Code graph analysis (1)
wled00/colors.cpp (1)
wled00/colors.h (1)
  • rgb2hsv (132-132)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (15)
  • GitHub Check: wled_build / Build Environments (esp32_wrover)
  • GitHub Check: wled_build / Build Environments (esp32s3dev_8MB_opi)
  • GitHub Check: wled_build / Build Environments (esp32s3_4M_qspi)
  • GitHub Check: wled_build / Build Environments (esp32s3dev_16MB_opi)
  • GitHub Check: wled_build / Build Environments (esp32_eth)
  • GitHub Check: wled_build / Build Environments (usermods)
  • GitHub Check: wled_build / Build Environments (esp32c3dev)
  • GitHub Check: wled_build / Build Environments (nodemcuv2_compat)
  • GitHub Check: wled_build / Build Environments (lolin_s2_mini)
  • GitHub Check: wled_build / Build Environments (esp01_1m_full_160)
  • GitHub Check: wled_build / Build Environments (esp8266_2m_compat)
  • GitHub Check: wled_build / Build Environments (nodemcuv2_160)
  • GitHub Check: wled_build / Build Environments (esp32dev)
  • GitHub Check: wled_build / Build Environments (nodemcuv2)
  • GitHub Check: wled_build / Build Environments (esp8266_2m)
🔇 Additional comments (6)
wled00/colors.cpp (6)

11-21: color_blend: attribute placement LGTM; confirm IRAM footprint

Using WLED_O2_ATTR before IRAM_ATTR is fine. Keeping this hot path in IRAM seems intentional here. Please confirm IRAM headroom on ESP8266/ESP32 after other IRAM removals in this PR.


28-35: color_add O2 attribute: good placement

Signature/attrs look consistent with the stated goals. No functional concerns in this hunk.


51-56: Branchless per‑channel saturation: clear and efficient

Nice. The added examples make the trick self‑explanatory.


58-58: Return assembly RG|WG: LGTM

Composition matches lane positions; no issues.


598-603: NeoGammaWLEDMethod::Correct: IRAM removal OK

Consistent with PR’s IRAM reductions. Behavior unchanged; aligns with the learning that gamma shaping is intentional.


604-617: NeoGammaWLEDMethod::inverseGamma32: IRAM removal OK

Same as above. No functional changes.

@DedeHai DedeHai merged commit 8b39757 into wled:main Sep 23, 2025
20 checks passed
netmindz pushed a commit to netmindz/WLED-MM that referenced this pull request Sep 23, 2025
- speed optimization in color_add, PS fast_color_add and blur functions
- applying more bit and shift manipulation tricks to squeeze out just a bit more speed on color manipulation functions.
- Optimization on blur is based on work by @blazoncek
- Renamed PS fast_color_add() to fast_color_scaleAdd()
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants