-
-
Notifications
You must be signed in to change notification settings - Fork 3.8k
speed optimisations, fix for restoreColorLossy, code cleanup #4895
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughRefactors color-scaling/fade helpers and function attributes, introduces WLED_O2_ATTR, hardens 2D XY bounds checks, rewrites 1D/2D blur logic to use fast scaling and carry/part schemes, adds per-bus applied-brightness tracking for ABL, and makes a formatting-only HTML edit. Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested reviewers
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests
Tip 👮 Agentic pre-merge checks are now available in preview!Pro plan users can now enable pre-merge checks in their settings to enforce checklists before merging PRs.
Please see the documentation for more information. Example: reviews:
pre_merge_checks:
custom_checks:
- name: "Undocumented Breaking Changes"
mode: "warning"
instructions: |
Pass/fail criteria: All breaking changes to public APIs, CLI flags, environment variables, configuration keys, database schemas, or HTTP/GraphQL endpoints must be documented in the "Breaking Change" section of the PR description and in CHANGELOG.md. Exclude purely internal or private changes (e.g., code not exported from package entry points or explicitly marked as internal). Please share your feedback with us on this Discord post. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
applying more bit and shift manipulation tricks to squeeze out just a bit more speed on color manipulation functions. - Optimization on blur is based on work by @blazoncek - Renamed PS fast_color_add() to fast_color_scaleAdd() In my arbitrary speed tests thse changes resulted in 2-3% higher FPS.
I am guessing it's hard to review this properly as there are many details I worked hours on an they are a bit hard to grasp without spending much time studying the code in detail. I tested each single change for speed and corect operations. While I cannot guarantee there are no bugs introduced this is generally safe to merge. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any speedup is welcome, however code readabilty suffers without comments what the code does.
{ | ||
if (!isActive()) return; // not active | ||
if (x >= (int)vWidth() || y >= (int)vHeight() || x < 0 || y < 0) return; // if pixel would fall out of virtual segment just exit | ||
if ((unsigned)x >= vWidth() || (unsigned)y >= vHeight()) return; // if pixel would fall out of virtual segment just exit |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why don't you go all the way and change every int
into unsigned
?
It would save you a few manual conversions like this and the code would be cleaner.
However, do check if any of the effects produce negative index/coordinates to avoid other errors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Get/setPixelColorX: instead of casting vWidth/vHeight to int, cast X and Y to unsigned and save the negative check. Makes code size smaller so I assume it is faster too. I also thought about making X/Y coordinated unsigned in general to get more consistency throughout the code but decided I will leave that to a futuer endeavour."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
int
was introduced to combat issues @softhack007 noticed with a few SR effects (possibly others too) that produced negative indexes.
Originally indexes were uint16_t
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
1D uses int for "vitual strips" IIRC, I wanted to convert 2D to unsigned before but struggled with "ambiguous" function calls I was not sure how to resolve so I abandoned it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
1D uses int for "vitual strips"
not int
actually. it encodes virtual strip number into upper 16 bits. which means it can produce incorrect results.
w = w > 255 ? 255 : w; | ||
return RGBW32(r,g,b,w); | ||
// branchless per-channel saturation to 255 (extract 9th bit, subtract 1 if it is set, mask with 0xFF) | ||
rb |= ((rb & 0x01000100) - ((rb >> 8) & 0x00010001)) & 0x00FF00FF; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Clever.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't take all the credit, I had a little AI help ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was the place I spent couple of seconds to decode. It would be helpful to either have an example or description that 256-1 [(1<<8)-1
or 256-(256>>8)
] is 255 which saturates (|=
) the output.
@blazoncek thanks for the review. can you please point out where the code is unclear so I can add comments? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (2)
wled00/colors.cpp (2)
37-49
: Overflow path math looks correct; note rounding behaviorThe overflow detection via 0x01000100 and per‑lane rescale is sound. Due to integer division, the “preserveCR” path can yield a max of 254 for some inputs (not always 255). If that’s intended, add a brief note to avoid future “off‑by‑one” questions.
93-101
: Early‑exit guard: prefer boolean check to avoid theoretical wrapSum works given 0–255 inputs, but boolean OR is clearer and avoids theoretical wrap if larger values ever slip through.
- if (rgb == 0 || hueShift + lighten + brighten == 0) return rgb; // black or no change + if (rgb == 0 || (!hueShift && !lighten && !brighten)) return rgb; // black or no change
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
wled00/colors.cpp
(4 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
wled00/**/*.{cpp,h}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
Use spaces (2 per level) for all C++ source and header files
Files:
wled00/colors.cpp
🧠 Learnings (2)
📓 Common learnings
Learnt from: DedeHai
PR: wled/WLED#4798
File: wled00/FX.cpp:7531-7533
Timestamp: 2025-08-26T11:51:21.817Z
Learning: In WLED PR #4798, DedeHai confirmed that certain gamma-related calls in FX.cpp/FX_fcn.cpp/particle systems are intentional for effect-level shaping (e.g., brightness curves, TV sim, Pride 2015 pre-mix), distinct from final output gamma. Do not flag or remove these in future reviews; add comments when feasible to clarify intent.
Learnt from: mval-sg
PR: wled/WLED#4876
File: wled00/xml.cpp:0-0
Timestamp: 2025-08-28T08:09:20.630Z
Learning: The WLED codebase has opportunities for refactoring hardcoded array bounds (like the "15" used for DMX channels) to use sizeof(array)/sizeof(array[0]) for more maintainable code, but such changes should be done consistently across the entire codebase in a dedicated refactoring effort.
Learnt from: DedeHai
PR: wled/WLED#4939
File: wled00/FX_fcn.cpp:1176-1187
Timestamp: 2025-09-16T18:08:42.848Z
Learning: In WLED finalizeInit() bus creation (wled00/FX_fcn.cpp), intentionally allowing memory overruns when bus configurations exceed MAX_LED_MEMORY is a deliberate design choice. The trade-off prioritizes creating buses with reduced LED counts over completely failing to create buses, which would cause no LED output and UI failures. This approach forces users to update configurations after migrating to version 0.16 while maintaining basic functionality.
Learnt from: DedeHai
PR: wled/WLED#4923
File: wled00/FX.cpp:4883-4901
Timestamp: 2025-09-12T17:29:43.826Z
Learning: In WLED’s web UI, only one slider value (e.g., SEGMENT.intensity or SEGMENT.custom1) changes at a time; code relying on this may use simplified change guards, though presets/JSON can still update multiple fields atomically.
Learnt from: CR
PR: wled/WLED#0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-09-20T09:00:08.933Z
Learning: Applies to wled00/html_*.h : DO NOT edit generated headers wled00/html_*.h
📚 Learning: 2025-08-26T11:51:21.817Z
Learnt from: DedeHai
PR: wled/WLED#4798
File: wled00/FX.cpp:7531-7533
Timestamp: 2025-08-26T11:51:21.817Z
Learning: In WLED PR #4798, DedeHai confirmed that certain gamma-related calls in FX.cpp/FX_fcn.cpp/particle systems are intentional for effect-level shaping (e.g., brightness curves, TV sim, Pride 2015 pre-mix), distinct from final output gamma. Do not flag or remove these in future reviews; add comments when feasible to clarify intent.
Applied to files:
wled00/colors.cpp
🧬 Code graph analysis (1)
wled00/colors.cpp (1)
wled00/colors.h (1)
rgb2hsv
(132-132)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (15)
- GitHub Check: wled_build / Build Environments (esp32_wrover)
- GitHub Check: wled_build / Build Environments (esp32s3dev_8MB_opi)
- GitHub Check: wled_build / Build Environments (esp32s3_4M_qspi)
- GitHub Check: wled_build / Build Environments (esp32s3dev_16MB_opi)
- GitHub Check: wled_build / Build Environments (esp32_eth)
- GitHub Check: wled_build / Build Environments (usermods)
- GitHub Check: wled_build / Build Environments (esp32c3dev)
- GitHub Check: wled_build / Build Environments (nodemcuv2_compat)
- GitHub Check: wled_build / Build Environments (lolin_s2_mini)
- GitHub Check: wled_build / Build Environments (esp01_1m_full_160)
- GitHub Check: wled_build / Build Environments (esp8266_2m_compat)
- GitHub Check: wled_build / Build Environments (nodemcuv2_160)
- GitHub Check: wled_build / Build Environments (esp32dev)
- GitHub Check: wled_build / Build Environments (nodemcuv2)
- GitHub Check: wled_build / Build Environments (esp8266_2m)
🔇 Additional comments (6)
wled00/colors.cpp (6)
11-21
: color_blend: attribute placement LGTM; confirm IRAM footprintUsing WLED_O2_ATTR before IRAM_ATTR is fine. Keeping this hot path in IRAM seems intentional here. Please confirm IRAM headroom on ESP8266/ESP32 after other IRAM removals in this PR.
28-35
: color_add O2 attribute: good placementSignature/attrs look consistent with the stated goals. No functional concerns in this hunk.
51-56
: Branchless per‑channel saturation: clear and efficientNice. The added examples make the trick self‑explanatory.
58-58
: Return assembly RG|WG: LGTMComposition matches lane positions; no issues.
598-603
: NeoGammaWLEDMethod::Correct: IRAM removal OKConsistent with PR’s IRAM reductions. Behavior unchanged; aligns with the learning that gamma shaping is intentional.
604-617
: NeoGammaWLEDMethod::inverseGamma32: IRAM removal OKSame as above. No functional changes.
- speed optimization in color_add, PS fast_color_add and blur functions - applying more bit and shift manipulation tricks to squeeze out just a bit more speed on color manipulation functions. - Optimization on blur is based on work by @blazoncek - Renamed PS fast_color_add() to fast_color_scaleAdd()
A few tweaks to increase rendering speed just a bit more. On ESP32 this comes at a cost of ~500bytes of flash use, on C3 and ESP8266 it saves ~300bytes. The flash and speed numbers in the text below are for ESP32.
Edit: with the latest commit, code now also shrinks for ESP32 by 400 bytes.
Created WLED_O2_ATTR alias for `attribute((optimize("O2")))
Moved
fastColorScale()
function from PS to colors.h, changed it to use 32bit math and put it to good use wherever scaling accuracy does not matter too much, i.e. inSegment::fadeToBlackBy()
andSegment::blur()
, making these function much faster. I also made it an inline function as the function call overhead is using more flash than the function itself. Since no byte access is used in the function, this is safe for all color operations, even if in 32bit access RAM or at least I'd exptect so, did not test that but it will show soon enogh ;)Added new
_NPBbri
variable to bus: need to track total brightness scaling (new ABL code) for NPB buffers orBusDigital::getPixelColor()
will return incorrect color if ABL is engaged. Function is currently not used but if we remove the global_pixels[]
buffer, the Copy FX will need it. I did not test this particular code change, someone please check if the logic holds (the math should be correct, I checked that).color_blend()
:adding WLED_O2_ATTR gives a rendering speed improvement of about 1% at the cost of 100 bytes of flash.Segment::setPixelColor()
: removed IRAM_ATTR, using WLED_O2_ATTR instead: significant FPS improvement of 4% for the cost of 350bytes of flash.Segment::getPixelColor()
removed IRAM_ATTR, using WLED_O2_ATTR instead: faster in tests and even saves a bit of flash.isPixelClipped()
1D & 2D version: removing IRAM_ATTR will highly likely just inline the function as it is called only once. A small flash use reduction seems to confirm that. This is faster than forcing a function call to IRAM. Since its only used during transitions I did not measure the speed impact and I would not expect it to be huge.Get/setPixelColorX
: instead of castingvWidth/vHeight
to int, cast X and Y to unsigned and save the negative check. Makes code size smaller so I assume it is faster too. I also thought about making X/Y coordinated unsigned in general to get more consistency throughout the code but decided I will leave that to a futuer endeavour.Code cleanup:
adjust_color()
and fixed indentationIn summary: this PR and in combination with #4889 there was a significant improvement in FPS in my test. From 68FPS to 78FPS (4 Layers, 32x32 on ESP32), each PR contributes roughly half of that improvement. On a more general test on the C3 the gain was less significant but still visible (+2FPS at 50FPS just this PR alone).
Summary by CodeRabbit