Improving performance of FPTL algorithm by 0.3 ms on console. #5866
Hi! This comment will help you figure out which jobs to run before merging your PR. The suggestions are dynamic based on what files you have changed. HDRP Depending on the scope of your PR, you may need to run more jobs than what has been suggested. Please speak to your lead or a Graphics SDET (#devs-graphics-automation) if you are unsure.
e3ca248 to 315aa26
//When using LDS to cache the volume data, this produces the best most optimal code.
//Doing a manual loop like the one below adds an extra cost of .1 ms on ps4 if we use LDS.
for (int l = 0; l < iNrCoarseLights; ++l)
I don't understand why, but the wave compiler generates much better code when we do the loop naively on top of LDS.
The biggest win so far has been the extra uVal check in the PIXEL_PER_THREAD loop: not just the early out, but also the fact that it forces the loop to be dynamic, which is consistent with the loop below.
I tried the loop below (the old one) with the dynamic inner loop on top of LDS, and it generates slower code. So this is a win of 0.1 ms, on top of the 0.2 ms that we get from the dynamic inner loop.
Did you check what gets generated for Xbox? FXC can often produce very different results from the wave compiler.
Checked Xbox. Neutral. Deleting the old code.
com.unity.render-pipelines.high-definition/Runtime/Lighting/LightLoop/lightlistbuild.compute
9006aad to 13c579b
@kecho I did a quick check in the Editor and in Windows standalone with Spot (Cone, Pyramid, Box), Point, and Area lights. There are indeed no visual issues, but a player build throws "lightlistbuild" warnings that don't occur on master:
@TomasKiniulis great find Tomas! Will update the PR with a fixed version soon.
@TomasKiniulis fix for compiler warning is now in. Thanks again. Let me know if you find anything else.
Thanks for the fix @kecho! It's perfect now
* Adding FPTL caching of light volume, adding new conditional for early out of loop and forcing loop to be dynamic
* Early out on the wave itself if we find at least 1 valid light, saves additional 0.05ms
* Fixing some compiler warnings
159fc01 to d9ecd42
Nice improvement on wave and register count! It looks good to me and thank you for doing the extra perf tests we discussed too.
* master: (148 commits) [HDRP] Add custom pass buffer scaling functions (#5809) Fix HDRP template input not working when using the new Input System and no Keyboard/Mouse (#6045) [SRP] Bump package version to 13.2.0 (#6049) ** Improving FTPL perf on ps4 by .3 ms on average ** (#5866) Remove min version from package.json (#6044) Fix subdiv view (#6033) Small qol (#6036) APV: update some tooltips and add a clamp on dilation validity threshold (#6005) SRP bump to 13.1.1 (#6041) (SRP] Bump min version to a12 to fix Yamato Vfx/fix/1289612 filter texture by dimension (#5715) [HDRP] Fix 9601/9602 reference screenshots after cache server weirdness [CI] [trunk] Updated editor to 5a5aca0fb632e01b9b362f6deb73bcf599d612ca [CI] [trunk] Updated editor to 7b5b9bb6eed88e40de00efa2a629dd8f0b2bfee2 [CI] [trunk] Updated editor to a397ac6302d3ce68bd3eeea7721610a649addfa3 [CI] [trunk] Updated editor to dd9d77b7ded66b5edad4dacf123ffbb6c8d8c4bf [CI] [trunk] Updated editor to 6c7822fe613adfea64bb232c817a2fdee34fc273 [CI] [trunk] Updated editor to aae7fd02ff5afebc831948d25c52dcf704a8a3f3 [CI] [trunk] Updated editor to 9c278756e419ae931cabac6c5dd60f24e05c6de3 [CI] [trunk] Updated editor to d3dc7fc8d330da1155ec00683876a559b2a63281 ...
* APV: update some tooltips and add a clamp on dilation validity threshold (#6005) * Tooltip and dilation thresh clamp * More tooltip grammar * Small qol (#6036) * Fix subdiv view (#6033) * ** Improving FTPL perf on ps4 by .3 ms on average ** (#5866) * Adding FPTL caching of light volume, adding new conditional for early out of loop and forcing loop to be dynamic * Early out on the wave itself if we find at least 1 valid light, saves additional 0.05ms * Fixing some compiler warnings * Update to HDRP Asset analytics (#6060) * Updated HDRP analytics - New version of hdrp usage to better analyse data - Default values event to populate default values for the dashboard * Fixed menu item * Enable iris normal for Eye shader (#5880) * Enable Iris normal for Eye shader * categories * update eye sample Co-authored-by: sebastienlagarde <sebastien@unity3d.com> * [HDRP] Fix errors when switching build targets in editor #5918 * [HDRP] Change RenderGraph Begin/Execute function pattern to avoid leaks (#5929) * Fix render graph not being executed when an exception is thrown from the graph recording * Cleanup + doc * Fix iridescence tooltip (#5950) * Fix tooltip * Update Material-Type.md * Update iridescence-thickness.md * Update LitSurfaceInputsUIBlock.cs * Layer drawer used in ray/path tracing now matches 100% with camera's. (#5956) Please enter the commit message for your changes. Lines starting * [HDRP][Docs] Update docs with RendererList related option (#6031) * Update docs with RendererList related option * Minor edit * [HDRP][Path Tracing] Added proper support for interleaved tiling (#5953) * Added ortho cam support, plus raygen refactor. * Added support for interleaved tiling. * Added spread angle adjustment. * Offset tile sub-pixels, instead of relying on proj matrix modifications. * Undoed last commit. * Use tiled pixel coords for all things sampling-related (incl. lens). 
* Update CHANGELOG.md Co-authored-by: sebastienlagarde <sebastien@unity3d.com> * Renable missing test (Lens Flare) (#5456) * Renable missing test (Lens Flare) * Update references images for 4092 Co-authored-by: Sebastien Lagarde <sebastien@unity3d.com> * [HDRP][Path Tracing] Camera ray misses now return a null value with Minimum Depth > 1 #6067 * [HDRP][Path Tracing] Improved robustness of the stacklit material (#6066) * Improved robustness of the stacklit material. * Updated changelog. * Changed coat normal sample texture from default to normal * add 5007 stacklit test scene for PT * added scene to build settings Co-authored-by: Remi Chapelain <remi.chapelain@unity3d.com> Co-authored-by: sebastienlagarde <sebastien@unity3d.com> * Fixed grammar errors (#6077) * Fix division by 0 when AO is 0 (#6078) * [HDRP] Fix the injection point field not visible in custom pass volumes (#6084) * Fix custom pass injection point not visible when using the Camera mode. * updated changelog Co-authored-by: FrancescoC-unity <43168857+FrancescoC-unity@users.noreply.github.com> Co-authored-by: Kleber Garcia <kleber.garcia@unity3d.com> Co-authored-by: JulienIgnace-Unity <julien@unity3d.com> Co-authored-by: Adrien de Tocqueville <adrien.tocqueville@unity3d.com> Co-authored-by: Antoine Lelievre <antoinel@unity3d.com> Co-authored-by: Emmanuel Turquin <emmanuel@turquin.org> Co-authored-by: Pavlos Mavridis <pavlos.mavridis@unity3d.com> Co-authored-by: skhiat <55133890+skhiat@users.noreply.github.com> Co-authored-by: Remi Chapelain <remi.chapelain@unity3d.com> Co-authored-by: emilybrown1 <88374601+emilybrown1@users.noreply.github.com>
Purpose of this PR
This PR contains two micro-optimizations:
1 - The inner loop of FPTL is now dynamic and exits early. This generates nicer assembly on consoles (PS4) and saves around 0.2 ms with just 20 lights. Savings are higher with more lights.
2 - The volume and coarseIdx are stored in LDS to reduce ALU cost and some bandwidth cost. Lights are loaded only once per wave, which saves around 0.1 ms.
Gains:
Total savings: 0.3 ms.
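To make the two optimizations concrete, here is a rough CPU-side C++ model. It is not the HLSL from lightlistbuild.compute: the names `LightVolume`, `Point`, `intersects` and `cullTile`, and the spherical bound, are all hypothetical and chosen only for illustration. The sketch copies each coarse light's volume into a local cache once per tile (standing in for LDS) and leaves the inner pixel loop as soon as one pixel hits the light.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical light bound: a sphere. The real shader uses different volume data.
struct LightVolume { float x, y, z, radius; };
struct Point { float x, y, z; };

// Point-in-sphere test, used here as a stand-in for the real intersection test.
static bool intersects(const LightVolume& v, const Point& p) {
    const float dx = p.x - v.x, dy = p.y - v.y, dz = p.z - v.z;
    return dx * dx + dy * dy + dz * dz <= v.radius * v.radius;
}

// Returns the indices (into allLights) of the coarse lights that touch at
// least one pixel of the tile. If pixelTests is non-null, it receives the
// number of intersection tests that were actually performed.
std::vector<int> cullTile(const std::vector<LightVolume>& allLights,
                          const std::vector<int>& coarseList,
                          const std::vector<Point>& tilePixels,
                          std::size_t* pixelTests = nullptr) {
    // Step 1: fetch each coarse light's volume once per tile.
    // This local copy is the stand-in for the LDS cache.
    std::vector<LightVolume> cache;
    cache.reserve(coarseList.size());
    for (int idx : coarseList) cache.push_back(allLights[idx]);

    std::vector<int> fineList;
    std::size_t tests = 0;

    // Step 2: loops with a runtime trip count; the inner loop exits on the first hit.
    for (std::size_t l = 0; l < cache.size(); ++l) {
        for (const Point& p : tilePixels) {
            ++tests;
            if (intersects(cache[l], p)) {
                fineList.push_back(coarseList[l]);
                break; // early out: one valid pixel is enough to keep the light
            }
        }
    }

    if (pixelTests) *pixelTests = tests;
    return fineList;
}
```

The model only shows the control flow. The timings quoted above depend on how the console shader compiler schedules the loop, which a CPU sketch cannot reproduce.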
Testing
This change should be completely safe and I expect no difference in visuals.
For QA: I'm not sure any further testing is needed beyond a sanity check with a few light types in a scene. We need to make sure the tiles make sense and that we don't get any checkerboard artifacts (unless, of course, you exceed the light limit).
Before:

After:
