Improving performance of FPTL algorithm by 0.3 ms on console. #5866

Merged: 1 commit, Oct 18, 2021

@kecho kecho commented Sep 30, 2021

Purpose of this PR

This PR contains 2 micro optimizations:
1 - The internal loop of FPTL is now dynamic and exits early. This generates nicer assembly on consoles (PS4) and saves around 0.2 ms with just 20 lights. Savings are higher with more lights.

2 - Moving the volume and coarseIdx into LDS to reduce ALU cost and some bandwidth cost. Lights are only loaded once per wave, which saves around 0.1 ms.
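
As a rough illustration of the first optimization, here is a minimal CPU-side C++ sketch of a per-thread pixel loop with an early out. It is not the actual shader code from this PR: `LightSphere`, `PixelPos`, `kPixelsPerThread`, `insideVolume`, and `lightTouchesThread` are hypothetical names, the sphere test is a stand-in for the real volume test, and the `testsRun` counter exists only to make the saved work visible.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins: a spherical light volume and a view-space pixel position.
struct LightSphere { float x, y, z, radius; };
struct PixelPos    { float x, y, z; };

constexpr std::size_t kPixelsPerThread = 4;  // assumed value, for illustration only

inline bool insideVolume(const LightSphere& l, const PixelPos& p) {
    const float dx = p.x - l.x, dy = p.y - l.y, dz = p.z - l.z;
    return dx * dx + dy * dy + dz * dz <= l.radius * l.radius;
}

// Returns true if the light touches any of the thread's pixels.
// The `!hit` term in the loop condition is the early out: the trip count
// now depends on runtime data, so this is a dynamic loop rather than a
// fixed set of kPixelsPerThread unconditional tests.
inline bool lightTouchesThread(const LightSphere& light,
                               const std::array<PixelPos, kPixelsPerThread>& pixels,
                               int& testsRun) {
    assert(light.radius >= 0.0f);
    bool hit = false;
    for (std::size_t i = 0; i < pixels.size() && !hit; ++i) {
        ++testsRun;  // instrumentation only: counts volume tests actually executed
        hit = insideVolume(light, pixels[i]);
    }
    return hit;
}
```

In this model, a light that already covers the first pixel costs one test instead of four. How that translates into GPU time depends on the compiler and hardware, which is what the measurements in this PR are about.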

Gains:

  • Reduced VALU cost by 5%
  • Reduced total memory access
  • Increased occupancy of FPTL by 2 waves

Total savings: 0.3 ms.

Testing

This change should be completely safe and I expect no difference in visuals.

For QA: I'm not sure any further testing is needed beyond a sanity check with a few light types in a scene. We need to make sure the tiles make sense and that we don't get any checkerboard artifacts (unless, of course, you exceed the light limit).

Before:
image

After:
image

@github-actions

Hi! This comment will help you figure out which jobs to run before merging your PR. The suggestions are dynamic based on what files you have changed.
Link to Yamato: https://yamato.cds.internal.unity3d.com/jobs/902-Graphics
Search for your PR branch using the sidebar on the left, then add the following segment(s) to the end of the URL (you may need multiple tabs depending on how many packages you change)

HDRP
/.yamato%252Fall-hdrp.yml%2523PR_HDRP_trunk
With changes to HDRP packages, you should also run
/.yamato%252Fall-lightmapper.yml%2523PR_LightMapper_trunk

Depending on the scope of your PR, you may need to run more jobs than what has been suggested. Please speak to your lead or a Graphics SDET (#devs-graphics-automation) if you are unsure.

@github-actions github-actions bot added the HDRP label Sep 30, 2021
@kecho kecho force-pushed the HDRP/PerfImprovementFPTL branch 2 times, most recently from e3ca248 to 315aa26 Compare September 30, 2021 20:35

//When using LDS to cache the volume data, this produces the best most optimal code.
//Doing a manual loop like the one below adds an extra cost of .1 ms on ps4 if we use LDS.
for (int l = 0; l < iNrCoarseLights; ++l)

Contributor Author

I don't understand why, but the wave compiler generates much better code when we do the loop naively on top of LDS.
The biggest win so far has been the extra uVal check in the PIXEL_PER_THREAD: not just the early out, but also the fact that it forces the loop to be dynamic, which is consistent with the loop below.
I tried the loop below (the old one) with the dynamic inner loop on top of LDS, and it generates slower code. So this is a win of 0.1 ms, on top of the 0.2 ms that we get from the dynamic inner loop.
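
To make the LDS idea concrete, here is a hedged CPU-side C++ model of the caching pattern, not the shader from this PR. In an HLSL compute shader the cache would typically be a `groupshared` array; here a local `std::vector` plays that role. `LightVolume`, `GlobalLights`, `overlaps`, `countHitsUncached`, and `countHitsCached` are hypothetical names, the volumes are 1D intervals for brevity, and the `loads` counter is instrumentation that models fetches from global memory.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical 1D light volume: covers [center - radius, center + radius].
struct LightVolume { float center, radius; };

// Models slow "global" memory and counts every fetch.
struct GlobalLights {
    std::vector<LightVolume> data;
    mutable std::size_t loads = 0;
    LightVolume load(std::size_t i) const {
        assert(i < data.size());
        ++loads;
        return data[i];
    }
};

inline bool overlaps(const LightVolume& l, float x) {
    return x >= l.center - l.radius && x <= l.center + l.radius;
}

// Baseline: every thread fetches every light from global memory itself.
inline std::size_t countHitsUncached(const GlobalLights& g,
                                     const std::vector<float>& threadPos) {
    std::size_t hits = 0;
    for (float x : threadPos)
        for (std::size_t l = 0; l < g.data.size(); ++l)
            if (overlaps(g.load(l), x)) ++hits;
    return hits;
}

// Cached: the group copies each light once into a local array (the LDS
// stand-in), then every thread runs the plain loop over that array.
inline std::size_t countHitsCached(const GlobalLights& g,
                                   const std::vector<float>& threadPos) {
    std::vector<LightVolume> lds;
    lds.reserve(g.data.size());
    for (std::size_t l = 0; l < g.data.size(); ++l) lds.push_back(g.load(l));

    std::size_t hits = 0;
    for (float x : threadPos)
        for (std::size_t l = 0; l < lds.size(); ++l)
            if (overlaps(lds[l], x)) ++hits;
    return hits;
}
```

Both functions return the same hit count, but the cached version performs one global fetch per light instead of one per light per thread. The model only captures the memory-traffic side; it says nothing about why one loop shape compiles to faster code than another on a given console.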

Contributor

Did you check what gets generated for Xbox? FXC can often lead to very different results than the wave compiler.

Contributor Author

Checked Xbox: neutral. Deleting the old code.

@kecho kecho removed the request for review from JulienIgnace-Unity September 30, 2021 20:44
@kecho kecho force-pushed the HDRP/PerfImprovementFPTL branch from 9006aad to 13c579b Compare October 1, 2021 19:10
@kecho kecho marked this pull request as ready for review October 1, 2021 19:10
@kecho kecho requested review from a team and removed request for a team October 11, 2021 15:15
@TomasKiniulis TomasKiniulis left a comment

@kecho I did a quick check in the Editor and Windows standalone with Spot (Cone, Pyramid, Box), Point, and Area lights. There are indeed no visual issues, but doing a player build throws "lightlistbuild" warnings, which don't occur on master:
image (16)

kecho commented Oct 14, 2021

@TomasKiniulis great find, Tomas! I will update the PR with a fixed version soon.

kecho commented Oct 14, 2021

@TomasKiniulis the fix for the compiler warning is now in. Thanks again. Let me know if you find anything else.

@kecho kecho requested a review from TomasKiniulis October 14, 2021 17:46
@TomasKiniulis TomasKiniulis left a comment

Thanks for the fix @kecho! It's perfect now

* Adding FPTL caching of light volume, adding new conditional for early out of loop and forcing loop to be dynamic

* Early out on the wave itself if we find at least 1 valid light, saves additional 0.05ms

* Fixing some compiler warnings
@kecho kecho force-pushed the HDRP/PerfImprovementFPTL branch from 159fc01 to d9ecd42 Compare October 15, 2021 16:35
@mmikk mmikk left a comment

Nice improvement on wave and register count! It looks good to me and thank you for doing the extra perf tests we discussed too.

@sebastienlagarde sebastienlagarde merged commit f128770 into master Oct 18, 2021
@sebastienlagarde sebastienlagarde deleted the HDRP/PerfImprovementFPTL branch October 18, 2021 12:16
odbb added a commit that referenced this pull request Oct 18, 2021
* master: (148 commits)
  [HDRP] Add custom pass buffer scaling functions (#5809)
  Fix HDRP template input not working when using the new Input System and no Keyboard/Mouse (#6045)
  [SRP] Bump package version to 13.2.0 (#6049)
  ** Improving FTPL perf on ps4 by .3 ms on average ** (#5866)
  Remove min version from package.json (#6044)
  Fix subdiv view (#6033)
  Small qol (#6036)
  APV: update some tooltips and add a clamp on dilation validity threshold (#6005)
  SRP bump to 13.1.1 (#6041)
  (SRP] Bump min version to a12 to fix Yamato
  Vfx/fix/1289612 filter texture by dimension (#5715)
  [HDRP] Fix 9601/9602 reference screenshots after cache server weirdness
  [CI] [trunk] Updated editor to 5a5aca0fb632e01b9b362f6deb73bcf599d612ca
  [CI] [trunk] Updated editor to 7b5b9bb6eed88e40de00efa2a629dd8f0b2bfee2
  [CI] [trunk] Updated editor to a397ac6302d3ce68bd3eeea7721610a649addfa3
  [CI] [trunk] Updated editor to dd9d77b7ded66b5edad4dacf123ffbb6c8d8c4bf
  [CI] [trunk] Updated editor to 6c7822fe613adfea64bb232c817a2fdee34fc273
  [CI] [trunk] Updated editor to aae7fd02ff5afebc831948d25c52dcf704a8a3f3
  [CI] [trunk] Updated editor to 9c278756e419ae931cabac6c5dd60f24e05c6de3
  [CI] [trunk] Updated editor to d3dc7fc8d330da1155ec00683876a559b2a63281
  ...
sebastienlagarde pushed a commit that referenced this pull request Oct 20, 2021
* Adding FPTL caching of light volume, adding new conditional for early out of loop and forcing loop to be dynamic

* Early out on the wave itself if we find at least 1 valid light, saves additional 0.05ms

* Fixing some compiler warnings
sebastienlagarde added a commit that referenced this pull request Oct 21, 2021
* APV: update some tooltips and add a clamp on dilation validity threshold (#6005)

* Tooltip and dilation thresh clamp

* More tooltip grammar

* Small qol (#6036)

* Fix subdiv view (#6033)

* ** Improving FTPL perf on ps4 by .3 ms on average ** (#5866)

* Adding FPTL caching of light volume, adding new conditional for early out of loop and forcing loop to be dynamic

* Early out on the wave itself if we find at least 1 valid light, saves additional 0.05ms

* Fixing some compiler warnings

* Update to HDRP Asset analytics (#6060)

* Updated HDRP analytics

- New version of hdrp usage to better analyse data
- Default values event to populate default values for the dashboard

* Fixed menu item

* Enable iris normal for Eye shader (#5880)

* Enable Iris normal for Eye shader

* categories

* update eye sample

Co-authored-by: sebastienlagarde <sebastien@unity3d.com>

* [HDRP] Fix errors when switching build targets in editor #5918

* [HDRP] Change RenderGraph Begin/Execute function pattern to avoid leaks (#5929)

* Fix render graph not being executed when an exception is thrown from the graph recording

* Cleanup + doc

* Fix iridescence tooltip (#5950)

* Fix tooltip

* Update Material-Type.md

* Update iridescence-thickness.md

* Update LitSurfaceInputsUIBlock.cs

* Layer drawer used in ray/path tracing now matches 100% with camera's. (#5956)

* [HDRP][Docs] Update docs with RendererList related option (#6031)

* Update docs with RendererList related option

* Minor edit

* [HDRP][Path Tracing] Added proper support for interleaved tiling (#5953)

* Added ortho cam support, plus raygen refactor.

* Added support for interleaved tiling.

* Added spread angle adjustment.

* Offset tile sub-pixels, instead of relying on proj matrix modifications.

* Undoed last commit.

* Use tiled pixel coords for all things sampling-related (incl. lens).

* Update CHANGELOG.md

Co-authored-by: sebastienlagarde <sebastien@unity3d.com>

* Renable missing test (Lens Flare) (#5456)

* Renable missing test (Lens Flare)

* Update references images for 4092

Co-authored-by: Sebastien Lagarde <sebastien@unity3d.com>

* [HDRP][Path Tracing] Camera ray misses now return a null value with Minimum Depth > 1 #6067

* [HDRP][Path Tracing] Improved robustness of the stacklit material (#6066)

* Improved robustness of the stacklit material.

* Updated changelog.

* Changed coat normal sample texture from default to normal

* add 5007 stacklit test scene for PT

* added scene to build settings

Co-authored-by: Remi Chapelain <remi.chapelain@unity3d.com>
Co-authored-by: sebastienlagarde <sebastien@unity3d.com>

* Fixed grammar errors (#6077)

* Fix division by 0 when AO is 0 (#6078)

* [HDRP] Fix the injection point field not visible in custom pass volumes (#6084)

* Fix custom pass injection point not visible when using the Camera mode.

* updated changelog

Co-authored-by: FrancescoC-unity <43168857+FrancescoC-unity@users.noreply.github.com>
Co-authored-by: Kleber Garcia <kleber.garcia@unity3d.com>
Co-authored-by: JulienIgnace-Unity <julien@unity3d.com>
Co-authored-by: Adrien de Tocqueville <adrien.tocqueville@unity3d.com>
Co-authored-by: Antoine Lelievre <antoinel@unity3d.com>
Co-authored-by: Emmanuel Turquin <emmanuel@turquin.org>
Co-authored-by: Pavlos Mavridis <pavlos.mavridis@unity3d.com>
Co-authored-by: skhiat <55133890+skhiat@users.noreply.github.com>
Co-authored-by: Remi Chapelain <remi.chapelain@unity3d.com>
Co-authored-by: emilybrown1 <88374601+emilybrown1@users.noreply.github.com>