Skip to content

Optimizing PrepareLightsForGPU (ProcessVisibleLights) with burst #5294

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Oct 14, 2021

Conversation

kecho
Copy link
Contributor

@kecho kecho commented Aug 3, 2021

Purpose of this PR

Optimization of PrepareLightsforGPU (ProcessVisibleLights) on HDRP using burst, in a ECS friendly way.

This PR includes:

  • Introduction of HDLightEntityCollection - which is a master singleton render side entity manager.
  • Replacing the flat ProcessVisibleLights loop with a parallel job, using the HDVisibleLightEntities helper.
  • Reduction of dependencies to HDAdditionalLights on the ProcessVisibleLight loop
  • Burstification / parallelization of ProcessVisibleLights
  • New sorting functions, performance oriented. There are now 3 differnet sorting algorithms. Insertion sort, which performs best between 1 and 31 lights, Merge sort which performs best between 32 and 199 lights, and finally radix sort which has best performance 200+ lights more.
  • This PR has a bit of regression in performance on HDAdditionalLightData creation, and modification of several properties. This perf regression does justify the gains on the main pipeline. At some point, DOTs lights should fix any performance issues inside HDAdditionalLightData. This PR's mission is to optimize the HDRP side, not the game object side.

NOTE:

  • The convert to GPU format path is still sub optimal, since its tightly coupled with shadow processing / allocation. The next round of optimizations will focus on shadow allocation.
  • This PR adds an extra cost to HDAdditioanLight having

Testing status

Seeing a reduction of around 50%-70% on PrepareLightsForGPU.
image
Comparisons are done after compiling with IL2CPP at max optimization level, against master on ps4.
Similar performance comparisons must be done in Windows: create a player in windows with il2cpp with maximum optimization level, and compare master vs this branch.

Testing performed

  • Shuriken lights - Created shuriken particle object, added generation of lights. Ensured that lights work, fixed also a crash on master when lights in shuriken have shadows enabled.
  • Tested live editing of HDAdditionalLightData (intensity and spot light inner/outer angles)
  • Tested live editing of area light shadows and geometry generation
  • Tested scene live edit / and general parameters for shadows (spot light shadows, tinting)
  • Tested deleting adding many lights.

Additional testing required:

  • Testing of light cookies
  • Testing of shadow allocation (live editing + stress testing many lights with shadows)
  • IES profile cutoff percent and IES profile live editing / runtime
  • light dimmer parameter
  • fade distance parameter
  • volumetric fade distance parameter
  • affect diffuse / affect specular for all light types parameter
  • shape width / shape height for lights parameter
  • shape radius, softness scale, light range attennuation
  • area light cookie
  • Ray traced shadows for lights (if available)
  • Ray tracing using local lights for reflections
  • Screen space light shadows
  • Interacting light with sky (interactsWithSky parameter)
  • Shadow tint / penumbra tint for all light types
  • flare falloff, flare tint and flare size parameters
  • Light baking and shadow baking of lights.

Technical Information

The following is a technical writeup, intended to give a high level overview of the work done on this PR. Before we can dive in, I need to explain how exactly HDRP does light culling and why it is expensive.
Originally the light loop in HDRP is divided into 3 big steps.

Culling in HDRP

Step 1
first step is frustum culling coming from the SRP, which converts Light game objects (refered as GOs) to VisibleLight structs. VisibleLight struct contains a copy of the visible light, and is all performed in the c++, providing these structs (burst accessible) for the SRP to access

Step 2
Next, HDRP has a HDAdditionalLightData GO for additional PBR options not present in the regular Light component. This component is self managed, and the UI obfuscates everything, creating a nice GUI that lets HDRP have a more advanced light definition.
The second step basically took the VisibleLight objects, and processed them with the HDAdditioanlLightData and did some extra quick rejection. This produced an intermediate list called ProcessedLightData.
The ProcessedLightData then gets sorted using a recursive quicksort, which is unnacceptably heavy on the CPU.

Step 3
Last step then converts ProcessedLightData into a GPU ready struct light light list.

The 2nd and 3rd step were all done linearly, in the main thread, accessed a lot the HDAdditionalData and were huge perf bottle necks. The sort was as well gigantic.

Optimization Parts

Part A
Added the concept of a render entity (HDRenderLightEntity) and a global manager for these entities. This basically mantains light data (from HDAdditionalLightData GO ) as struct. The GO has an entity (handle) to this render data and is responsible to update it from the Sim side. The GO then just becomes an interface to the UI / serialization / artist. Its not used anymore by the render pipeline (with the exception of shadow procesisng, which will require another separate PR).
This effectively (almost) decouples the rneder pipeline from the GO. This allows the render pipeline then to access the HDLightRenderLightEntityData as a struct, thus unlocking burstification.

Part B
Moved the step 2 mentioned above to burst, with the exception of shadow allocation. ProcessedLightData gets accumulated using an atomic increment across all threads. This is a gigantic speedup on the CPU. With the exception of shadows, shadow lights still get processed 'patched' in the main thread.

Part C
Rewrote the sorts to be done only on UInts, and on pointers for maximum speed.
Added 3 sort algorithms:
Insertion sort for a light count <32
Non recursive merge sort with auxiliary data for >= 32 and <= 256
Radix sort with auxiliary data with 8 bit radix for 256 >= and above.
This is effectively almost 5x faster, and can also scale massively. The sort is still done in the main thread (there were no good gains on it being on burst)

Part D
Move the Step 3 above to burst. Keep the shadow pieces in the main thread.

Conclusion and Next steps

Huge perf gains, new sort loops available are now in Core, for next work we have to optimize the shadow atlas allocation, matrix calculation. Investigate reducing the 3 step culling algorithm to a single step.

This makes it now possible for DOTS as well to inject its own 'sim side' light definition and wrap the render side entity. Shadow work will finalize the decoupling of HDAdditionalLightData GO and enable this final perk.

@github-actions
Copy link

github-actions bot commented Aug 3, 2021

Hi! This comment will help you figure out which jobs to run before merging your PR. The suggestions are dynamic based on what files you have changed.
Link to Yamato: https://yamato.cds.internal.unity3d.com/jobs/902-Graphics
Search for your PR branch using the sidebar on the left, then add the following segment(s) to the end of the URL (you may need multiple tabs depending on how many packages you change)

HDRP
/.yamato%252Fall-hdrp.yml%2523PR_HDRP_2021.2

SRP Core
You could run ABV on your branch before merging your PR, but it will start A LOT of jobs. Please be responsible about it and run it only when you feel the PR is ready:
/.yamato%252F_abv.yml%2523all_project_ci_2021.2
Be aware that any modifications to the Core package impacts everyone in the Graphics repo so please discuss the PR with your lead.

Depending on the scope of your PR, you may need to run more jobs than what has been suggested. Please speak to your lead or a Graphics SDET (#devs-graphics-automation) if you are unsure.

@kecho kecho changed the title [WIP] Burstification of PrepareLightsForGPU i HDRP. [WIP] Optimizing PrepareLightsForGPU with burst Aug 3, 2021
@kecho kecho force-pushed the HDRP/FastLights branch from ec60bbe to 6731184 Compare August 4, 2021 15:13
@kecho kecho force-pushed the HDRP/FastLights branch 3 times, most recently from 6731df0 to 376fe9d Compare August 4, 2021 20:41
@LaserYGD
Copy link

Hi,

Is this still planned? PrepereLightsForGPU takes +1ms in our project.

@kecho
Copy link
Contributor Author

kecho commented Aug 13, 2021

Hi,

Is this still planned? PrepereLightsForGPU takes +1ms in our project.

Still planned. Had to work on other things this week. Which platform are you seeing this and how many lights? (How many have shadows etc?)

@LaserYGD
Copy link

LaserYGD commented Aug 14, 2021

Hi,
Is this still planned? PrepereLightsForGPU takes +1ms in our project.

Still planned. Had to work on other things this week. Which platform are you seeing this and how many lights? (How many have shadows etc?)

Great to hear that.

Around 170 pointlights, all have shadows disabled & volumetrics disabled. For curiosity In one scene I enabled shadows and got +300ms in PrepereLightsForGPU. I do realize how expensive point light shadows are, just curious if this is expected results from PrepereLightsForGPU.

I thought it might be a misconfiguration or something project related, so I downloaded an asset demo and tried it out there with a large number of point lights, got the same result. Same issue occurs when shadows are set to on enable.

This is the profiler when shadows are enabled:
image

Maybe GC related? PrepereLights gets bad even with 20-40 point lights with shadows enabled.
Project and demo test is on 2020.3.13, Windows.

@kecho
Copy link
Contributor Author

kecho commented Aug 14, 2021

Hi,
Is this still planned? PrepereLightsForGPU takes +1ms in our project.

Still planned. Had to work on other things this week. Which platform are you seeing this and how many lights? (How many have shadows etc?)

Great to hear that.

Around 170 pointlights, all have shadows disabled & volumetrics disabled. For curiosity In one scene I enabled shadows and got +300ms in PrepereLightsForGPU. I do realize how expensive point light shadows are, just curious if this is expected results from PrepereLightsForGPU.

I thought it might be a misconfiguration or something project related, so I downloaded an asset demo and tried it out there with a large number of point lights, got the same result. Same issue occurs when shadows are set to on enable.

This is the profiler when shadows are enabled:
image

Maybe GC related? PrepereLights gets bad even with 20-40 point lights with shadows enabled.
Project and demo test is on 2020.3.13, Windows.

Shadows is doing some crazy cpu work at the moment to allocate them and get info about it's projection. There are many opportunities to optimize it, but that's gonna be a second phase. Your 130 point lights should be reduced highly hopefully.
Another question I have, is what CPU are you doing this on?

@LaserYGD
Copy link

Hi,
Is this still planned? PrepereLightsForGPU takes +1ms in our project.

Still planned. Had to work on other things this week. Which platform are you seeing this and how many lights? (How many have shadows etc?)

Great to hear that.
Around 170 pointlights, all have shadows disabled & volumetrics disabled. For curiosity In one scene I enabled shadows and got +300ms in PrepereLightsForGPU. I do realize how expensive point light shadows are, just curious if this is expected results from PrepereLightsForGPU.
I thought it might be a misconfiguration or something project related, so I downloaded an asset demo and tried it out there with a large number of point lights, got the same result. Same issue occurs when shadows are set to on enable.
This is the profiler when shadows are enabled:
image
Maybe GC related? PrepereLights gets bad even with 20-40 point lights with shadows enabled.
Project and demo test is on 2020.3.13, Windows.

Shadows is doing some crazy cpu work at the moment to allocate them and get info about it's projection. There are many opportunities to optimize it, but that's gonna be a second phase. Your 130 point lights should be reduced highly hopefully.
Another question I have, is what CPU are you doing this on?

laptop: Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz, 2601 Mhz, 4 Core(s), 8 Logical Processor(s)

@kecho
Copy link
Contributor Author

kecho commented Aug 16, 2021

laptop: Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz, 2601 Mhz, 4 Core(s), 8 Logical Processor(s)

Thank you for your info @LaserYGD
Give me a couple of weeks, I should have something more functional. Id love it if you could test my changes once they are more functional locally in your project. My email is Kleber.Garcia@unity3d.com so feel free to send me any extra info that way. Thank you again, this is great info, confirms my exploration so far.

@kecho kecho force-pushed the HDRP/FastLights branch 4 times, most recently from 1074dd9 to aa22067 Compare August 19, 2021 13:57
@kecho
Copy link
Contributor Author

kecho commented Aug 19, 2021

hey @LaserYGD , I have finished the first round of burstification. Moved the entire 'Preprocess lights' logic to burst.
Locally, I am seeing an improvement of around 70% on CPU wall time. This degrades though when we do shadows. I still have to work on shadow on CPU.
Could you try this branch in your project if you could? Would be great to see the impact of it on your project.

@kecho kecho changed the title [WIP] Optimizing PrepareLightsForGPU with burst Optimizing PrepareLightsForGPU (ProcessVisibleLights) with burst Aug 19, 2021
@LaserYGD
Copy link

LaserYGD commented Aug 20, 2021

@kecho
Hey, so i did 2 tests, on a big scene with lots of lights, and another on the same scene with reduced point lights.
scene is identical between burst and before tests.
All tests were done in the editor. All shadows off.
Note that I didn't average the numbers to get an approximate ~, I simply eyed it over 20-50 frames and picked an estimate. They aren't exact, but they're good enough to get an understanding 😅

burst = this pull

bigger test (200 point lights~):

With burst:

PrepereLightsForGPU(total): 0.70ms ~ 0.90ms
PrepereGPUlightdata: 0.40 - 0.45ms
Visible lights: 0.35 - 0.45ms

without burst, latest beta:

PrepereLightsForGPU(total): 1.00ms ~ 1.10ms
PrepereGPUlightdata: 0.34 - 0.45ms
Visible lights: 0.60ms ~ 0.70ms


Smaller test (67 point lights, not all in view):

Burst:

PrepereLightsForGPU(total): 0.42ms~
PrepareGPULightdata: 0.17ms ~ 0.25ms
VisibleLights: 0.13ms ~ 0.16ms

Without burst, latest beta:

PrepereLightsForGPU(total): 0.38ms~
PrepareGPULightdata: 0.12ms ~ 0.14ms
VisibleLights: 0.19ms ~ 0.23ms

It seems that PrepareGPULightdata is a bit slower in this pull.
VisibleLights improvements is obvious especially when you get to high numbers of lights.

I'm a bit surprised that I'm not getting as big of an improvement as your testing, did you do your tests in the editor?
Smaller test seems to do better without burst. There might be something wrong on my side so in that case I'll send the scene over to you. to do your own tests if you want.
I will do this test again in a public asset scene so I can share it. Will edit this post when I'm done.

edit: sent an email to you with the scene download

@kecho
Copy link
Contributor Author

kecho commented Aug 20, 2021

@kecho
Hey, so i did 2 tests, on a big scene with lots of lights, and another on the same scene with reduced point lights.
scene is identical between burst and before tests.
All tests were done in the editor. All shadows off.
Note that I didn't average the numbers to get an approximate ~, I simply eyed it over 20-50 frames and picked an estimate. They aren't exact, but they're good enough to get an understanding 😅

burst = this pull

bigger test (200 point lights~):

With burst:

PrepereLightsForGPU(total): 0.70ms ~ 0.90ms
PrepereGPUlightdata: 0.40 - 0.45ms
Visible lights: 0.35 - 0.45ms

without burst, latest beta:

PrepereLightsForGPU(total): 1.00ms ~ 1.10ms
PrepereGPUlightdata: 0.34 - 0.45ms
Visible lights: 0.60ms ~ 0.70ms

Smaller test (67 point lights, not all in view):

Burst:

PrepereLightsForGPU(total): 0.42ms~
PrepareGPULightdata: 0.17ms ~ 0.25ms
VisibleLights: 0.13ms ~ 0.16ms

Without burst, latest beta:

PrepereLightsForGPU(total): 0.38ms~
PrepareGPULightdata: 0.12ms ~ 0.14ms
VisibleLights: 0.19ms ~ 0.23ms

It seems that PrepareGPULightdata is a bit slower in this pull.
VisibleLights improvements is obvious especially when you get to high numbers of lights.

I'm a bit surprised that I'm not getting as big of an improvement as your testing, did you do your tests in the editor? Smaller test seems to do better without burst. There might be something wrong on my side so in that case I'll send the scene over to you. to do your own tests if you want. I will do this test again in a public asset scene so I can share it. Will edit this post when I'm done.

edit: sent an email to you with the scene download

Seriously thank you so much! I think the overhead cost is making it a bit more expensive for low count of lights. It also depends on the CPU. I am seeing big perf gains on console. There are some paths that are incomplete, especially PrepareGPULightdata, I had to add some tmp code to keep it working, this is likely making it slower if I had to guess. I'll try to bundle more optimizations on this same PR and then we should see an overall gain.
Thank you so much for the scene. Will keep you posted :)

@remi-chapelain remi-chapelain requested a review from a team August 23, 2021 07:47
@kecho kecho force-pushed the HDRP/FastLights branch 3 times, most recently from 6ccbc92 to 37340ea Compare August 25, 2021 21:18
Copy link
Contributor

@FrancescoC-unity FrancescoC-unity left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with the various discussed points as future steps (non-blocking for this PR)

@tlaedre
Copy link
Contributor

tlaedre commented Sep 16, 2021

Great work! Really wish someone would slap the backport-to-2021.2 label on this one 🤞

@kecho kecho marked this pull request as ready for review September 20, 2021 13:17
@github-actions github-actions bot added the SRP label Sep 20, 2021
@berniegp
Copy link
Contributor

@LaserYGD check out the Profile Analyzer to get all those stats averaged for you over a range of frames. It also does comparisons of captures.

@sebastienlagarde
Copy link
Contributor

@kecho update status from QA for this one?

@kecho
Copy link
Contributor Author

kecho commented Oct 7, 2021

@sebastienlagarde yes. They found 1 issue, but im on it. Should be fixed by EOD today. Issue happens when there is more than 3k lights. The previous implementation could handle it. So i think its something silly with numbers / allocations. Will keep you posted.

ArgumentException: SetData(): Accessing 168000 bytes at offset 0 for Buffer of size 34944 bytes is not possible. UnityEngine.ComputeBuffer.SetData[T] (Unity.Collections.NativeArray1[T] data, System.Int32 nativeBufferStartIndex, System.Int32 computeBufferStartIndex, System.Int32 count) (at <15558b50626c49f399e321046dbcaba1>:0)
UnityEngine.Rendering.HighDefinition.HDRenderPipeline.PushLightDataGlobalParams (UnityEngine.Rendering.CommandBuffer cmd) (at U:/Repos/gfx-ter/com.unity.render-pipelines.high-definition/Runtime/Lighting/LightLoop/LightLoop.cs:2095)
UnityEngine.Rendering.HighDefinition.HDRenderPipeline.PrepareLightsForGPU (UnityEngine.Rendering.CommandBuffer cmd, UnityEngine.Rendering.HighDefinition.HDCamera hdCamera, UnityEngine.Rendering.CullingResults cullResults, UnityEngine.Rendering.HighDefinition.HDProbeCullingResults hdProbeCullingResults, UnityEngine.Rendering.HighDefinition.LocalVolumetricFogList localVolumetricFogList, UnityEngine.Rendering.HighDefinition.DebugDisplaySettings debugDisplaySettings, UnityEngine.Rendering.HighDefinition.AOVRequestData aovRequest) (at U:/Repos/gfx-ter/com.unity.render-pipelines.high-definition/Runtime/Lighting/LightLoop/LightLoop.cs:1914)
UnityEngine.Rendering.HighDefinition.HDRenderPipeline.ExecuteRenderRequest (UnityEngine.Rendering.HighDefinition.HDRenderPipeline+RenderRequest renderRequest, UnityEngine.Rendering.ScriptableRenderContext renderContext, UnityEngine.Rendering.CommandBuffer cmd, UnityEngine.Rendering.HighDefinition.AOVRequestData aovRequest) (at U:/Repos/gfx-ter/com.unity.render-pipelines.high-definition/Runtime/RenderPipeline/HDRenderPipeline.cs:2024)
UnityEngine.Rendering.HighDefinition.HDRenderPipeline.Render (UnityEngine.Rendering.ScriptableRenderContext renderContext, System.Collections.Generic.List1[T] cameras) (at U:/Repos/gfx-ter/com.unity.render-pipelines.high-definition/Runtime/RenderPipeline/HDRenderPipeline.cs:1833) UnityEngine.Rendering.RenderPipeline.InternalRender (UnityEngine.Rendering.ScriptableRenderContext context, System.Collections.Generic.List1[T] cameras) (at <15558b50626c49f399e321046dbcaba1>:0)
UnityEngine.Rendering.RenderPipelineManager.DoRenderLoop_Internal (UnityEngine.Rendering.RenderPipelineAsset pipe, System.IntPtr loopPtr, System.Collections.Generic.List1[T] renderRequests, Unity.Collections.LowLevel.Unsafe.AtomicSafetyHandle safety) (at <15558b50626c49f399e321046dbcaba1>:0) UnityEngine.GUIUtility:ProcessEvent(Int32, IntPtr, Boolean&)

* Bringing StoA container and small entity system for HDAdditional lights.

* Building entities for each view, and ensuring array size is pooled

* Safety on destruction / empty light list iteration

* Adding markers

* Removing extra entity indirection. Keeping things intrusive

* Jobs for transform data copying

* Parallelizing and creating an append strategy for processed lights

* Light culling jobs, more logic

* Deallocation safety, skipping any processing if m_Size is 0

* compilation issue

* Enabling burst

* Adding more functionality to processing

* Reducing complexity of HDLightEntityCollection

* Refactoring so we dont have to do any intrusive manipulation of light entity. This also decouples some of the issues

* Fixing memory leaks by tracking missing instances during reset

* Removing circular dependency

* Light loop settings and culling maximum light count

* Adding support for evaluation of shadow state

* Adding support for shadow reservation

* Early return if no lights are available

* Sorting and shadow management

* Prepare ligths for GPU marker

* Formatting

* Integrating the new jobified light preprocess into the PrepareLightsForGPU formatting call.

* More work on unification, now loop is connected, commented ray tracing loop for now.

* Fixing sorting offset and counters internally

* Using the correct index for the sort key

* Removing process light data, cleaning up ray tracing light cluster path

* Is created check

* Support for shuriken lights / lights that lack an HDAdditionalData

* Going back to SoA to hit better perf on low number of lights

* Initial changes for PrepareLightsForGPU

* Formatting

* Input stuff

* Porting work, moving first batch of code. WIP

* Finished porting code for LightGetData

* Connecting the BuildLightGPUData port

* Adding shadow allocation for directional shadows

* Enabling burst

* Sorting MSB of sortKey so that we process non directioanl first, and we know exactly how to compute the output index of directional

* Fixing order and mistake on shift

* Right allocation of directional / non directional lights

* Initial work for directional lights

* More work on directional lights

* More work on port, correct sorting order for sortKeys, having directional lights first

* Isolating processing of directional lights to function

* Final pieces for gpu light directional lights

* Burstifying light volume code

* Formatting

* Moving directional light sky / direct light color computation to burst

* Make lights camera relative

* Light counters

* AtoS conversion

* Bug fix, leaving too early

* Safety on allocation count

* Going back to safe references

* Support for ray tracing

* Connecting new light loop

* Finished connecting of new cpu light loop

* Formatting

* Wrong array length passed on radix sort size

* Adding renaming suggested by Remi's code review, much nicer

* formatting

* Renaming to more appropiate names, HDLightEntityCollection -> HDLightRenderDatabase

* Cleaning up database, renaming light collection to database.

* Formatting

* Replacing with dynamic array

* Reset of light count, causing some internal issues when all lights deleted.

* Renaming HDVisibleLightEntities to HDProcessedVisibleLightsBuilder, and cleaning some of the code

* Renamed HDGpuLightList to HDGpuLightBuilder

* Adding safety checks on debug mode

* Fixing shuriken lights that have shadows enabled: ensure to disable them

* Capitalize public functions

* Adding sort algorithm tests

* Deleting duplicate code added by accident

* Adding comment explaining when to add members in the HDLightRenderData

* Formatting

* Removing job that copies transforms, not required anymore

* Formatting

* Adding protection for invalid entity when editing properties on non enabled HDAddtiionalLightData

* Adding another filter for clustered ray tracing lights, this ignores invalid lights

* Correct shadow flag, mistake in refactor

* Adding only cluster type lights to light entityarray list

* Simplifying HDProcessedVisibleLightsBuilder, making it a member entirely of the HDRenderPipeline

* Fixing typo causing memory leak

* Formatting

* Added missing useVolumetric cloud parameter missing, affecting rendering

* Adding rather odd scaling of main light cookie, not getting this code exactly really messes up cloud shadows

* Missing flushing of area lights shape from script

* Shuriken lights attrociousness. Works now though

* Correct order for clustered lights

* Adding auto destruction to fix a memory leak inside shuriken lights

* Forgot to add the correct cam relative world position

* Updated feedback from review, some renames and comments

* Formatting

* Fixed comment

* Changelogs

* Compressing left and right eye copying of lights into GPU buffer, splitting into 2 separate copies
Copy link
Contributor

@RemyUnity RemyUnity left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did some performance testing and can confirm that it's bette now.
For the manual tests listed, everything seems to be working like on the master branch, so this is a 🟢 for me.
Thank you for the in depth explanations of the PR in the description.

Test doc here : https://docs.google.com/document/d/1ihb4pcw8634-fclHjGpbrgr0ZsIJfc6XW5W-NmI7H5k/edit?usp=sharing

@sebastienlagarde sebastienlagarde merged commit 8e35f78 into master Oct 14, 2021
@sebastienlagarde sebastienlagarde deleted the HDRP/FastLights branch October 14, 2021 18:11
kecho added a commit that referenced this pull request Oct 15, 2021
* Bringing StoA container and small entity system for HDAdditional lights.

* Building entities for each view, and ensuring array size is pooled

* Safety on destruction / empty light list iteration

* Adding markers

* Removing extra entity indirection. Keeping things intrusive

* Jobs for transform data copying

* Parallelizing and creating an append strategy for processed lights

* Light culling jobs, more logic

* Deallocation safety, skipping any processing if m_Size is 0

* compilation issue

* Enabling burst

* Adding more functionality to processing

* Reducing complexity of HDLightEntityCollection

* Refactoring so we dont have to do any intrusive manipulation of light entity. This also decouples some of the issues

* Fixing memory leaks by tracking missing instances during reset

* Removing circular dependency

* Light loop settings and culling maximum light count

* Adding support for evaluation of shadow state

* Adding support for shadow reservation

* Early return if no lights are available

* Sorting and shadow management

* Prepare ligths for GPU marker

* Formatting

* Integrating the new jobified light preprocess into the PrepareLightsForGPU formatting call.

* More work on unification, now loop is connected, commented ray tracing loop for now.

* Fixing sorting offset and counters internally

* Using the correct index for the sort key

* Removing process light data, cleaning up ray tracing light cluster path

* Is created check

* Support for shuriken lights / lights that lack an HDAdditionalData

* Going back to SoA to hit better perf on low number of lights

* Initial changes for PrepareLightsForGPU

* Formatting

* Input stuff

* Porting work, moving first batch of code. WIP

* Finished porting code for LightGetData

* Connecting the BuildLightGPUData port

* Adding shadow allocation for directional shadows

* Enabling burst

* Sorting MSB of sortKey so that we process non directioanl first, and we know exactly how to compute the output index of directional

* Fixing order and mistake on shift

* Right allocation of directional / non directional lights

* Initial work for directional lights

* More work on directional lights

* More work on port, correct sorting order for sortKeys, having directional lights first

* Isolating processing of directional lights to function

* Final pieces for gpu light directional lights

* Burstifying light volume code

* Formatting

* Moving directional light sky / direct light color computation to burst

* Make lights camera relative

* Light counters

* AtoS conversion

* Bug fix, leaving too early

* Safety on allocation count

* Going back to safe references

* Support for ray tracing

* Connecting new light loop

* Finished connecting of new cpu light loop

* Formatting

* Wrong array length passed on radix sort size

* Adding renaming suggested by Remi's code review, much nicer

* formatting

* Renaming to more appropiate names, HDLightEntityCollection -> HDLightRenderDatabase

* Cleaning up database, renaming light collection to database.

* Formatting

* Replacing with dynamic array

* Reset of light count, causing some internal issues when all lights deleted.

* Renaming HDVisibleLightEntities to HDProcessedVisibleLightsBuilder, and cleaning some of the code

* Renamed HDGpuLightList to HDGpuLightBuilder

* Adding safety checks on debug mode

* Fixing shuriken lights that have shadows enabled: ensure to disable them

* Capitalize public functions

* Adding sort algorithm tests

* Deleting duplicate code added by accident

* Adding comment explaining when to add members in the HDLightRenderData

* Formatting

* Removing job that copies transforms, not required anymore

* Formatting

* Adding protection for invalid entity when editing properties on non enabled HDAddtiionalLightData

* Adding another filter for clustered ray tracing lights, this ignores invalid lights

* Correct shadow flag, mistake in refactor

* Adding only cluster type lights to light entityarray list

* Simplifying HDProcessedVisibleLightsBuilder, making it a member entirely of the HDRenderPipeline

* Fixing typo causing memory leak

* Formatting

* Added missing useVolumetric cloud parameter missing, affecting rendering

* Adding rather odd scaling of main light cookie, not getting this code exactly really messes up cloud shadows

* Missing flushing of area lights shape from script

* Shuriken lights attrociousness. Works now though

* Correct order for clustered lights

* Adding auto destruction to fix a memory leak inside shuriken lights

* Forgot to add the correct cam relative world position

* Updated feedback from review, some renames and comments

* Formatting

* Fixed comment

* Changelogs

* Compressing left and right eye copying of lights into GPU buffer, splitting into 2 separate copies
sebastienlagarde added a commit that referenced this pull request Oct 18, 2021
* **CPU Optimizations for PrepareLightsForGPU** (#5294)

* Bringing StoA container and small entity system for HDAdditional lights.

* Building entities for each view, and ensuring array size is pooled

* Safety on destruction / empty light list iteration

* Adding markers

* Removing extra entity indirection. Keeping things intrusive

* Jobs for transform data copying

* Parallelizing and creating an append strategy for processed lights

* Light culling jobs, more logic

* Deallocation safety, skipping any processing if m_Size is 0

* compilation issue

* Enabling burst

* Adding more functionality to processing

* Reducing complexity of HDLightEntityCollection

* Refactoring so we dont have to do any intrusive manipulation of light entity. This also decouples some of the issues

* Fixing memory leaks by tracking missing instances during reset

* Removing circular dependency

* Light loop settings and culling maximum light count

* Adding support for evaluation of shadow state

* Adding support for shadow reservation

* Early return if no lights are available

* Sorting and shadow management

* Prepare ligths for GPU marker

* Formatting

* Integrating the new jobified light preprocess into the PrepareLightsForGPU formatting call.

* More work on unification, now loop is connected, commented ray tracing loop for now.

* Fixing sorting offset and counters internally

* Using the correct index for the sort key

* Removing process light data, cleaning up ray tracing light cluster path

* Is created check

* Support for shuriken lights / lights that lack an HDAdditionalData

* Going back to SoA to hit better perf on low number of lights

* Initial changes for PrepareLightsForGPU

* Formatting

* Input stuff

* Porting work, moving first batch of code. WIP

* Finished porting code for LightGetData

* Connecting the BuildLightGPUData port

* Adding shadow allocation for directional shadows

* Enabling burst

* Sorting MSB of sortKey so that we process non directioanl first, and we know exactly how to compute the output index of directional

* Fixing order and mistake on shift

* Right allocation of directional / non directional lights

* Initial work for directional lights

* More work on directional lights

* More work on port, correct sorting order for sortKeys, having directional lights first

* Isolating processing of directional lights to function

* Final pieces for gpu light directional lights

* Burstifying light volume code

* Formatting

* Moving directional light sky / direct light color computation to burst

* Make lights camera relative

* Light counters

* AtoS conversion

* Bug fix, leaving too early

* Safety on allocation count

* Going back to safe references

* Support for ray tracing

* Connecting new light loop

* Finished connecting of new cpu light loop

* Formatting

* Wrong array length passed on radix sort size

* Adding renaming suggested by Remi's code review, much nicer

* formatting

* Renaming to more appropiate names, HDLightEntityCollection -> HDLightRenderDatabase

* Cleaning up database, renaming light collection to database.

* Formatting

* Replacing with dynamic array

* Reset of light count, causing some internal issues when all lights deleted.

* Renaming HDVisibleLightEntities to HDProcessedVisibleLightsBuilder, and cleaning some of the code

* Renamed HDGpuLightList to HDGpuLightBuilder

* Adding safety checks on debug mode

* Fixing shuriken lights that have shadows enabled: ensure to disable them

* Capitalize public functions

* Adding sort algorithm tests

* Deleting duplicate code added by accident

* Adding comment explaining when to add members in the HDLightRenderData

* Formatting

* Removing job that copies transforms, not required anymore

* Formatting

* Adding protection for invalid entity when editing properties on non enabled HDAddtiionalLightData

* Adding another filter for clustered ray tracing lights, this ignores invalid lights

* Correct shadow flag, mistake in refactor

* Adding only cluster type lights to light entityarray list

* Simplifying HDProcessedVisibleLightsBuilder, making it a member entirely of the HDRenderPipeline

* Fixing typo causing memory leak

* Formatting

* Added missing useVolumetric cloud parameter missing, affecting rendering

* Adding rather odd scaling of main light cookie, not getting this code exactly really messes up cloud shadows

* Missing flushing of area lights shape from script

* Shuriken lights attrociousness. Works now though

* Correct order for clustered lights

* Adding auto destruction to fix a memory leak inside shuriken lights

* Forgot to add the correct cam relative world position

* Updated feedback from review, some renames and comments

* Formatting

* Fixed comment

* Changelogs

* Compressing left and right eye copying of lights into GPU buffer, splitting into 2 separate copies

* Updated whats new

* changelog small entry

* Update CHANGELOG.md

Co-authored-by: sebastienlagarde <sebastien@unity3d.com>
mohammad22 pushed a commit that referenced this pull request Apr 16, 2022
Dynamic GI is currently disabled

Adapted PreprocessDynamicGILights to work with the new parallel visible lights builder

Reenabled customizations

Adapted PrepareDynamicGIDPULightdata to work with the new parallel lights builders

Reenabled DynamicGI

Readded WillRenderShadowMap for the client-side ShadowCachingSystem

Removed review comment

Bugfixes from R&D

Fixed out of bounds access in the DynamicGI visible lights builder

Fixed ClearLightList spill from probe and mask volumes

LightLoop Optimizations Backport: Fixed up HVSSS which was broken in the backport. There were three bugs: 1) the lightData.heirarchicalVarianceScreenSpaceShadowsIndex was getting zero initialized instead of initialized to -1 in the build job. 2) We were requesting unique HVSSS shadows for the regular light loop AND the dynamic GI light loop, causing 2x the hvsss trace work. 3) The HVSSS code interacts with lightData.positionRWS - in the previous version of the lightloop code, this value was still in WS not RWS, and so HVSSS ran its calculations on that. The new light loop code transforms positionRWS into RWS much earlier, so I have updated the HVSSS calculations to work directly off of RWS (which is cleaner anyway).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants