Skip to content

Improve light AABB generation #1574

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 22 commits into from
Closed

Improve light AABB generation #1574

wants to merge 22 commits into from

Conversation

EvgeniiG
Copy link
Contributor

@EvgeniiG EvgeniiG commented Aug 11, 2020

Purpose of this PR

Light AABB generation is the first part of the light list generation algorithm (used for tiled and clustered lighting).

There are two issues with the current implementation:

  1. it's slow (214.5 us for 3 lights on the base PS4);
  2. it's underdocumented, and thus difficult to understand, maintain, and optimize.

I have rewritten the code. Highlights include:

  • a single unified algorithm (1 kernel) that handles all types of projection matrices;
  • thoroughly documented code (cannot claim that it's simple, as the task itself is not simple);
  • high performance: 214.5 us -> 77 us for 3 lights (2.78x speedup), 8 threads/light -> 4 threads/light (2x improvement).

I focused on algorithmic rather than low-level optimization. The generated code is not particularly optimal.


Testing status

Manual Tests: What did you do?

  • Opened test project + Run graphic tests locally
  • Built a player
  • Checked new UI names with UX convention
  • Tested UI multi-edition + Undo/Redo + Prefab overrides + Alignment in Preset
  • C# and shader warnings (suppress shader cache to see them)
  • Checked new resources path for the reloader (in developer mode, you have a button at end of resources that check the paths)
  • Other:

Automated Tests: Used existing tests.

Yamato: (Select your branch):
https://yamato.prd.cds.internal.unity3d.com/jobs/902-Graphics/tree/HDRP%252Flight-aabb/.yamato%252Fall-hdrp.yml%2523All_HDRP_2020.1

Any test projects to go with this to help reviewers?


Comments to reviewers

  • QA
  1. test various light setups (many local lights, reflection probes, density volumes, decals) with different camera projection matrices (orthographic, perspective, oblique (used for planar reflection)).
  • Francesco
  1. review the code;
  2. change the WaveLaneSwizzle intrinsic to take (andMask, orMask, xorMask) as parameters;
  3. test on XBox; if you want to make further optimizations, create a new PR.
  • Sebastien
  1. review the code;
  2. merge the PR.

@EvgeniiG EvgeniiG marked this pull request as draft August 11, 2020 21:03
@github-actions
Copy link

It appears that you made a non-draft PR!
Please convert your PR to draft (button on the right side of the page)
and cancel any jobs that started on Yamato.
See the PR template for more information.
Thank you!

@EvgeniiG EvgeniiG changed the title Optimize light AABB generation Improve light AABB generation Aug 12, 2020
var altDist = Mathf.Sqrt(fAltDy * fAltDy + (true ? 1.0f : 2.0f) * fAltDx * fAltDx);
bound.radius = altDist > (0.5f * range) ? altDist : (0.5f * range); // will always pick fAltDist
bound.scaleXY = squeeze ? new Vector2(0.01f, 0.01f) : new Vector2(1.0f, 1.0f);
var altDist = Mathf.Sqrt(fAltDy * fAltDy + (true ? 1.0f : 2.0f) * fAltDx * fAltDx);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You did not add this code, but having true? is a bit weird here :D

@FrancescoC-unity
Copy link
Contributor

Cannot change the intrinsic as you use it as on xbox the final mask (as used now) needs to be a constant literal.

const unsigned int uFlag3 = GetClip(vP3);
/* ------------------------------ Implementation ---------------------------- */

#define DUMB_COMPILER // Improve the quality of generated code at the expense of readability
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless using an IDE with collapsing of #if, the presence of this makes it a bit harder to follow the code.

I assume it is left as define here to show the more readable version, but couldn't we get rid of that and instead comment the "dumb compiler" part so that we don't need the other version?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea is to eventually remove this #define. It's a trivial transformation, and a compiler should be able to do it for us.

We need to report it to Sony and see if it works as expected on Microsoft's machines.

Copy link
Contributor

@JMargevics JMargevics left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey! Here's a test doc for this PR: https://confluence.unity3d.com/display/HDRP/Improve+light+AABB+generation

Generally, we've run through most of light features and areas on all platforms. There are no regressions except Xbox, which is completely broken now. I am still intending to do so performance tests on ps4 and compare it to 8.x.x./release.

@EvgeniiG
Copy link
Contributor Author

Thanks for testing. Xbox should work now.

Copy link
Contributor

@JMargevics JMargevics left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey, @TomasKiniulis verified it on Xbox and it works well now 👍

@EvgeniiG
Copy link
Contributor Author

Excellent! Thanks, guys.

//uint vindex = groupID * NR_THREADS + threadID;
unsigned int g = groupID;
unsigned int t = threadID;
float3x3 Invert3x3(float3x3 R)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feels like this would be useful even outside of this context on its own hlsl file (or already existing common file?) along few of the other generic matrix ops in this file.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. We just need to decide what to move to the core.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could add those in a "math section" in common.hlsl. Or alternatively create a matrix.hlsl file?

Copy link
Contributor Author

@EvgeniiG EvgeniiG Sep 16, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should avoid creating more files with confusing names. I propose we move this to Common once there is more than 1 use case.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same answer than before, nobody will ever know that such a function exist as it is "hidden" in particular file.
Agree to not add more files, so we could move it to common.hlsl, it is a frequent enough function to be there.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will see whether it makes sense to share some code after I am done with Z-binning.

@sebastienlagarde sebastienlagarde changed the base branch from 8.x.x/release to HDRP/staging September 16, 2020 14:25
@sebastienlagarde sebastienlagarde changed the base branch from HDRP/staging to 8.x.x/release September 16, 2020 14:26
@sebastienlagarde
Copy link
Contributor

@EvgeniiG this PR is targeting 8.x.x/release, sadly we can't retarget to HDRP/staging, could you recreate the PR which target HDRP staging? thanks

@@ -852,11 +852,11 @@ void CompositeOver(real3 colorFront, real3 alphaFront,
// Space transformations
// ----------------------------------------------------------------------------

static const float3x3 k_identity3x3 = {1, 0, 0,
static const float3x3 k_Identity3x3 = {1, 0, 0,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so I know this is not great but we can't rename it without breaking existing code of users (this is also used by URP).

Here there is not a lot of choice, either we don't rename it, or we do the following.
in common.hlsl:

static const float3x3 k_Identity3x3 = {

(i.e your change).

And in CommonDeprecated.hlsl
we add :
// This is obsolete, don't used, keep for compatiblity.
static const float3x3 k_identity3x3 = k_Identity3x3 ;

this will work and not break existing code.

Copy link
Contributor Author

@EvgeniiG EvgeniiG Sep 16, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that this is a fix. I actually couldn't find this constant because it was capitalized in the wrong way.
I can fix URP (or already have). I propose we just add a comment in this file. If someone's code breaks, it is trivial to fix.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

being it a fix or not isn't the problem. The issue is that it will break users custom shader. The proposal is to allow to fix the nomenclature and still not break existing user code (by putting old version as k_identity3x3 )

"If someone's code breaks, it is trivial to fix"
We can't afford this anymore. Any change like this must be added to the upgrade guide if there is no automatic upgrade, and for this little change, I will suggest we don't break code.

And yes URP code should be fix in this case as well. You should consider make a separate PR for this change as it isn't related to improve AABB generation and touch a lot of files.

// that is unrolled at the compile time, and the constants are generated during the
// constant propagation pass of the optimizer. This works fine on PlayStation, but does not work
// on Xbox. In order to avoid writing hideous code specifically for Xbox, we disable the support
// of wave intrinsics on Xbox until the Xbox compiler is fixed.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's add a date (or xbox compiler version, so we know which version we are refering too when we read the comment later.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK


// Improve the quality of generated code at the expense of readability.
// Remove when the shader compiler is clever enough to perform this optimization for us.
#define DUMB_COMPILER
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

always define? Even for PS4?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe the name isn't good, we should say:
#define MANUAL_CODE_OPTIMIZATION ?

Copy link
Contributor Author

@EvgeniiG EvgeniiG Sep 16, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I only tested on the PS4. Don't know about the XBox or Nvidia, I don't have access to the hardware.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, still we should avoid naming thing "dumb compiler" in public code :)

const float4 vP1 = mul(g_mProjection, float4(q1, 1));
const float4 vP2 = mul(g_mProjection, float4(q2, 1));
const float4 vP3 = mul(g_mProjection, float4(q3, 1));
float4 IntersectEdgeAgainstPlane(ClipVertex v0, ClipVertex v1)
Copy link
Contributor

@sebastienlagarde sebastienlagarde Sep 16, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Several function should be move to GeometricTool.hlsl but ok for now

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe in the future, if we end up reusing them.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so ok ,but the problem is the reverse, people will never been aware that such a function exist, so they will duplicate it. But fair enough.

@sebastienlagarde
Copy link
Contributor

@JMargevics Hey, thanks for the detailed report! appreciated. I noticed that for the test of planar reflection, you have just tested if the object was working correctly in the planar. But the goal is to check that multiple reflect correctly at various camera angle.
Have you done that? could you check? thanks.

Note: It is not expected that this PR goes in 8.x, so if you can do the test above once the PR is recreated for HDRP/staging, thanks

@EvgeniiG EvgeniiG mentioned this pull request Sep 17, 2020
6 tasks
@EvgeniiG
Copy link
Contributor Author

See #1906

@EvgeniiG EvgeniiG closed this Sep 17, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants