Skip to content

GLSL ---> HLSL #439

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 47 commits into from
Closed

Conversation

nahiim
Copy link
Contributor

@nahiim nahiim commented Dec 7, 2022

Description

GLSL to HLSL porting

Testing

Run compile.bat to compile all ported HLSL files

TODO list:

nahiim and others added 30 commits November 7, 2022 07:42
… Nabla build system via subdirectory and find out that their BS isn't supposed to work being added with this method (wrong link/include paths, weird errors) - TODO: build it as external project
… HLSL shaders and files. Make it capable of validating #include files with depth = 1. The target has it's own rule for each input file to compile and can find and validate the file's dependencies at generate time, therefore it will be good optimized (because it won't try to recompile each shader in the engine once any input file changes) and great for CI checking all shaders in Nabla soon. TODO: create a wrapper cmake script for DXC compiling because there is something off with MSVC environment and we have errors while compiling with DXC, but we don't have them once we copy-paste the executed command in any CMD

struct transfer_t
{
int propertyDWORDsize_flags;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HLSL has bitfields now, use them

Comment on lines +17 to +18
#define xoroshiro64star_state_t uint2
#define xoroshiro64starstar_state_t uint2

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use typedefs

slap everything in nbl::hlsl::random::xoroshiro64 namespace

Comment on lines +2 to +11
// Copyright (C) 2018-2022 - DevSH Graphics Programming Sp. z O.O.
// This file is part of the "Nabla Engine".
// For conditions of distribution and use, see copyright notice in nabla.h

#ifndef _NBL_BUILTIN_HLSL_SCENE_ANIMATION_INCLUDED_
#define _NBL_BUILTIN_HLSL_SCENE_ANIMATION_INCLUDED_

#include <nbl/builtin/hlsl/scene/keyframe.hlsl>

namespace nbl

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

delete the file and the matching GLSL, it was always WIP

#define xoroshiro64star_state_t uint2
#define xoroshiro64starstar_state_t uint2

void xoroshiro64_state_advance(inout uint2 state)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all inout and out variable decls need to be replaced with NBL_HLSL_REF(T)

and we can use it in shared C++ as well

#ifdef __cplusplus
#define NBL_HLSL_REF(T) std::add_reference_t<T>
#else
#define NBL_HLSL_REF(T) inout T
#endif

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kpentaris works for you and the atomics?

Comment on lines +2 to +30
// Copyright (C) 2018-2022 - DevSH Graphics Programming Sp. z O.O.
// This file is part of the "Nabla Engine".
// For conditions of distribution and use, see copyright notice in nabla.h

#ifndef _NBL_BUILTIN_HLSL_SCENE_KEYFRAME_INCLUDED_
#define _NBL_BUILTIN_HLSL_SCENE_KEYFRAME_INCLUDED_

#include <nbl/builtin/hlsl/math/quaternions.hlsl>
#include <nbl/builtin/hlsl/format/decode.hlsl>

namespace nbl
{
namespace hlsl
{
namespace scene
{

using namespace math;

struct Keyframe_t
{
uint2 data[3];

float3 getScale()
{
return decodeRGB18E7S3(data[2]);
}

math::quaternion_t getRotation()

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nuke the file and its GLSL counterpart, it was always WIP

Comment on lines +2 to +42
// Copyright (C) 2018-2022 - DevSH Graphics Programming Sp. z O.O.
// This file is part of the "Nabla Engine".
// For conditions of distribution and use, see copyright notice in nabla.h

#ifndef _NBL_BUILTIN_HLSL_SCENE_NODE_INCLUDED_
#define _NBL_BUILTIN_HLSL_SCENE_NODE_INCLUDED_

#include <nbl/builtin/hlsl/scene/animation.hlsl>

namespace nbl
{
namespace hlsl
{
namespace scene
{
namespace node
{

void initializeLinearSkin(
out float4 accVertexPos, out float3 accVertexNormal,
const float3 inVertexPos, const float3 inVertexNormal,
const float4x4 boneTransform, const float3x3 boneOrientationInvT, const float boneWeight)
{
accVertexPos = mul(boneTransform, float4(inVertexPos * boneWeight, boneWeight));
accVertexNormal = mul(boneOrientationInvT, inVertexNormal * boneWeight);
}



void accumulateLinearSkin(
inout float4 accVertexPos, inout float3 accVertexNormal,
const float3 inVertexPos, const float3 inVertexNormal,
const float4x4 boneTransform, const float3x3 boneOrientationInvT, const float boneWeight)
{
accVertexPos += mul(boneTransform, float4(inVertexPos * boneWeight, boneWeight));
accVertexNormal += mul(boneOrientationInvT, inVertexNormal * boneWeight);
}

}
}
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nuke the file and its GLSL counterpart, it was always WIP

// gives false negatives
static bool fastestDoesNotIntersectAABB(in Frustum_t frust, in AABB_t aabb)
{
#define getClosestDP(R) (dot(aabb.getFarthestPointInFront(R.xyz), R.xyz) + R.w)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of having it be a macro, make it a (private) method

Comment on lines +51 to +56
const float4x4 pTpose = transpose(proj);

Frustum_t frust;
frust.minPlanes = (float3x4)(pTpose) - float3x4(pTpose[3]*bounds.minVx[0], pTpose[3]*bounds.minVx[1], pTpose[3]*bounds.minVx[2]);
frust.maxPlanes = float3x4(pTpose[3]*bounds.maxVx[0], pTpose[3]*bounds.maxVx[1], pTpose[3]*bounds.maxVx[2]) - (float3x4)(pTpose);
return frust;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

check the matrix math on this, operator[] in GLSL returned columns, in HLSL it returns rows

float3x3 tp_vertices = transpose(vertices);

// the `normalize` cannot be optimized out
return float3x3(normalize(tp_vertices[0]-origin), normalize(tp_vertices[1]-origin), normalize(tp_vertices[2]-origin));

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

check matrix math and who uses this, if this matrix was used as a plain float3[3] no problem, worse if people were multiplying things with it

//
float3x3 getSphericalTriangle(in float3x3 vertices, in float3 origin)
{
float3x3 tp_vertices = transpose(vertices);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines +2 to +32
// Copyright (C) 2018-2022 - DevSH Graphics Programming Sp. z O.O.
// This file is part of the "Nabla Engine".
// For conditions of distribution and use, see copyright notice in nabla.h
#ifndef _NBL_BUILTIN_HLSL_UTILS_ACCELERATION_STRUCTURES_INCLUDED_
#define _NBL_BUILTIN_HLSL_UTILS_ACCELERATION_STRUCTURES_INCLUDED_


namespace nbl
{
namespace hlsl
{


// Use for Indirect Builds
struct BuildRangeInfo
{
uint primitiveCount;
uint primitiveOffset;
uint firstVertex;
uint transformOffset;

#ifdef __cplusplus
auto operator<=>(const BuildRangeInfo&) const = default;
#endif // __cplusplus
};


}
}

#endif

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nuke it, I've got a new and beter one in #521

}
float3 GetEyePos(in float4x3 _NormalMatAndEyePos)
{
return _NormalMatAndEyePos[3];

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is no longer true, you need to extract last COLUMN, not ROW

either transpose and get last row of tposed, OR multiply with float4(0,0,0,1)

Comment on lines +13 to +20
struct CompressedNormalMatrix_t
{
uint4 data;


float3x3 decode(in CompressedNormalMatrix_t compr)
{
float3x3 m;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

was the original even used anywhere in GLSL ?

Comment on lines +1 to +41

// Copyright (C) 2018-2022 - DevSH Graphics Programming Sp. z O.O.
// This file is part of the "Nabla Engine".
// For conditions of distribution and use, see copyright notice in nabla.h

#ifndef _NBL_BUILTIN_HLSL_UTILS_CULLING_INCLUDED_
#define _NBL_BUILTIN_HLSL_UTILS_CULLING_INCLUDED_

#include <nbl/builtin/hlsl/shapes/aabb.hlsl>
#include <nbl/builtin/hlsl/shapes/frustum.hlsl>


namespace nbl
{
namespace hlsl
{


// gives false negatives
bool fastestFrustumCullAABB(in float4x4 proj, in shapes::AABB_t aabb)
{
const shapes::Frustum_t frust = shapes::Frustum_t::extract(proj);
return shapes::Frustum_t::fastestDoesNotIntersectAABB(frust, aabb);
}

// gives very few false negatives
bool fastFrustumCullAABB(in float4x4 proj, in float4x4 invProj, in shapes::AABB_t aabb)
{
if (fastestFrustumCullAABB(proj,aabb))
return true;

const shapes::Frustum_t boxInvFrustum = shapes::Frustum_t::extract(invProj);
shapes::AABB_t ndc;
ndc.minVx = float3(-1.f,-1.f,0.f);
ndc.maxVx = float3(1.f,1.f,1.f);
return shapes::Frustum_t::fastestDoesNotIntersectAABB(boxInvFrustum,ndc);
}

// perfect Separating Axis Theorem, needed for Clustered/Tiled Lighting
bool preciseFrustumCullAABB(in float4x4 proj, in float4x4 invProj, in shapes::AABB_t aabb)
{

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nuke the file we'll bring it back later

Comment on lines +13 to +48

struct DrawArraysIndirectCommand_t
{
uint count;
uint instanceCount;
uint first;
uint baseInstance;
};

struct DrawElementsIndirectCommand_t
{
uint count;
uint instanceCount;
uint firstIndex;
uint baseVertex;
uint baseInstance;
};

struct DispatchIndirectCommand_t
{
uint num_groups_x;
uint num_groups_y;
uint num_groups_z;
};


uint computeOptimalPersistentWorkgroupDispatchSize(in uint elementCount, in uint workgroupSize, in uint workgroupSpinningProtection)
{
const uint infinitelyWideDeviceWGCount = (elementCount-1u)/(workgroupSize*workgroupSpinningProtection)+1u;
return min(infinitelyWideDeviceWGCount,NBL_HLSL_LIMIT_MAX_RESIDENT_INVOCATIONS/NBL_HLSL_LIMIT_MAX_OPTIMALLY_RESIDENT_WORKGROUP_INVOCATIONS);
}
uint computeOptimalPersistentWorkgroupDispatchSize(in uint elementCount, in uint workgroupSize)
{
return computeOptimalPersistentWorkgroupDispatchSize(elementCount,workgroupSize,1u);
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

put everything inside indirect_commands namespace and take away IndirectCommand from the typename

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also remove Workgroup from the compute function names, they're too long

Comment on lines +18 to +43
uint decode2d4bComponent(in uint x)
{
x &= 0x55555555u;
x = (x ^ (x >> 1u)) & 0x33333333u;
x = (x ^ (x >> 2u)) & 0x0f0f0f0fu;
return x;
}

uint decode2d8bComponent(in uint x)
{
x &= 0x55555555u;
x = (x ^ (x >> 1u)) & 0x33333333u;
x = (x ^ (x >> 2u)) & 0x0f0f0f0fu;
x = (x ^ (x >> 4u)) & 0x00ff00ffu;
return x;
}

uint2 decode2d4b(in uint x)
{
return uint2(decode2d4bComponent(x), decode2d4bComponent(x >> 1u));
}

uint2 decode2d8b(in uint x)
{
return uint2(decode2d8bComponent(x), decode2d8bComponent(x >> 1u));
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Przemog1 give this a look whether its correct and compare against the GLSL

{


float3 signedSpherical(in float2 enc)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

struct functor please, also give it doube check against GLSL

{


float2 signedSpherical(in float3 n)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

struct functor please, also give it doube check against GLSL

Comment on lines +109 to +132
float2 transformedDerivatives(in uint swapchainTransform, in float2 ddxDdy)
{
#define OUTPUT_TYPE float2
#include "nbl/builtin/hlsl/utils/surface_transform_transformedDerivatives.hlsl"
#undef OUTPUT_TYPE
}
float2x2 transformedDerivatives(in uint swapchainTransform, in float2x2 ddxDdy)
{
#define OUTPUT_TYPE float2x2
#include "nbl/builtin/hlsl/utils/surface_transform_transformedDerivatives.hlsl"
#undef OUTPUT_TYPE
}
float2x3 transformedDerivatives(in uint swapchainTransform, in float2x3 ddxDdy)
{
#define OUTPUT_TYPE float2x3
#include "nbl/builtin/hlsl/utils/surface_transform_transformedDerivatives.hlsl"
#undef OUTPUT_TYPE
}
float2x4 transformedDerivatives(in uint swapchainTransform, in float2x4 ddxDdy)
{
#define OUTPUT_TYPE float2x4
#include "nbl/builtin/hlsl/utils/surface_transform_transformedDerivatives.hlsl"
#undef OUTPUT_TYPE
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you'll no longer need "nbl/builtin/hlsl/utils/surface_transform_transformedDerivatives.hlsl" to exist, this is doable with a template

Comment on lines +146 to +164
return mul(ndc, float2x2(cos90, -sin90, sin90, cos90));

case ROTATE_180:
return mul(ndc, float2x2(cos180, -sin180, sin180, cos180));

case ROTATE_270:
return mul(ndc, float2x2(cos270, -sin270, sin270, cos270));

case HORIZONTAL_MIRROR:
return mul(ndc, float2x2(-1, 0, 0, 1));

case HORIZONTAL_MIRROR_ROTATE_90:
return mul(ndc, float2x2(-cos90, sin90, sin90, cos90));

case HORIZONTAL_MIRROR_ROTATE_180:
return mul(ndc, float2x2(-cos180, sin180, sin180, cos180));

case HORIZONTAL_MIRROR_ROTATE_270:
return mul(ndc, float2x2(-cos270, sin270, sin270, cos270));

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@deprilula28 check the math on this, I know HLSL is funky with multiplications

Comment on lines +17 to +24
static const uint IDENTITY = 0x00000001;
static const uint ROTATE_90 = 0x00000002;
static const uint ROTATE_180 = 0x00000004;
static const uint ROTATE_270 = 0x00000008;
static const uint HORIZONTAL_MIRROR = 0x00000010;
static const uint HORIZONTAL_MIRROR_ROTATE_90 = 0x00000020;
static const uint HORIZONTAL_MIRROR_ROTATE_180 = 0x00000040;
static const uint HORIZONTAL_MIRROR_ROTATE_270 = 0x00000080;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

declare these as enum : uint32_t (obvs have a typedef for uint32_t to uint first in the compat header)

Comment on lines +20 to +45
// move to ieee754 header?
float3x3 mul_with_bounds_wo_gamma(out float3x3 error, in float3x3 a, in float3x3 b, in float b_relative_error)
{
float3x3 retval;
for (int i=0; i<3; i++)
{
float3 tmp = a[0]*b[i][0];
retval[i] = tmp;
error[i] = abs(tmp);
for (int j=1; j<3; j++)
{
tmp = a[j]*b[i][j];
retval[i] += tmp;
error[i] += abs(tmp);
}
}
const float error_factor = 1.f+b_relative_error/numeric_limits::float_epsilon(2);
error *= error_factor;
return retval;
}
float3x3 mul_with_bounds(out float3x3 error, in float3x3 a, in float3x3 b, in float b_relative_error)
{
float3x3 retval = mul_with_bounds_wo_gamma(error,a,b,b_relative_error);
error *= ieee754::gamma(2u);
return retval;
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes please the error_arithmetic class

Comment on lines +48 to +73
/*
float4 pseudoMul4x4with3x1(in float4x4 m, in float3 v)
{
return m[0] * v.x + m[1] * v.y + m[2] * v.z + m[3];
}
float3 pseudoMul3x4with3x1(in float4x3 m, in float3 v)
{
return m[0] * v.x + m[1] * v.y + m[2] * v.z + m[3];
}
float4x3 pseudoMul4x3with4x3(in float4x3 lhs, in float4x3 rhs) // TODO: change name to 3x4with3x4
{
float4x3 result;
for (int i = 0; i < 4; i++)
result[i] = lhs[0] * rhs[i][0] + lhs[1] * rhs[i][1] + lhs[2] * rhs[i][2];
result[3] += lhs[3];
return result;
}
float4 pseudoMul4x4with4x3(in float4 proj, in float4x3 tform)
{
float4 result;
for (int i = 0; i < 4; i++)
result[i] = proj[0] * tform[i][0] + proj[1] * tform[i][1] + proj[2] * tform[i][2];
result[3] += proj[3];
return result;
}
*/

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this also commented out in the GLSL ?

{
float3 tmp = mul(sub3x3TransposeCofactors, normal);
const float tmpLenRcp = rsqrt(dot(tmp,tmp));
return tmp * asfloat(asuint(tmpLenRcp)^signFlipMask);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

erm and where is the signFlipMask defined?

// use this if you anticipate flipped/mirrored models
float3 fastNormalTransform(in uint signFlipMask, in float3x3 sub3x3TransposeCofactors, in float3 normal)
{
float3 tmp = mul(sub3x3TransposeCofactors, normal);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

check this matix vs point multiplication

unfortunately you need to find the place where sub3x3TransposeCofactors is being fed from to understand if its correct

Comment on lines +100 to +110
//
float4x3 pseudoInverse3x4(in float4x3 tform)
{
const float3x3 sub3x3Inv = transpose(float3x3(tform[0], tform[1], tform[2]));
float4x3 retval;
retval[0] = sub3x3Inv[0];
retval[1] = sub3x3Inv[1];
retval[2] = sub3x3Inv[2];
retval[3] = mul(-sub3x3Inv, tform[3]);
return retval;
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. do we actually ever use it
  2. does the GLSL version also ignore shear and scale? (this way of inverting the matrix only works on rotation + translation only matrices)
  3. assume tform is a row-major world matrix identical to core::matrix4x3SIMD, as I right in assuming that the multiplication is wrong here due to HLSL quirkyness?
  4. btw the - should go on the vector, not the matrix, less components to negate

@devshgraphicsprogramming
Copy link
Member

btw all the .bin test compilation files should be nuked and then regenerated in the Nabla CI reference results submodule

@devshgraphicsprogramming
Copy link
Member

Merged as part of #475

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants