-
Notifications
You must be signed in to change notification settings - Fork 64
GLSL ---> HLSL #439
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GLSL ---> HLSL #439
Conversation
… Nabla build system via subdirectory and find out that their BS isn't supposed to work being added with this method (wrong link/include paths, weird errors) - TODO: build it as external project
…to dxcIntegration
… HLSL shaders and files. Make it capable of validating #include files with depth = 1. The target has it's own rule for each input file to compile and can find and validate the file's dependencies at generate time, therefore it will be good optimized (because it won't try to recompile each shader in the engine once any input file changes) and great for CI checking all shaders in Nabla soon. TODO: create a wrapper cmake script for DXC compiling because there is something off with MSVC environment and we have errors while compiling with DXC, but we don't have them once we copy-paste the executed command in any CMD
|
||
struct transfer_t | ||
{ | ||
int propertyDWORDsize_flags; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
HLSL has bitfields now, use them
#define xoroshiro64star_state_t uint2 | ||
#define xoroshiro64starstar_state_t uint2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use typedefs
slap everything in nbl::hlsl::random::xoroshiro64
namespace
// Copyright (C) 2018-2022 - DevSH Graphics Programming Sp. z O.O. | ||
// This file is part of the "Nabla Engine". | ||
// For conditions of distribution and use, see copyright notice in nabla.h | ||
|
||
#ifndef _NBL_BUILTIN_HLSL_SCENE_ANIMATION_INCLUDED_ | ||
#define _NBL_BUILTIN_HLSL_SCENE_ANIMATION_INCLUDED_ | ||
|
||
#include <nbl/builtin/hlsl/scene/keyframe.hlsl> | ||
|
||
namespace nbl |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
delete the file and the matching GLSL, it was always WIP
#define xoroshiro64star_state_t uint2 | ||
#define xoroshiro64starstar_state_t uint2 | ||
|
||
void xoroshiro64_state_advance(inout uint2 state) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
all inout
and out
variable decls need to be replaced with NBL_HLSL_REF(T)
and we can use it in shared C++ as well
#ifdef __cplusplus
#define NBL_HLSL_REF(T) std::add_reference_t<T>
#else
#define NBL_HLSL_REF(T) inout T
#endif
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kpentaris works for you and the atomics?
// Copyright (C) 2018-2022 - DevSH Graphics Programming Sp. z O.O. | ||
// This file is part of the "Nabla Engine". | ||
// For conditions of distribution and use, see copyright notice in nabla.h | ||
|
||
#ifndef _NBL_BUILTIN_HLSL_SCENE_KEYFRAME_INCLUDED_ | ||
#define _NBL_BUILTIN_HLSL_SCENE_KEYFRAME_INCLUDED_ | ||
|
||
#include <nbl/builtin/hlsl/math/quaternions.hlsl> | ||
#include <nbl/builtin/hlsl/format/decode.hlsl> | ||
|
||
namespace nbl | ||
{ | ||
namespace hlsl | ||
{ | ||
namespace scene | ||
{ | ||
|
||
using namespace math; | ||
|
||
struct Keyframe_t | ||
{ | ||
uint2 data[3]; | ||
|
||
float3 getScale() | ||
{ | ||
return decodeRGB18E7S3(data[2]); | ||
} | ||
|
||
math::quaternion_t getRotation() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nuke the file and its GLSL counterpart, it was always WIP
// Copyright (C) 2018-2022 - DevSH Graphics Programming Sp. z O.O. | ||
// This file is part of the "Nabla Engine". | ||
// For conditions of distribution and use, see copyright notice in nabla.h | ||
|
||
#ifndef _NBL_BUILTIN_HLSL_SCENE_NODE_INCLUDED_ | ||
#define _NBL_BUILTIN_HLSL_SCENE_NODE_INCLUDED_ | ||
|
||
#include <nbl/builtin/hlsl/scene/animation.hlsl> | ||
|
||
namespace nbl | ||
{ | ||
namespace hlsl | ||
{ | ||
namespace scene | ||
{ | ||
namespace node | ||
{ | ||
|
||
void initializeLinearSkin( | ||
out float4 accVertexPos, out float3 accVertexNormal, | ||
const float3 inVertexPos, const float3 inVertexNormal, | ||
const float4x4 boneTransform, const float3x3 boneOrientationInvT, const float boneWeight) | ||
{ | ||
accVertexPos = mul(boneTransform, float4(inVertexPos * boneWeight, boneWeight)); | ||
accVertexNormal = mul(boneOrientationInvT, inVertexNormal * boneWeight); | ||
} | ||
|
||
|
||
|
||
void accumulateLinearSkin( | ||
inout float4 accVertexPos, inout float3 accVertexNormal, | ||
const float3 inVertexPos, const float3 inVertexNormal, | ||
const float4x4 boneTransform, const float3x3 boneOrientationInvT, const float boneWeight) | ||
{ | ||
accVertexPos += mul(boneTransform, float4(inVertexPos * boneWeight, boneWeight)); | ||
accVertexNormal += mul(boneOrientationInvT, inVertexNormal * boneWeight); | ||
} | ||
|
||
} | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nuke the file and its GLSL counterpart, it was always WIP
// gives false negatives | ||
static bool fastestDoesNotIntersectAABB(in Frustum_t frust, in AABB_t aabb) | ||
{ | ||
#define getClosestDP(R) (dot(aabb.getFarthestPointInFront(R.xyz), R.xyz) + R.w) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instead of having it be a macro, make it a (private) method
const float4x4 pTpose = transpose(proj); | ||
|
||
Frustum_t frust; | ||
frust.minPlanes = (float3x4)(pTpose) - float3x4(pTpose[3]*bounds.minVx[0], pTpose[3]*bounds.minVx[1], pTpose[3]*bounds.minVx[2]); | ||
frust.maxPlanes = float3x4(pTpose[3]*bounds.maxVx[0], pTpose[3]*bounds.maxVx[1], pTpose[3]*bounds.maxVx[2]) - (float3x4)(pTpose); | ||
return frust; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
check the matrix math on this, operator[] in GLSL returned columns, in HLSL it returns rows
float3x3 tp_vertices = transpose(vertices); | ||
|
||
// the `normalize` cannot be optimized out | ||
return float3x3(normalize(tp_vertices[0]-origin), normalize(tp_vertices[1]-origin), normalize(tp_vertices[2]-origin)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
check matrix math and who uses this, if this matrix was used as a plain float3[3]
no problem, worse if people were multiplying things with it
// | ||
float3x3 getSphericalTriangle(in float3x3 vertices, in float3 origin) | ||
{ | ||
float3x3 tp_vertices = transpose(vertices); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Look here
the tranposes
cancel out, so it actually breaks things
// Copyright (C) 2018-2022 - DevSH Graphics Programming Sp. z O.O. | ||
// This file is part of the "Nabla Engine". | ||
// For conditions of distribution and use, see copyright notice in nabla.h | ||
#ifndef _NBL_BUILTIN_HLSL_UTILS_ACCELERATION_STRUCTURES_INCLUDED_ | ||
#define _NBL_BUILTIN_HLSL_UTILS_ACCELERATION_STRUCTURES_INCLUDED_ | ||
|
||
|
||
namespace nbl | ||
{ | ||
namespace hlsl | ||
{ | ||
|
||
|
||
// Use for Indirect Builds | ||
struct BuildRangeInfo | ||
{ | ||
uint primitiveCount; | ||
uint primitiveOffset; | ||
uint firstVertex; | ||
uint transformOffset; | ||
|
||
#ifdef __cplusplus | ||
auto operator<=>(const BuildRangeInfo&) const = default; | ||
#endif // __cplusplus | ||
}; | ||
|
||
|
||
} | ||
} | ||
|
||
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nuke it, I've got a new and beter one in #521
} | ||
float3 GetEyePos(in float4x3 _NormalMatAndEyePos) | ||
{ | ||
return _NormalMatAndEyePos[3]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is no longer true, you need to extract last COLUMN, not ROW
either transpose and get last row of tposed, OR multiply with float4(0,0,0,1)
struct CompressedNormalMatrix_t | ||
{ | ||
uint4 data; | ||
|
||
|
||
float3x3 decode(in CompressedNormalMatrix_t compr) | ||
{ | ||
float3x3 m; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
was the original even used anywhere in GLSL ?
|
||
// Copyright (C) 2018-2022 - DevSH Graphics Programming Sp. z O.O. | ||
// This file is part of the "Nabla Engine". | ||
// For conditions of distribution and use, see copyright notice in nabla.h | ||
|
||
#ifndef _NBL_BUILTIN_HLSL_UTILS_CULLING_INCLUDED_ | ||
#define _NBL_BUILTIN_HLSL_UTILS_CULLING_INCLUDED_ | ||
|
||
#include <nbl/builtin/hlsl/shapes/aabb.hlsl> | ||
#include <nbl/builtin/hlsl/shapes/frustum.hlsl> | ||
|
||
|
||
namespace nbl | ||
{ | ||
namespace hlsl | ||
{ | ||
|
||
|
||
// gives false negatives | ||
bool fastestFrustumCullAABB(in float4x4 proj, in shapes::AABB_t aabb) | ||
{ | ||
const shapes::Frustum_t frust = shapes::Frustum_t::extract(proj); | ||
return shapes::Frustum_t::fastestDoesNotIntersectAABB(frust, aabb); | ||
} | ||
|
||
// gives very few false negatives | ||
bool fastFrustumCullAABB(in float4x4 proj, in float4x4 invProj, in shapes::AABB_t aabb) | ||
{ | ||
if (fastestFrustumCullAABB(proj,aabb)) | ||
return true; | ||
|
||
const shapes::Frustum_t boxInvFrustum = shapes::Frustum_t::extract(invProj); | ||
shapes::AABB_t ndc; | ||
ndc.minVx = float3(-1.f,-1.f,0.f); | ||
ndc.maxVx = float3(1.f,1.f,1.f); | ||
return shapes::Frustum_t::fastestDoesNotIntersectAABB(boxInvFrustum,ndc); | ||
} | ||
|
||
// perfect Separating Axis Theorem, needed for Clustered/Tiled Lighting | ||
bool preciseFrustumCullAABB(in float4x4 proj, in float4x4 invProj, in shapes::AABB_t aabb) | ||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nuke the file we'll bring it back later
|
||
struct DrawArraysIndirectCommand_t | ||
{ | ||
uint count; | ||
uint instanceCount; | ||
uint first; | ||
uint baseInstance; | ||
}; | ||
|
||
struct DrawElementsIndirectCommand_t | ||
{ | ||
uint count; | ||
uint instanceCount; | ||
uint firstIndex; | ||
uint baseVertex; | ||
uint baseInstance; | ||
}; | ||
|
||
struct DispatchIndirectCommand_t | ||
{ | ||
uint num_groups_x; | ||
uint num_groups_y; | ||
uint num_groups_z; | ||
}; | ||
|
||
|
||
uint computeOptimalPersistentWorkgroupDispatchSize(in uint elementCount, in uint workgroupSize, in uint workgroupSpinningProtection) | ||
{ | ||
const uint infinitelyWideDeviceWGCount = (elementCount-1u)/(workgroupSize*workgroupSpinningProtection)+1u; | ||
return min(infinitelyWideDeviceWGCount,NBL_HLSL_LIMIT_MAX_RESIDENT_INVOCATIONS/NBL_HLSL_LIMIT_MAX_OPTIMALLY_RESIDENT_WORKGROUP_INVOCATIONS); | ||
} | ||
uint computeOptimalPersistentWorkgroupDispatchSize(in uint elementCount, in uint workgroupSize) | ||
{ | ||
return computeOptimalPersistentWorkgroupDispatchSize(elementCount,workgroupSize,1u); | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
put everything inside indirect_commands
namespace and take away IndirectCommand
from the typename
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also remove Workgroup
from the compute
function names, they're too long
uint decode2d4bComponent(in uint x) | ||
{ | ||
x &= 0x55555555u; | ||
x = (x ^ (x >> 1u)) & 0x33333333u; | ||
x = (x ^ (x >> 2u)) & 0x0f0f0f0fu; | ||
return x; | ||
} | ||
|
||
uint decode2d8bComponent(in uint x) | ||
{ | ||
x &= 0x55555555u; | ||
x = (x ^ (x >> 1u)) & 0x33333333u; | ||
x = (x ^ (x >> 2u)) & 0x0f0f0f0fu; | ||
x = (x ^ (x >> 4u)) & 0x00ff00ffu; | ||
return x; | ||
} | ||
|
||
uint2 decode2d4b(in uint x) | ||
{ | ||
return uint2(decode2d4bComponent(x), decode2d4bComponent(x >> 1u)); | ||
} | ||
|
||
uint2 decode2d8b(in uint x) | ||
{ | ||
return uint2(decode2d8bComponent(x), decode2d8bComponent(x >> 1u)); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Przemog1 give this a look whether its correct and compare against the GLSL
{ | ||
|
||
|
||
float3 signedSpherical(in float2 enc) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
struct functor please, also give it doube check against GLSL
{ | ||
|
||
|
||
float2 signedSpherical(in float3 n) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
struct functor please, also give it doube check against GLSL
float2 transformedDerivatives(in uint swapchainTransform, in float2 ddxDdy) | ||
{ | ||
#define OUTPUT_TYPE float2 | ||
#include "nbl/builtin/hlsl/utils/surface_transform_transformedDerivatives.hlsl" | ||
#undef OUTPUT_TYPE | ||
} | ||
float2x2 transformedDerivatives(in uint swapchainTransform, in float2x2 ddxDdy) | ||
{ | ||
#define OUTPUT_TYPE float2x2 | ||
#include "nbl/builtin/hlsl/utils/surface_transform_transformedDerivatives.hlsl" | ||
#undef OUTPUT_TYPE | ||
} | ||
float2x3 transformedDerivatives(in uint swapchainTransform, in float2x3 ddxDdy) | ||
{ | ||
#define OUTPUT_TYPE float2x3 | ||
#include "nbl/builtin/hlsl/utils/surface_transform_transformedDerivatives.hlsl" | ||
#undef OUTPUT_TYPE | ||
} | ||
float2x4 transformedDerivatives(in uint swapchainTransform, in float2x4 ddxDdy) | ||
{ | ||
#define OUTPUT_TYPE float2x4 | ||
#include "nbl/builtin/hlsl/utils/surface_transform_transformedDerivatives.hlsl" | ||
#undef OUTPUT_TYPE | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you'll no longer need "nbl/builtin/hlsl/utils/surface_transform_transformedDerivatives.hlsl" to exist, this is doable with a template
return mul(ndc, float2x2(cos90, -sin90, sin90, cos90)); | ||
|
||
case ROTATE_180: | ||
return mul(ndc, float2x2(cos180, -sin180, sin180, cos180)); | ||
|
||
case ROTATE_270: | ||
return mul(ndc, float2x2(cos270, -sin270, sin270, cos270)); | ||
|
||
case HORIZONTAL_MIRROR: | ||
return mul(ndc, float2x2(-1, 0, 0, 1)); | ||
|
||
case HORIZONTAL_MIRROR_ROTATE_90: | ||
return mul(ndc, float2x2(-cos90, sin90, sin90, cos90)); | ||
|
||
case HORIZONTAL_MIRROR_ROTATE_180: | ||
return mul(ndc, float2x2(-cos180, sin180, sin180, cos180)); | ||
|
||
case HORIZONTAL_MIRROR_ROTATE_270: | ||
return mul(ndc, float2x2(-cos270, sin270, sin270, cos270)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@deprilula28 check the math on this, I know HLSL is funky with multiplications
static const uint IDENTITY = 0x00000001; | ||
static const uint ROTATE_90 = 0x00000002; | ||
static const uint ROTATE_180 = 0x00000004; | ||
static const uint ROTATE_270 = 0x00000008; | ||
static const uint HORIZONTAL_MIRROR = 0x00000010; | ||
static const uint HORIZONTAL_MIRROR_ROTATE_90 = 0x00000020; | ||
static const uint HORIZONTAL_MIRROR_ROTATE_180 = 0x00000040; | ||
static const uint HORIZONTAL_MIRROR_ROTATE_270 = 0x00000080; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
declare these as enum : uint32_t
(obvs have a typedef for uint32_t
to uint
first in the compat header)
// move to ieee754 header? | ||
float3x3 mul_with_bounds_wo_gamma(out float3x3 error, in float3x3 a, in float3x3 b, in float b_relative_error) | ||
{ | ||
float3x3 retval; | ||
for (int i=0; i<3; i++) | ||
{ | ||
float3 tmp = a[0]*b[i][0]; | ||
retval[i] = tmp; | ||
error[i] = abs(tmp); | ||
for (int j=1; j<3; j++) | ||
{ | ||
tmp = a[j]*b[i][j]; | ||
retval[i] += tmp; | ||
error[i] += abs(tmp); | ||
} | ||
} | ||
const float error_factor = 1.f+b_relative_error/numeric_limits::float_epsilon(2); | ||
error *= error_factor; | ||
return retval; | ||
} | ||
float3x3 mul_with_bounds(out float3x3 error, in float3x3 a, in float3x3 b, in float b_relative_error) | ||
{ | ||
float3x3 retval = mul_with_bounds_wo_gamma(error,a,b,b_relative_error); | ||
error *= ieee754::gamma(2u); | ||
return retval; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes please the error_arithmetic class
/* | ||
float4 pseudoMul4x4with3x1(in float4x4 m, in float3 v) | ||
{ | ||
return m[0] * v.x + m[1] * v.y + m[2] * v.z + m[3]; | ||
} | ||
float3 pseudoMul3x4with3x1(in float4x3 m, in float3 v) | ||
{ | ||
return m[0] * v.x + m[1] * v.y + m[2] * v.z + m[3]; | ||
} | ||
float4x3 pseudoMul4x3with4x3(in float4x3 lhs, in float4x3 rhs) // TODO: change name to 3x4with3x4 | ||
{ | ||
float4x3 result; | ||
for (int i = 0; i < 4; i++) | ||
result[i] = lhs[0] * rhs[i][0] + lhs[1] * rhs[i][1] + lhs[2] * rhs[i][2]; | ||
result[3] += lhs[3]; | ||
return result; | ||
} | ||
float4 pseudoMul4x4with4x3(in float4 proj, in float4x3 tform) | ||
{ | ||
float4 result; | ||
for (int i = 0; i < 4; i++) | ||
result[i] = proj[0] * tform[i][0] + proj[1] * tform[i][1] + proj[2] * tform[i][2]; | ||
result[3] += proj[3]; | ||
return result; | ||
} | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this also commented out in the GLSL ?
{ | ||
float3 tmp = mul(sub3x3TransposeCofactors, normal); | ||
const float tmpLenRcp = rsqrt(dot(tmp,tmp)); | ||
return tmp * asfloat(asuint(tmpLenRcp)^signFlipMask); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
erm and where is the signFlipMask defined?
// use this if you anticipate flipped/mirrored models | ||
float3 fastNormalTransform(in uint signFlipMask, in float3x3 sub3x3TransposeCofactors, in float3 normal) | ||
{ | ||
float3 tmp = mul(sub3x3TransposeCofactors, normal); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
check this matix vs point multiplication
unfortunately you need to find the place where sub3x3TransposeCofactors
is being fed from to understand if its correct
// | ||
float4x3 pseudoInverse3x4(in float4x3 tform) | ||
{ | ||
const float3x3 sub3x3Inv = transpose(float3x3(tform[0], tform[1], tform[2])); | ||
float4x3 retval; | ||
retval[0] = sub3x3Inv[0]; | ||
retval[1] = sub3x3Inv[1]; | ||
retval[2] = sub3x3Inv[2]; | ||
retval[3] = mul(-sub3x3Inv, tform[3]); | ||
return retval; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- do we actually ever use it
- does the GLSL version also ignore shear and scale? (this way of inverting the matrix only works on rotation + translation only matrices)
- assume
tform
is a row-major world matrix identical tocore::matrix4x3SIMD
, as I right in assuming that the multiplication is wrong here due to HLSL quirkyness? - btw the
-
should go on the vector, not the matrix, less components to negate
btw all the |
Merged as part of #475 |
Description
GLSL to HLSL porting
Testing
Run compile.bat to compile all ported HLSL files
TODO list: