Subgroup2 Benchmark #190

devshgraphicsprogramming · 2025-04-09T20:43:43Z

No description provided.

73_ArithmeticBench/main.cpp

73_ArithmeticBench/app_resources/common.hlsl

73_ArithmeticBench/app_resources/benchmarkSubgroup.comp.hlsl

73_ArithmeticBench/app_resources/shaderCommon.hlsl

devshgraphicsprogramming · 2025-04-09T22:26:37Z

73_ArithmeticBench/main.cpp

+		options.spirvOptimizer = nullptr;
+//#ifndef _NBL_DEBUG
+//		ISPIRVOptimizer::E_OPTIMIZER_PASS optPasses = ISPIRVOptimizer::EOP_STRIP_DEBUG_INFO;
+//		auto opt = make_smart_refctd_ptr<ISPIRVOptimizer>(std::span<ISPIRVOptimizer::E_OPTIMIZER_PASS>(&optPasses, 1));
+//		options.spirvOptimizer = opt.get();
+//#endif
+		options.debugInfoFlags |= IShaderCompiler::E_DEBUG_INFO_FLAGS::EDIF_LINE_BIT;


You must use zero debug flags and a SPIR-V optimizer to get representative perf; just ask @Fletterio about his FFT examples.

It's in the createShader method in main.cpp of example 28. To get the standard optimizer plus debug info stripping, you just give the compiler an optimizer with a single strip-debug-info pass.

I don't remember exactly whether this invokes other optimizations or just the debug info stripping.

It should invoke the standard passes specified in the SPIRV compiler repo when you run with -O, but in that regard I think the intent of the optimizer is busted (providing a custom optimizer is likely intended to disable all other passes by default).

@keptsecret you correctly optimize and strip debug info in example 23 (the test), but not in example 29 (the benchmark)?

@keptsecret bump
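
One way to reconcile the two needs in this thread (line debug info for Nsight captures, zero debug flags plus an optimizer for timing runs) is to pick the compile settings from a single switch. The sketch below is self-contained and uses stand-in types: `CompileSettings`, `DebugInfo`, `OptPass`, `RunMode` and `makeSettings` are hypothetical names for illustration, not Nabla's `IShaderCompiler` or `ISPIRVOptimizer` API. Whether a lone strip-debug-info pass also triggers the standard optimization passes is left open here, as it is in the comments above.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for illustration only, NOT the Nabla API.
enum class DebugInfo : std::uint32_t { None = 0u, LineBit = 1u << 0 };
enum class OptPass { StripDebugInfo };

struct CompileSettings
{
    std::uint32_t debugInfoFlags = 0u;    // bitmask of DebugInfo values
    std::vector<OptPass> optimizerPasses; // empty means "no optimizer attached"
};

enum class RunMode { Benchmark, ProfilerCapture };

// Benchmark runs: zero debug flags and an optimizer that strips debug info.
// Profiler captures: keep line info so the tool can map back to source,
// and attach no optimizer.
inline CompileSettings makeSettings(RunMode mode)
{
    CompileSettings s;
    if (mode == RunMode::Benchmark)
        s.optimizerPasses.push_back(OptPass::StripDebugInfo);
    else
        s.debugInfoFlags |= static_cast<std::uint32_t>(DebugInfo::LineBit);
    return s;
}
```

With a switch like this, the numbers reported by the benchmark would come from the `RunMode::Benchmark` settings only, and the line-info build would be reserved for profiler captures.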

devshgraphicsprogramming · 2025-04-09T22:38:07Z

73_ArithmeticBench/main.cpp

+		// barrier transition to GENERAL
+		{
+			IGPUCommandBuffer::SPipelineBarrierDependencyInfo::image_barrier_t imageBarriers[1];
+			imageBarriers[0].barrier = {
+				   .dep = {
+					   .srcStageMask = PIPELINE_STAGE_FLAGS::NONE,
+					   .srcAccessMask = ACCESS_FLAGS::NONE,
+					   .dstStageMask = PIPELINE_STAGE_FLAGS::COMPUTE_SHADER_BIT,
+					   .dstAccessMask = ACCESS_FLAGS::SHADER_WRITE_BITS
+					}
+			};
+			imageBarriers[0].image = dummyImg.get();
+			imageBarriers[0].subresourceRange = {
+				.aspectMask = IImage::EAF_COLOR_BIT,
+				.baseMipLevel = 0u,
+				.levelCount = 1u,
+				.baseArrayLayer = 0u,
+				.layerCount = 1u
+			};
+			imageBarriers[0].oldLayout = IImage::LAYOUT::UNDEFINED;
+			imageBarriers[0].newLayout = IImage::LAYOUT::GENERAL;
+
+			cmdbuf->pipelineBarrier(E_DEPENDENCY_FLAGS::EDF_NONE, { .imgBarriers = imageBarriers });
+		}


if you don't actually touch the image, you don't need to transition it every frame (you may need to transition it once, right after creation or on the first frame, so the validation layer doesn't complain about it being in UNDEFINED layout)

@keptsecret do you need the pipeline barriers for Nsight to capture?

devshgraphicsprogramming · 2025-04-09T22:40:21Z

73_ArithmeticBench/main.cpp

+		// barrier transition to PRESENT
+		{
+			IGPUCommandBuffer::SPipelineBarrierDependencyInfo::image_barrier_t imageBarriers[1];
+			imageBarriers[0].barrier = {
+				   .dep = {
+					   .srcStageMask = PIPELINE_STAGE_FLAGS::COMPUTE_SHADER_BIT,
+					   .srcAccessMask = ACCESS_FLAGS::SHADER_WRITE_BITS,
+					   .dstStageMask = PIPELINE_STAGE_FLAGS::NONE,
+					   .dstAccessMask = ACCESS_FLAGS::NONE
+					}
+			};
+			imageBarriers[0].image = m_surface->getSwapchainResources()->getImage(m_currentImageAcquire.imageIndex);
+			imageBarriers[0].subresourceRange = {
+				.aspectMask = IImage::EAF_COLOR_BIT,
+				.baseMipLevel = 0u,
+				.levelCount = 1u,
+				.baseArrayLayer = 0u,
+				.layerCount = 1u
+			};
+			imageBarriers[0].oldLayout = IImage::LAYOUT::TRANSFER_DST_OPTIMAL;
+			imageBarriers[0].newLayout = IImage::LAYOUT::PRESENT_SRC;
+
+			cmdbuf->pipelineBarrier(E_DEPENDENCY_FLAGS::EDF_NONE, { .imgBarriers = imageBarriers });
+		}


transition once to PRESENT and never touch it later on

let me know if nsight needs some barrier usage of the swapchain in order to capture, I don't think it does
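
The "transition once" suggestion in these two threads can be sketched as a small per-image layout tracker that only reports a barrier the first time an image needs one. This is a self-contained illustration with stand-in types: `Layout`, `LayoutTransition` and `OneTimeTransitions` are hypothetical names, not Nabla or Vulkan API, and the real code would record a pipeline barrier where this sketch appends to a vector.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for illustration only, NOT the Nabla API.
enum class Layout { Undefined, General, PresentSrc };

struct LayoutTransition
{
    std::size_t imageIndex;
    Layout from;
    Layout to;
};

class OneTimeTransitions
{
public:
    // Every image starts out in the Undefined layout.
    explicit OneTimeTransitions(std::size_t imageCount)
        : m_layouts(imageCount, Layout::Undefined) {}

    // Appends a transition to `out` only if the image is not already in
    // `target`. Returns true when a barrier would have to be recorded.
    bool ensure(std::size_t imageIndex, Layout target, std::vector<LayoutTransition>& out)
    {
        Layout& current = m_layouts.at(imageIndex);
        if (current == target)
            return false; // already transitioned on an earlier frame
        out.push_back({imageIndex, current, target});
        current = target;
        return true;
    }

private:
    std::vector<Layout> m_layouts; // last known layout per image
};
```

Called every frame with the acquired image index, `ensure` yields one transition per swapchain image over the lifetime of the tracker and nothing afterwards, which matches the intent of not touching the image once it is in its final layout.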

73_ArithmeticBench/main.cpp

73_ArithmeticBench/imgui.ini

keptsecret · 2025-05-07T08:13:49Z

This should be closed; replaced by #192.

keptsecret added 11 commits March 27, 2025 15:26

initial benchmark example copy

8090a2d

test subgroup2 funcs correct

3a2ff14

fix test

dd021a0

benchmarking shader + pipeline working

ca21941

begin adding fake frames for nsight profiler

0bb41db

merge master, fix conflicts

24a93bb

re-numbered example to avoid duplicate

17dda8e

fake frames for nsight

3d4e0f2

use correct shader, spirv line dbinfo for nsight

0192999

support for 1 item per invoc

8c9d55e

handle when items per invoc =1

07d6980