Introduce DenseTileIntersections and SparseTileIntersections classes for Gaussian rasterization#494
Open
…for Gaussian rasterization

Refactor tile intersection data from loose tensors and scalar parameters into well-defined C++ classes with CUDA-friendly Accessor structs, consolidating the rasterization pipeline's data flow.

Key changes:
- Add DenseTileIntersections and SparseTileIntersections classes in GaussianTileIntersection.h/.cu that encapsulate tile offsets, gaussian IDs, and (for sparse) active tiles, pixel masks, cumsum, and pixel map tensors.
- Each class exposes an inner Accessor struct (under __CUDACC__ guard) with device-callable helpers: coordinates(), tileGaussianRangeFromBlock(), activePixelIndexFromBlock(), pixelIndexFromBlock(), gaussianIdAt(), etc.
- Refactor RasterizeCommonArgs (GaussianRasterize.cuh) to be parameterized on TileIntersectionsT rather than storing raw tile tensors and sparse metadata directly. Remove ~15 member fields (mBlockOffset, mNumCameras, mTileOffsets, mSparseTileOffsets, mTileGaussianIds, mActiveTiles, mTilePixelMask, etc.) in favor of a single mTileIntersections member.
- Remove redundant accessor methods (renderWidth/Height/OriginX/Y) from RasterizeCommonArgs; callers now use mRenderWindow fields directly.
- Update all rasterization kernel files (Forward, Backward, ContributingGaussianIds, TopContributingGaussianIds, NumContributingGaussians) to use the new Accessor API via dispatchTileIntersectionsAccessor().
- Simplify autograd function signatures: replace 4 image-dimension scalars (imageWidth, imageHeight, imageOriginW, imageOriginH) with a single RenderWindow2D parameter, and replace 2-7 tile-related tensor/scalar parameters with a single DenseTileIntersections or SparseTileIntersections object.
- Update all call sites in GaussianSplat3d.cpp to construct RenderWindow2D and tile intersection objects before passing them to autograd::apply().
- Guard the AccessorHelpers.cuh include in GaussianTileIntersection.h behind __CUDACC__ so the header can be included from host-only .cpp translation units.
- Update C++ tests (GaussianTileIntersectionTest, GaussianRasterizeForwardTest) to match the new API signatures.

Signed-off-by: Francis Williams <francis@fwilliams.info>
Made-with: Cursor
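To make the accessor helpers named in this commit more concrete, here is a minimal host-side sketch of what coordinates(), tileGaussianRangeFromBlock(), and gaussianIdAt() might compute for the dense case. The struct name DenseTileAccessorSketch, the use of std::vector in place of tensor accessors, the row-major [C, tilesH, tilesW] ordering, and the convention that the last tile's range ends at the total intersection count are all assumptions made for illustration, not the actual fVDB implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Camera / tile-row / tile-column triple for one thread block.
struct TileCoord {
    int32_t camera;
    int32_t tileRow;
    int32_t tileCol;
};

// Hypothetical host-side stand-in for a dense tile-intersection accessor.
// Method names echo the commit message; layout and arithmetic are assumed.
struct DenseTileAccessorSketch {
    int32_t numCameras;
    int32_t numTilesH;
    int32_t numTilesW;
    // Flattened [C, tilesH, tilesW]: index of each tile's first intersection.
    std::vector<int32_t> tileOffsets;
    // Gaussian IDs grouped by tile, length totalIntersections.
    std::vector<int32_t> tileGaussianIds;

    // Split a linear block index into (camera, tile row, tile column),
    // assuming row-major ordering over [C, tilesH, tilesW].
    TileCoord coordinates(int32_t blockIdx) const {
        assert(blockIdx >= 0 && blockIdx < numCameras * numTilesH * numTilesW);
        const int32_t tilesPerCamera = numTilesH * numTilesW;
        const int32_t camera = blockIdx / tilesPerCamera;
        const int32_t tileId = blockIdx % tilesPerCamera;
        return {camera, tileId / numTilesW, tileId % numTilesW};
    }

    // Half-open [start, end) range into tileGaussianIds for one block.
    // The last tile's range ends at the total number of intersections.
    std::pair<int32_t, int32_t> tileGaussianRangeFromBlock(int32_t blockIdx) const {
        const int32_t start = tileOffsets[blockIdx];
        const bool isLast = blockIdx + 1 == static_cast<int32_t>(tileOffsets.size());
        const int32_t end = isLast ? static_cast<int32_t>(tileGaussianIds.size())
                                   : tileOffsets[blockIdx + 1];
        return {start, end};
    }

    // Look up a gaussian ID by its position in the intersection list.
    int32_t gaussianIdAt(int32_t idx) const { return tileGaussianIds[idx]; }
};
```

In a kernel, each block would presumably call tileGaussianRangeFromBlock() once and then walk that range with gaussianIdAt(), so the kernel body never touches the raw offset tensors directly.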
Signed-off-by: Francis Williams <francis@fwilliams.info>
…n refactor

- Fix dispatchTileIntersectionsAccessor to correctly handle sparse rendering with 3D (dense) tile offsets by checking activeTiles.has_value() instead of tileOffsets.dim() to determine sparse vs dense mode
- Fix SparseTileIntersections::Accessor to support hybrid dense/sparse tile offsets via a unified constructor
- Fix numCameras derivation in backward validation to use cameraCount() from tile intersections instead of means2d.size(0), which is totalGaussians in packed mode
- Fix backgrounds and masks size checks in RasterizeCommonArgs to use cameraCount() instead of means2d.size(0) for the same reason
- Fix two device-check bugs in GaussianSplat3d.cpp where indices.device() was compared against itself instead of mMeans.device()
- Merge duplicate private: sections in SparseTileIntersections::Accessor
- Apply clang-format-18

Signed-off-by: Francis Williams <francis@fwilliams.info>
Made-with: Cursor
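The first fix in this commit can be illustrated with a small sketch. It contrasts a rank-based sparse check with one keyed on whether activeTiles is present. TileDataSketch and both function names are hypothetical, std::optional stands in for an optional tensor, and the assumption that compacted sparse offsets have rank 1 is mine rather than something stated in the commit.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical stand-in for the inputs to the dispatch decision.
struct TileDataSketch {
    // Rank of the tile-offsets tensor: 3 for a dense [C, tilesH, tilesW]
    // grid; assumed to be 1 for a compacted per-active-tile list.
    int tileOffsetsRank;
    // Present only when a sparse subset of pixels is being rendered.
    std::optional<std::vector<int>> activeTiles;
};

// Rank-based check: treats any input with 3D offsets as dense, so it
// misclassifies sparse rendering that still uses dense tile offsets.
inline bool isSparseByRank(const TileDataSketch &data) {
    return data.tileOffsetsRank != 3;
}

// Check described in the commit: sparse mode is signalled by activeTiles.
inline bool isSparseByActiveTiles(const TileDataSketch &data) {
    return data.activeTiles.has_value();
}
```

For a hybrid input (3D offsets plus activeTiles), the rank-based check reports dense while the activeTiles check reports sparse, which is the case the fix targets.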
Split the overloaded SparseTileIntersections::Accessor into three distinct types, one per runtime mode, eliminating leaked abstractions, dummy tensors, and runtime branches in device code:
- DenseTileIntersections::Accessor (Mode 1): dense tiles, dense pixels
- SparseDenseTileIntersections::Accessor (Mode 2): dense tiles, sparse pixels
- SparseTileIntersections::Accessor (Mode 3): sparse tiles, sparse pixels

The ~10-line 3D tile offset lookup is intentionally duplicated between Dense and SparseDense accessors for readability. dispatchTileIntersectionsAccessor now has three clean branches with no dummy tensor allocation in any path.

Signed-off-by: Francis Williams <francis@fwilliams.info>
Made-with: Cursor
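One way to picture the three-branch dispatch is the sketch below: the runtime mode is resolved once on the host, and the callable is then instantiated with a distinct accessor type, so code templated on the accessor needs no mode branches of its own. All names here (TileInputsSketch, the three *ModeAccessor structs, dispatchAccessorSketch) are hypothetical, and the branch conditions are an assumption based on the mode descriptions above.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical host-side description of the available tile data.
struct TileInputsSketch {
    bool tileOffsetsAreDense;                     // 3D [C, tilesH, tilesW] offsets?
    std::optional<std::vector<int>> activeTiles;  // set for sparse-pixel rendering
};

// Hypothetical accessor stand-ins, one type per runtime mode.
struct DenseModeAccessor       { static constexpr int kMode = 1; };  // dense tiles, dense pixels
struct SparseDenseModeAccessor { static constexpr int kMode = 2; };  // dense tiles, sparse pixels
struct SparseModeAccessor      { static constexpr int kMode = 3; };  // sparse tiles, sparse pixels

// Pick the accessor type once, then hand it to a generic callable.
// Every branch must return the same type for the deduced return type to work.
template <typename Fn>
auto dispatchAccessorSketch(const TileInputsSketch &inputs, Fn &&fn) {
    if (!inputs.activeTiles.has_value()) {
        return fn(DenseModeAccessor{});        // Mode 1
    }
    if (inputs.tileOffsetsAreDense) {
        return fn(SparseDenseModeAccessor{});  // Mode 2
    }
    return fn(SparseModeAccessor{});           // Mode 3
}
```

A generic lambda such as `[](auto accessor) { return decltype(accessor)::kMode; }` can serve as the callable; in a real pipeline the callable would presumably launch a kernel templated on the accessor type.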
a347474 to 8ae6db8
Apply clang-format-18 to files with formatting drift introduced during the rebase onto main.

Signed-off-by: Francis Williams <francis@fwilliams.info>
Made-with: Cursor
Summary
This PR refactors the tile intersection data flow in the Gaussian rasterization pipeline from loose tensors and scalar parameters into well-defined C++ classes (DenseTileIntersections and SparseTileIntersections), and consolidates the four image-dimension scalars (imageWidth, imageHeight, imageOriginW, imageOriginH) into the existing RenderWindow2D struct throughout the autograd layer. The goal is to simplify and clean up the API surface ahead of an eventual migration of the autograd functions to Python.
Motivation
Previously, tile intersection results were passed through the pipeline as a collection of individual tensors (tileOffsets, tileGaussianIds, and for sparse: activeTiles, tilePixelMask, tilePixelCumsum, pixelMap) along with scalar metadata (tileSize, blockOffset, numCameras, etc.). These tensors were:
- produced at the call sites in GaussianSplat3d.cpp
- passed individually through the autograd forward() function signatures (up to 7 extra parameters)
- stored again as separate fields in RasterizeCommonArgs on the other side
This created verbose, fragile function signatures and duplicated data-management logic across the dense, sparse, and from-world autograd paths. The same pattern existed for the four image-dimension scalars, which were always passed together.
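The signature problem described here is the classic case for a parameter object. As a rough sketch, the before and after shapes might look like the following. RenderWindowSketch and DenseTilesSketch are hypothetical stand-ins for RenderWindow2D and DenseTileIntersections, the vectors stand in for tensors, and the tile-count arithmetic is only there to give the functions a body; none of it is taken from the fVDB sources.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for RenderWindow2D; field names follow the PR text.
struct RenderWindowSketch {
    int32_t width;
    int32_t height;
    int32_t originW;
    int32_t originH;
};

// Hypothetical stand-in for DenseTileIntersections.
struct DenseTilesSketch {
    int32_t tileSize;
    std::vector<int32_t> tileOffsets;
    std::vector<int32_t> tileGaussianIds;
};

// Integer ceiling division for a non-negative numerator and positive divisor.
inline int32_t ceilDiv(int32_t a, int32_t b) {
    assert(a >= 0 && b > 0);
    return (a + b - 1) / b;
}

// Before: every scalar and tensor travels as its own parameter, so each
// layer between the call site and the kernel repeats the full list.
inline int32_t tileCountBefore(int32_t imageWidth, int32_t imageHeight,
                               int32_t /*imageOriginW*/, int32_t /*imageOriginH*/,
                               int32_t tileSize,
                               const std::vector<int32_t> & /*tileOffsets*/,
                               const std::vector<int32_t> & /*tileGaussianIds*/) {
    return ceilDiv(imageWidth, tileSize) * ceilDiv(imageHeight, tileSize);
}

// After: two objects carry the same information through every layer.
inline int32_t tileCountAfter(const RenderWindowSketch &window,
                              const DenseTilesSketch &tiles) {
    return ceilDiv(window.width, tiles.tileSize) * ceilDiv(window.height, tiles.tileSize);
}
```

With the second form, adding a field to either struct does not change any intermediate signature, which is the fragility the paragraph above points at.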
What Changed
New tile intersection classes (GaussianTileIntersection.h/.cu)
- DenseTileIntersections: Encapsulates tileOffsets [C, tilesH, tilesW], tileGaussianIds [totalIntersections], and tileSize. Can be constructed either from raw tensors or by computing intersections from 2D Gaussian means/radii/depths.
- SparseTileIntersections: Extends the dense concept with additional sparse-specific tensors: activeTiles, tilePixelMask, tilePixelCumsum, and pixelMap. Same two construction paths (from tensors or from computation).
Both classes expose CUDA Accessor structs (guarded behind __CUDACC__) that provide device-callable helpers:
- coordinates(blockIdx): compute camera/tile-row/tile-col from a linear block index
- tileGaussianRangeFromBlock(blockIdx): get the [start, end) range of gaussian IDs for a tile
- activePixelIndexFromBlock(blockIdx, threadIdx): (sparse only) map a thread to an active pixel
- pixelIndexFromBlock(blockIdx, threadIdx): compute the global pixel index
- gaussianIdAt(idx): look up a gaussian ID from the intersection list
A helper dispatchTileIntersectionsAccessor() function template creates the appropriate Accessor from either class, abstracting over the dense vs. sparse distinction at the kernel dispatch level.
Simplified RasterizeCommonArgs (GaussianRasterize.cuh)
The struct is now parameterized as RasterizeCommonArgs<ScalarType, NUM_CHANNELS, IS_PACKED, TileIntersectionsT>, where TileIntersectionsT is either DenseTileIntersections::Accessor or SparseTileIntersections::Accessor.
Removed ~15 member fields: mBlockOffset, mNumCameras, mTotalIntersections, mTileOriginW/H, mTileSize, mNumTilesW/H, mTileGaussianIds, mTileOffsets, mSparseTileOffsets, mTileOffsetsAreSparse, mIsSparse, mActiveTiles, mTilePixelMask, mTilePixelCumsum, mPixelMap. All are replaced by a single mTileIntersections member of the accessor type.
Removed redundant accessor methods (renderWidth(), renderHeight(), renderOriginX(), renderOriginY()); callers now use mRenderWindow.width, mRenderWindow.height, etc. directly.
Simplified autograd function signatures
All three autograd function classes were updated:
- RasterizeGaussiansToPixels
  - Before: imageWidth, imageHeight, imageOriginW, imageOriginH, tileSize, tileOffsets, tileGaussianIds (7 params)
  - After: renderWindow, tileIntersections (2 params)
- RasterizeGaussiansToPixelsSparse
  - Before: imageWidth, imageHeight, imageOriginW, imageOriginH, tileSize, tileOffsets, tileGaussianIds, activeTiles, tilePixelMask, tilePixelCumsum, pixelMap (11 params)
  - After: renderWindow, tileIntersections (2 params)
- RasterizeGaussiansToPixelsFromWorld3DGS
  - Before: imageWidth, imageHeight, imageOriginW, imageOriginH (4 params)
  - After: renderWindow (1 param)
Updated CUDA kernels
All five rasterization kernel files were updated to use the new Accessor API:
- GaussianRasterizeForward.cu
- GaussianRasterizeBackward.cu
- GaussianRasterizeContributingGaussianIds.cu
- GaussianRasterizeTopContributingGaussianIds.cu
- GaussianRasterizeNumContributingGaussians.cu
Host compilation fix
The #include <fvdb/detail/utils/AccessorHelpers.cuh> in GaussianTileIntersection.h was moved behind a #if defined(__CUDACC__) guard so that the header can be safely included from host-only .cpp translation units (the autograd .cpp files now transitively include it).
Updated tests
- GaussianTileIntersectionTest.cpp: updated to match new dispatch* function signatures
- GaussianRasterizeForwardTest.cpp: updated tile intersection dispatch call
Files changed (17)
- src/fvdb/GaussianSplat3d.cpp: construct RenderWindow2D and tile intersection objects at call sites
- src/fvdb/detail/autograd/GaussianRasterize.{h,cpp}: simplified signature
- src/fvdb/detail/autograd/GaussianRasterizeFromWorld.{h,cpp}: simplified signature
- src/fvdb/detail/autograd/GaussianRasterizeSparse.{h,cpp}: simplified signature
- src/fvdb/detail/ops/gsplat/GaussianRasterize.cuh: templatized + slimmed RasterizeCommonArgs
- src/fvdb/detail/ops/gsplat/GaussianRasterize{Forward,Backward,ContributingGaussianIds,TopContributingGaussianIds,NumContributingGaussians}.cu: use accessor API
- src/fvdb/detail/ops/gsplat/GaussianTileIntersection.{h,cu}: new classes + accessors
- src/tests/Gaussian{TileIntersection,RasterizeForward}Test.cpp: test updates
Test plan
- C++ tests (GaussianTileIntersectionTest, GaussianRasterizeForwardTest)
- Python tests (python -m pytest tests/ -v)
Made with Cursor