Introduce DenseTileIntersections and SparseTileIntersections classes for Gaussian rasterization #494

Open
fwilliams wants to merge 5 commits into v0.4 from fw/tile-intersection-classes-v2

@fwilliams
Collaborator

Summary

This PR refactors the tile intersection data flow in the Gaussian rasterization pipeline from loose tensors and scalar parameters into well-defined C++ classes (DenseTileIntersections and SparseTileIntersections), and consolidates the four image-dimension scalars (imageWidth, imageHeight, imageOriginW, imageOriginH) into the existing RenderWindow2D struct throughout the autograd layer. The goal is to simplify and clean up the API surface ahead of an eventual migration of the autograd functions to Python.

Motivation

Previously, tile intersection results were passed through the pipeline as a collection of individual tensors (tileOffsets, tileGaussianIds, and for sparse: activeTiles, tilePixelMask, tilePixelCumsum, pixelMap) along with scalar metadata (tileSize, blockOffset, numCameras, etc.). These tensors were:

  1. Unpacked from their dispatch results at the call site in GaussianSplat3d.cpp
  2. Passed individually through autograd forward() function signatures (up to 7 extra parameters)
  3. Re-packed or manually threaded into RasterizeCommonArgs on the other side

This created verbose, fragile function signatures and duplicated data-management logic across the dense, sparse, and from-world autograd paths. The same pattern existed for the four image-dimension scalars, which were always passed together.

What Changed

New tile intersection classes (GaussianTileIntersection.h/.cu)

  • DenseTileIntersections: Encapsulates tileOffsets [C, tilesH, tilesW], tileGaussianIds [totalIntersections], and tileSize. Can be constructed either from raw tensors or by computing intersections from 2D Gaussian means/radii/depths.

  • SparseTileIntersections: Extends the dense concept with additional sparse-specific tensors: activeTiles, tilePixelMask, tilePixelCumsum, and pixelMap. Same two construction paths (from tensors or from computation).

  • Both classes expose CUDA Accessor structs (guarded behind __CUDACC__) that provide device-callable helpers:

    • coordinates(blockIdx) — compute camera/tile-row/tile-col from a linear block index
    • tileGaussianRangeFromBlock(blockIdx) — get [start, end) range of gaussian IDs for a tile
    • activePixelIndexFromBlock(blockIdx, threadIdx) — (sparse only) map thread to active pixel
    • pixelIndexFromBlock(blockIdx, threadIdx) — compute global pixel index
    • gaussianIdAt(idx) — look up a gaussian ID from the intersection list
  • A helper dispatchTileIntersectionsAccessor() function template creates the appropriate Accessor from either class, abstracting over the dense vs. sparse distinction at the kernel dispatch level.
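
The index arithmetic behind helpers such as coordinates(blockIdx) and tileGaussianRangeFromBlock(blockIdx) can be sketched in host-only C++. This is an illustration, not the code in this PR: DenseAccessorSketch, TileCoord, the flattened std::vector storage, and the convention that tileOffsets holds each tile's start offset (with the last tile ending at totalIntersections) are all assumptions.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical result type for coordinates(); not part of the fVDB API.
struct TileCoord {
    std::int32_t camera, tileRow, tileCol;
};

// Hypothetical host-side stand-in for a dense tile-intersection accessor.
struct DenseAccessorSketch {
    std::int32_t numCameras, tilesH, tilesW;
    std::vector<std::int32_t> tileOffsets;     // flattened [C, tilesH, tilesW], start offset per tile
    std::vector<std::int32_t> tileGaussianIds; // [totalIntersections], grouped by tile

    // Decompose a linear block index into (camera, tile row, tile column).
    TileCoord coordinates(std::int32_t blockIdx) const {
        const std::int32_t tilesPerCamera = tilesH * tilesW;
        const std::int32_t inCamera = blockIdx % tilesPerCamera;
        return {blockIdx / tilesPerCamera, inCamera / tilesW, inCamera % tilesW};
    }

    // Half-open [start, end) range into tileGaussianIds for one tile.
    // Assumes the next tile's start offset is this tile's end offset.
    std::pair<std::int32_t, std::int32_t> tileGaussianRangeFromBlock(std::int32_t blockIdx) const {
        const std::int32_t numTiles = numCameras * tilesH * tilesW;
        assert(blockIdx >= 0 && blockIdx < numTiles);
        const std::int32_t start = tileOffsets[blockIdx];
        const std::int32_t end = (blockIdx + 1 < numTiles)
                                     ? tileOffsets[blockIdx + 1]
                                     : static_cast<std::int32_t>(tileGaussianIds.size());
        return {start, end};
    }

    // Look up a Gaussian ID by its position in the intersection list.
    std::int32_t gaussianIdAt(std::int32_t idx) const { return tileGaussianIds[idx]; }
};
```

Under these assumptions, a kernel block would call tileGaussianRangeFromBlock once and then iterate gaussianIdAt over the returned range, so an empty tile is simply a range whose start equals its end.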

Simplified RasterizeCommonArgs (GaussianRasterize.cuh)

  • The struct is now parameterized as RasterizeCommonArgs<ScalarType, NUM_CHANNELS, IS_PACKED, TileIntersectionsT>, where TileIntersectionsT is either DenseTileIntersections::Accessor or SparseTileIntersections::Accessor.

  • Removed 17 member fields (mBlockOffset, mNumCameras, mTotalIntersections, mTileOriginW/H, mTileSize, mNumTilesW/H, mTileGaussianIds, mTileOffsets, mSparseTileOffsets, mTileOffsetsAreSparse, mIsSparse, mActiveTiles, mTilePixelMask, mTilePixelCumsum, mPixelMap), all replaced by a single mTileIntersections member of the accessor type.

  • Removed redundant accessor methods (renderWidth(), renderHeight(), renderOriginX(), renderOriginY()); callers now use mRenderWindow.width, mRenderWindow.height, etc. directly.
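
To illustrate why parameterizing the args struct on the accessor type removes the need for flags like mIsSparse, here is a minimal host-only sketch. All names (DenseAccSketch, SparseAccSketch, CommonArgsSketch, tileIdFromBlock) are hypothetical, and the idea that sparse mode launches one block per active tile is an assumption made for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical dense accessor: every tile gets a block, so the block
// index is already the tile index.
struct DenseAccSketch {
    std::int32_t tileIdFromBlock(std::int32_t blockIdx) const { return blockIdx; }
};

// Hypothetical sparse accessor: only active tiles get a block (assumption),
// so the block index is looked up in the active-tile list.
struct SparseAccSketch {
    std::vector<std::int32_t> activeTiles;
    std::int32_t tileIdFromBlock(std::int32_t blockIdx) const {
        assert(blockIdx >= 0 && blockIdx < static_cast<std::int32_t>(activeTiles.size()));
        return activeTiles[blockIdx];
    }
};

// Simplified stand-in for RasterizeCommonArgs. The real struct also takes
// ScalarType, NUM_CHANNELS and IS_PACKED template parameters.
template <typename TileIntersectionsT>
struct CommonArgsSketch {
    TileIntersectionsT mTileIntersections;

    // Resolved at compile time per accessor type; no runtime sparse/dense flag.
    std::int32_t tileForBlock(std::int32_t blockIdx) const {
        return mTileIntersections.tileIdFromBlock(blockIdx);
    }
};
```

Each instantiation only contains the lookup path for its own accessor, which is the property the single mTileIntersections member is meant to provide.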

Simplified autograd function signatures

All three autograd function classes were updated:

| Function | Parameters removed | Parameters added |
| --- | --- | --- |
| RasterizeGaussiansToPixels | imageWidth, imageHeight, imageOriginW, imageOriginH, tileSize, tileOffsets, tileGaussianIds (7 params) | renderWindow, tileIntersections (2 params) |
| RasterizeGaussiansToPixelsSparse | imageWidth, imageHeight, imageOriginW, imageOriginH, tileSize, tileOffsets, tileGaussianIds, activeTiles, tilePixelMask, tilePixelCumsum, pixelMap (11 params) | renderWindow, tileIntersections (2 params) |
| RasterizeGaussiansToPixelsFromWorld3DGS | imageWidth, imageHeight, imageOriginW, imageOriginH (4 params) | renderWindow (1 param) |
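
The effect of the consolidation on a call signature can be sketched with a toy function. Only the width and height field names come from the text above; RenderWindowSketch, TileIntersectionsSketch, the originW/originH field names, and the tile-count computation itself are assumptions chosen to keep the example small.

```cpp
#include <cstdint>

// Hypothetical stand-in for RenderWindow2D. originW/originH are assumed names.
struct RenderWindowSketch {
    std::int32_t width, height, originW, originH;
};

// Hypothetical stand-in for a tile intersection object (tensors omitted).
struct TileIntersectionsSketch {
    std::int32_t tileSize;
};

constexpr std::int32_t ceilDiv(std::int32_t a, std::int32_t b) { return (a + b - 1) / b; }

// Before: every scalar travels as its own parameter, even when unused here.
constexpr std::int32_t tileCountLoose(std::int32_t imageWidth, std::int32_t imageHeight,
                                      std::int32_t /*imageOriginW*/, std::int32_t /*imageOriginH*/,
                                      std::int32_t tileSize) {
    return ceilDiv(imageWidth, tileSize) * ceilDiv(imageHeight, tileSize);
}

// After: two objects carry the same information.
constexpr std::int32_t tileCountBundled(const RenderWindowSketch &window,
                                        const TileIntersectionsSketch &tiles) {
    return ceilDiv(window.width, tiles.tileSize) * ceilDiv(window.height, tiles.tileSize);
}
```

Both versions compute the same value; the bundled form keeps related values together so that adding a field later does not change every signature along the call chain.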

Updated CUDA kernels

All five rasterization kernel files were updated to use the new Accessor API:

  • GaussianRasterizeForward.cu
  • GaussianRasterizeBackward.cu
  • GaussianRasterizeContributingGaussianIds.cu
  • GaussianRasterizeTopContributingGaussianIds.cu
  • GaussianRasterizeNumContributingGaussians.cu

Host compilation fix

The #include <fvdb/detail/utils/AccessorHelpers.cuh> in GaussianTileIntersection.h was moved behind a #if defined(__CUDACC__) guard so that the header can be safely included from host-only .cpp translation units (the autograd .cpp files now transitively include it).
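
The guard pattern described here can be sketched as follows. GuardedHeaderSketch and its members are hypothetical; only the use of __CUDACC__ to hide device-only code from host compilers mirrors the description above.

```cpp
#include <cassert>
#include <cstdint>

// A CUDA-only include (such as AccessorHelpers.cuh in this PR) would sit
// inside this guard so that host compilers never see it.
#if defined(__CUDACC__)
#define SKETCH_DEVICE __device__
#endif

// Hypothetical class; not the layout of the real tile intersection classes.
class GuardedHeaderSketch {
  public:
    explicit GuardedHeaderSketch(std::int32_t tileSize) : mTileSize(tileSize) {
        assert(tileSize > 0);
    }

    // Host-visible API: compiled by both nvcc and plain C++ compilers.
    std::int32_t tileSize() const { return mTileSize; }

#if defined(__CUDACC__)
    // Device-facing view: only exists in CUDA translation units.
    struct Accessor {
        std::int32_t tileSize;
        SKETCH_DEVICE std::int32_t pixelsPerTile() const { return tileSize * tileSize; }
    };
    Accessor accessor() const { return Accessor{mTileSize}; }
#endif

  private:
    std::int32_t mTileSize;
};
```

A host-only .cpp file that includes such a header sees just the constructor and tileSize(), while a .cu file compiled by nvcc additionally sees the Accessor.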

Updated tests

  • GaussianTileIntersectionTest.cpp — updated to match new dispatch* function signatures
  • GaussianRasterizeForwardTest.cpp — updated tile intersection dispatch call

Files changed (17)

  • src/fvdb/GaussianSplat3d.cpp — construct RenderWindow2D and tile intersection objects at call sites
  • src/fvdb/detail/autograd/GaussianRasterize.{h,cpp} — simplified signature
  • src/fvdb/detail/autograd/GaussianRasterizeFromWorld.{h,cpp} — simplified signature
  • src/fvdb/detail/autograd/GaussianRasterizeSparse.{h,cpp} — simplified signature
  • src/fvdb/detail/ops/gsplat/GaussianRasterize.cuh — templatized + slimmed RasterizeCommonArgs
  • src/fvdb/detail/ops/gsplat/GaussianRasterize{Forward,Backward,ContributingGaussianIds,TopContributingGaussianIds,NumContributingGaussians}.cu — use accessor API
  • src/fvdb/detail/ops/gsplat/GaussianTileIntersection.{h,cu} — new classes + accessors
  • src/tests/Gaussian{TileIntersection,RasterizeForward}Test.cpp — test updates

Test plan

  • C++ unit tests compile and pass (GaussianTileIntersectionTest, GaussianRasterizeForwardTest)
  • Python tests pass (python -m pytest tests/ -v)
  • Full build succeeds with no new warnings
  • Manual smoke test: run a 3DGS training loop to verify rasterization output is unchanged

Made with Cursor

@fwilliams fwilliams requested a review from a team as a code owner March 4, 2026 03:45
@fwilliams fwilliams changed the base branch from main to v0.4 March 6, 2026 18:28
…for Gaussian rasterization

Refactor tile intersection data from loose tensors and scalar parameters into
well-defined C++ classes with CUDA-friendly Accessor structs, consolidating the
rasterization pipeline's data flow.

Key changes:

- Add DenseTileIntersections and SparseTileIntersections classes in
  GaussianTileIntersection.h/.cu that encapsulate tile offsets, gaussian IDs,
  and (for sparse) active tiles, pixel masks, cumsum, and pixel map tensors.

- Each class exposes an inner Accessor struct (under __CUDACC__ guard) with
  device-callable helpers: coordinates(), tileGaussianRangeFromBlock(),
  activePixelIndexFromBlock(), pixelIndexFromBlock(), gaussianIdAt(), etc.

- Refactor RasterizeCommonArgs (GaussianRasterize.cuh) to be parameterized on
  TileIntersectionsT rather than storing raw tile tensors and sparse metadata
  directly. Remove ~15 member fields (mBlockOffset, mNumCameras, mTileOffsets,
  mSparseTileOffsets, mTileGaussianIds, mActiveTiles, mTilePixelMask, etc.)
  in favor of a single mTileIntersections member.

- Remove redundant accessor methods (renderWidth/Height/OriginX/Y) from
  RasterizeCommonArgs; callers now use mRenderWindow fields directly.

- Update all rasterization kernel files (Forward, Backward,
  ContributingGaussianIds, TopContributingGaussianIds,
  NumContributingGaussians) to use the new Accessor API via
  dispatchTileIntersectionsAccessor().

- Simplify autograd function signatures: replace 4 image-dimension scalars
  (imageWidth, imageHeight, imageOriginW, imageOriginH) with a single
  RenderWindow2D parameter, and replace 2-7 tile-related tensor/scalar
  parameters with a single DenseTileIntersections or SparseTileIntersections
  object.

- Update all call sites in GaussianSplat3d.cpp to construct RenderWindow2D and
  tile intersection objects before passing them to autograd::apply().

- Guard the AccessorHelpers.cuh include in GaussianTileIntersection.h behind
  __CUDACC__ so the header can be included from host-only .cpp translation
  units.

- Update C++ tests (GaussianTileIntersectionTest, GaussianRasterizeForwardTest)
  to match the new API signatures.

Signed-off-by: Francis Williams <francis@fwilliams.info>
Made-with: Cursor
Signed-off-by: Francis Williams <francis@fwilliams.info>
…n refactor

- Fix dispatchTileIntersectionsAccessor to correctly handle sparse rendering
  with 3D (dense) tile offsets by checking activeTiles.has_value() instead of
  tileOffsets.dim() to determine sparse vs dense mode
- Fix SparseTileIntersections::Accessor to support hybrid dense/sparse tile
  offsets via a unified constructor
- Fix numCameras derivation in backward validation to use
  cameraCount() from tile intersections instead of means2d.size(0),
  which is totalGaussians in packed mode
- Fix backgrounds and masks size checks in RasterizeCommonArgs to use
  cameraCount() instead of means2d.size(0) for the same reason
- Fix two device-check bugs in GaussianSplat3d.cpp where
  indices.device() was compared against itself instead of mMeans.device()
- Merge duplicate private: sections in SparseTileIntersections::Accessor
- Apply clang-format-18
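
The numCameras fix in the list above can be illustrated with plain shape vectors. This is a sketch under assumptions: the helper names are hypothetical, and the concrete shapes ([C, N, 2] for unpacked means2d, [totalGaussians, 2] for packed means2d, [C, tilesH, tilesW] for tile offsets) are assumed for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using Shape = std::vector<std::int64_t>;

// Old derivation: the first dimension of means2d.
// Only equals the camera count when means2d is not packed.
inline std::int64_t numCamerasFromMeans(const Shape &means2dShape) {
    assert(!means2dShape.empty());
    return means2dShape[0];
}

// Fixed derivation: the first dimension of the dense tile offsets,
// which is the camera count in both packed and unpacked mode.
inline std::int64_t cameraCountFromTileOffsets(const Shape &tileOffsetsShape) {
    assert(!tileOffsetsShape.empty());
    return tileOffsetsShape[0];
}
```

With two cameras and a packed means2d of 1500 rows, the old derivation would report 1500 cameras, which is why size checks on backgrounds and masks needed the same change.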

Signed-off-by: Francis Williams <francis@fwilliams.info>
Made-with: Cursor
Split the overloaded SparseTileIntersections::Accessor into three
distinct types — one per runtime mode — eliminating leaked abstractions,
dummy tensors, and runtime branches in device code:

- DenseTileIntersections::Accessor (Mode 1): dense tiles, dense pixels
- SparseDenseTileIntersections::Accessor (Mode 2): dense tiles, sparse pixels
- SparseTileIntersections::Accessor (Mode 3): sparse tiles, sparse pixels

The ~10-line 3D tile offset lookup is intentionally duplicated between
Dense and SparseDense accessors for readability.

dispatchTileIntersectionsAccessor now has three clean branches with no
dummy tensor allocation in any path.
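
A three-branch dispatch of this shape might look like the sketch below. The tag types, TileInputsSketch, and dispatchSketch are hypothetical, and using a 3D tile-offsets tensor to tell Mode 2 from Mode 3 once activeTiles is present is an assumption, not something the commit message states.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical tags standing in for the three accessor types.
struct DenseAccTag { static constexpr int kMode = 1; };       // dense tiles, dense pixels
struct SparseDenseAccTag { static constexpr int kMode = 2; }; // dense tiles, sparse pixels
struct SparseAccTag { static constexpr int kMode = 3; };      // sparse tiles, sparse pixels

// Minimal description of the inputs a dispatcher would inspect.
struct TileInputsSketch {
    int tileOffsetsDim;                                   // 3 means [C, tilesH, tilesW]
    std::optional<std::vector<std::int32_t>> activeTiles; // empty for dense rendering
};

// Picks one accessor type per call and hands it to fn.
// Every branch must return the same type for `auto` deduction to succeed.
template <typename Fn>
auto dispatchSketch(const TileInputsSketch &in, Fn &&fn) {
    assert(in.tileOffsetsDim > 0);
    if (!in.activeTiles.has_value()) {
        return fn(DenseAccTag{});
    }
    if (in.tileOffsetsDim == 3) {
        return fn(SparseDenseAccTag{}); // assumption: 3D offsets imply dense tiles
    }
    return fn(SparseAccTag{});
}
```

A caller would pass a generic lambda that launches the kernel templated on the accessor type it receives, so the mode is chosen once on the host and never branched on in device code.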

Signed-off-by: Francis Williams <francis@fwilliams.info>
Made-with: Cursor
@fwilliams fwilliams force-pushed the fw/tile-intersection-classes-v2 branch from a347474 to 8ae6db8 Compare March 6, 2026 19:25
Apply clang-format-18 to files with formatting drift introduced
during the rebase onto main.

Signed-off-by: Francis Williams <francis@fwilliams.info>
Made-with: Cursor