@eappen-nelluvelil eappen-nelluvelil commented Dec 16, 2025

In this PR, version 1 of the device CBC sweep chunk kernel has been implemented.

The following are the major changes:

  • There is a new scheduling algorithm for device CBC sweeps called ASYNC_FIFO. It sweeps asynchronously over all anglesets in a given groupset using crb::Stream objects. The main while-loop iterates over the anglesets and checks each one for cells that are ready to be swept. The ready cells of an angleset are batched and passed to the device CBC sweep chunk kernel, which is launched asynchronously. The kernel is parallel over the batched ready cells and over all groups in the groupset. The CBC_AsynchronousCommunicator still receives upwind angular fluxes and sends downwind angular fluxes as soon as possible, which mirrors the host CBC sweep chunk kernel.
  • The CBCD_FLUDS class contains host and device buffers for boundary, local, and non-local angular flux data. Local cell angular flux data is kept on the device. CBCD_FLUDS provides methods for asynchronously copying incoming/outgoing boundary and non-local cell angular flux data between host and device.
  • The CBCD_FLUDSCommonData class provides functionality similar to that of the AAHD_FLUDSCommonData class: it encodes and decodes the locations that cell face nodes are read from and written to in the local, boundary, and non-local host/device buffers.
  • The device CBC sweep chunk kernel is templated on the number of cell spatial DOFs. The device kernel is structured similarly to the device AAH sweep chunk kernel.
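
The ASYNC_FIFO loop described above can be pictured with a host-only sketch. This is not the code in this PR: `AngleSetState`, `Batch`, and `SweepAll` are hypothetical names, the kernel launch is replaced by recording the batch, and completion is assumed to happen immediately instead of being polled on a stream.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// A batch of ready cells handed to one (simulated) kernel launch.
using Batch = std::vector<std::size_t>;

// Hypothetical stand-in for one angleset's sweep state (not an OpenSn type).
struct AngleSetState
{
  std::vector<std::vector<std::size_t>> downwind; // downwind[c]: cells fed by cell c
  std::vector<std::size_t> num_unmet;             // upwind dependencies still pending
  std::vector<bool> queued;                       // cell already placed in a batch
  std::size_t num_swept = 0;
};

// Visit the anglesets round-robin until every cell is swept or nothing can
// progress. Returns the batches in launch order as (angleset, cells) pairs.
std::vector<std::pair<std::size_t, Batch>> SweepAll(std::vector<AngleSetState>& sets)
{
  std::vector<std::pair<std::size_t, Batch>> launches;
  bool work_left = true;
  while (work_left)
  {
    work_left = false;
    bool progress = false;
    for (std::size_t a = 0; a < sets.size(); ++a)
    {
      AngleSetState& s = sets[a];
      if (s.num_swept == s.num_unmet.size())
        continue; // this angleset is finished
      work_left = true;

      // Gather every cell whose upwind dependencies are all satisfied.
      Batch ready;
      for (std::size_t c = 0; c < s.num_unmet.size(); ++c)
      {
        if (!s.queued[c] && s.num_unmet[c] == 0)
        {
          ready.push_back(c);
          s.queued[c] = true;
        }
      }
      if (ready.empty())
        continue;
      progress = true;

      // Stand-in for the asynchronous kernel launch on this angleset's stream.
      launches.emplace_back(a, ready);

      // Simulated completion: release the downwind cells of the batch.
      for (std::size_t c : ready)
        for (std::size_t d : s.downwind[c])
          --s.num_unmet[d];
      s.num_swept += ready.size();
    }
    if (work_left && !progress)
      break; // nothing became ready in a full pass
  }
  return launches;
}
```

For a four-cell diamond (cell 0 feeds cells 1 and 2, which both feed cell 3), one angleset is swept in three batches: {0}, then {1, 2}, then {3}. With a second angleset sweeping in the opposite direction, the batches of the two anglesets interleave.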

The device CBC sweep chunk kernel passes all of the GPU regression tests except transport_3d_4_cycles_1_gpu.py, because the CBC sweep algorithm does not handle cyclic dependencies.
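
As a general illustration of why a cell-by-cell sweep driven by dependency counts stalls on a cycle, here is a minimal sketch. It is not the CBC implementation; `CountSweepableCells` is a hypothetical helper that assumes a cell becomes ready only once all of its upwind neighbours have been swept.

```cpp
#include <cstddef>
#include <queue>
#include <vector>

// Count how many cells can be swept when a cell is ready only after all of its
// upwind neighbours are done. downwind[c] lists the cells that cell c feeds.
std::size_t CountSweepableCells(const std::vector<std::vector<std::size_t>>& downwind)
{
  const std::size_t n = downwind.size();

  // Number of upwind dependencies per cell.
  std::vector<std::size_t> num_unmet(n, 0);
  for (const auto& targets : downwind)
    for (std::size_t d : targets)
      ++num_unmet[d];

  // Cells with no upwind dependencies are ready immediately.
  std::queue<std::size_t> ready;
  for (std::size_t c = 0; c < n; ++c)
    if (num_unmet[c] == 0)
      ready.push(c);

  std::size_t swept = 0;
  while (!ready.empty())
  {
    const std::size_t c = ready.front();
    ready.pop();
    ++swept;
    for (std::size_t d : downwind[c])
      if (--num_unmet[d] == 0)
        ready.push(d);
  }

  // If swept < n, the remaining cells wait on each other (a cycle).
  return swept;
}
```

With the chain 0 → 1 → 2, all three cells are swept. If cell 2 also feeds cell 1, cells 1 and 2 each wait on the other, so only cell 0 is swept and the ready queue runs empty.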

@eappen-nelluvelil
Contributor Author

@wdarylhawkins @quocdang1998 Requesting review for this PR.

Collaborator

@andrsd andrsd left a comment

First rough pass through the PR.

@eappen-nelluvelil eappen-nelluvelil force-pushed the cbc-gpu-sweep-chunk-2 branch 7 times, most recently from f153101 to f186816 Compare December 18, 2025 23:18
@eappen-nelluvelil eappen-nelluvelil force-pushed the cbc-gpu-sweep-chunk-2 branch 6 times, most recently from 664d8c2 to 6593aa0 Compare January 15, 2026 19:19
@eappen-nelluvelil
Contributor Author

@quocdang1998 I made the following changes:

  • The AAHD_NodeIndex and CBCD_NodeIndex classes inherit from the DeviceNodeIndex class in device_structs.h.
  • The AAHD_FLUDSPointerSet and CBCD_FLUDSPointerSet classes inherit from Device_FLUDSPointerSet in device_structs.h.
  • The GPU solver kernels are templated on the DeviceNodeIndex and Device_FLUDSPointerSet classes; CBC-specific functionality is now contained in the cbc_gpu_kernel namespace.
  • The CBCD_FLUDS class methods are simplified, and the incoming/outgoing boundary and non-local node lookup maps are now built in CBCD_FLUDSCommonData.
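
To illustrate the idea of templating shared code on a node index type and a pointer set type, here is a minimal host-only sketch. Every name and the bit layout are assumptions made for illustration: `NodeIndexBase`, `ExampleNodeIndex`, `ExamplePointerSet`, and `ReadNode` are hypothetical and are not the classes in device_structs.h.

```cpp
#include <cstdint>

// Hypothetical buffer kinds a face node can live in.
enum BufferKind : std::uint64_t
{
  kLocal = 0,
  kBoundary = 1,
  kNonLocal = 2
};

// Hypothetical packed index: top 2 bits hold the buffer kind, the rest an offset.
struct NodeIndexBase
{
  static constexpr std::uint64_t kKindShift = 62;
  static constexpr std::uint64_t kOffsetMask = (std::uint64_t{1} << kKindShift) - 1;

  std::uint64_t bits = 0;

  std::uint64_t Kind() const { return bits >> kKindShift; }
  std::uint64_t Offset() const { return bits & kOffsetMask; }
};

// A sweep-specific index type reuses the shared decode logic by inheriting it.
struct ExampleNodeIndex : NodeIndexBase
{
  static ExampleNodeIndex Make(BufferKind kind, std::uint64_t offset)
  {
    ExampleNodeIndex idx;
    idx.bits = (static_cast<std::uint64_t>(kind) << kKindShift) | (offset & kOffsetMask);
    return idx;
  }
};

// Hypothetical set of buffer pointers a kernel would receive.
struct ExamplePointerSet
{
  const double* local;
  const double* boundary;
  const double* non_local;
};

// Shared code templated on both types: decode the index, read the right buffer.
template <class NodeIndexT, class PointerSetT>
double ReadNode(const PointerSetT& ptrs, NodeIndexT idx)
{
  switch (idx.Kind())
  {
    case kBoundary:
      return ptrs.boundary[idx.Offset()];
    case kNonLocal:
      return ptrs.non_local[idx.Offset()];
    default:
      return ptrs.local[idx.Offset()];
  }
}
```

Because `ReadNode` only relies on `Kind()` and `Offset()`, any index type that exposes them can be passed without duplicating the read logic.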

The device CBC sweep chunk kernel passes all GPU regression tests (confirmed by setting sweep_type="CBC" in these tests).

The code is still failing one clang-tidy check:

/opt/local/opensn/clang/21.1.0/dependencies/include/mpicpp-lite/impl/Request.h:37:58: error: Assigned value is uninitialized [clang-analyzer-core.uninitialized.Assign,-warnings-as-errors]
   37 | inline Request::Request(const MPI_Request & r) : request(r) {}

Using // NOLINT doesn't suppress this clang-tidy error.

Requesting re-review.

@eappen-nelluvelil eappen-nelluvelil force-pushed the cbc-gpu-sweep-chunk-2 branch 9 times, most recently from 1503c32 to 1f48e1b Compare January 29, 2026 04:28
@eappen-nelluvelil
Contributor Author

eappen-nelluvelil commented Jan 29, 2026

@andrsd I haven't been able to resolve this clang-tidy error that pops up in cbc_async_comm.cc:

/opt/local/opensn/clang/21.1.0/dependencies/include/mpicpp-lite/impl/Request.h:37:58: error: Assigned value is uninitialized [clang-analyzer-core.uninitialized.Assign,-warnings-as-errors]
   37 | inline Request::Request(const MPI_Request & r) : request(r) {}
      |                                                          ^
/home/dhawkins/actions-runner/_work/opensn/opensn/modules/linear_boltzmann_solvers/discrete_ordinates_problem/sweep/communicators/cbc_async_comm.cc:38:7: note: Assuming the condition is false
   38 |   if (not outgoing_message_queue_.empty())

The only change I made to the CBC_AsynchronousCommunicator class is fixing the incorrect capitalization of the "S" in its name (CBC_ASynchronousCommunicator to CBC_AsynchronousCommunicator).

Do you have any ideas as to why clang-tidy is throwing this error?

Major changes:
- Add `CBCD_FLUDSCommonData`, `CBCD_FLUDS` classes to index into and
  store local, boundary, and non-local angular flux buffers
- Add `CBCD_AngleSet`, `CBCD_SweepChunk` classes to asynchronously sweep over
  ready cells via caribou streams
- Add `ASYNC_FIFO` scheduling algorithm to asynchronously sweep over all
  anglesets in a groupset's angle aggregation