Pitched arrays

The CUDA driver and runtime APIs provide `cuMemAllocPitch` and `cudaMallocPitch`, which pad each row of a 2D allocation and return the resulting pitch: the row stride in bytes, which can be larger than the size of a row. As explained in https://stackoverflow.com/a/16119944/587034, the padding keeps every row aligned, which can improve performance due to better memory load behavior. Although https://forums.developer.nvidia.com/t/what-is-the-stream-ordered-equivalent-of-cudamallocpitch/189574 suggests this may be less relevant on today's hardware, it would still be an interesting experiment.
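To make the layout concrete, here is a minimal host-only sketch of the address arithmetic for a pitched allocation. The names `padded_pitch`, `pitched_element`, and `kAssumedAlignment` are hypothetical, and the 512-byte alignment is only an assumption for illustration: the real pitch is chosen by the driver and has to be taken from the allocation call.

```cpp
#include <cassert>
#include <cstddef>

// ASSUMPTION: illustrative alignment only. The actual pitch is decided by the
// driver and must be read back from cudaMallocPitch / cuMemAllocPitch.
constexpr std::size_t kAssumedAlignment = 512;

// Round a row width in bytes up to the next multiple of `alignment`.
constexpr std::size_t padded_pitch(std::size_t row_bytes,
                                   std::size_t alignment = kAssumedAlignment) {
    return (row_bytes + alignment - 1) / alignment * alignment;
}

// Address of element (row, col) in a pitched allocation: rows are `pitch`
// bytes apart, elements within a row are sizeof(T) bytes apart.
template <typename T>
T* pitched_element(void* base, std::size_t pitch,
                   std::size_t row, std::size_t col) {
    assert(base != nullptr);
    assert((col + 1) * sizeof(T) <= pitch);  // element must lie inside the row
    return reinterpret_cast<T*>(static_cast<char*>(base) + row * pitch) + col;
}
```

With these assumptions, a row of 100 `float`s (400 bytes) would get a 512-byte pitch, so 112 bytes per row are padding that an array type has to skip over when indexing.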

I'm not sure whether we should add a separate `CuPitchedArray` type, or whether we can generalize `CuArray` without penalizing every array access (could `CuDeviceArray` just contain the per-dimension strides instead of sizes + strides?). Either way, we should probably have a way to dispatch to the 2D/3D memcpy functions where possible.
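As a rough illustration of what dispatching to a 2D memcpy would mean, the sketch below emulates the row-by-row semantics of `cudaMemcpy2D` on the host. `copy_2d` and `is_dense` are hypothetical helpers, not part of any existing API, and the argument order merely mirrors `cudaMemcpy2D` minus the copy kind.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Host-side stand-in for cudaMemcpy2D semantics: copy `height` rows of
// `width_bytes` bytes each, where consecutive rows start `spitch` bytes
// apart in the source and `dpitch` bytes apart in the destination.
inline void copy_2d(void* dst, std::size_t dpitch,
                    const void* src, std::size_t spitch,
                    std::size_t width_bytes, std::size_t height) {
    assert(width_bytes <= dpitch && width_bytes <= spitch);
    auto* d = static_cast<char*>(dst);
    const auto* s = static_cast<const char*>(src);
    for (std::size_t row = 0; row < height; ++row) {
        // Only the payload of each row is copied; padding is left untouched.
        std::memcpy(d + row * dpitch, s + row * spitch, width_bytes);
    }
}

// HYPOTHETICAL dispatch check: a buffer whose pitch equals its row width has
// no padding, so a single flat memcpy would do; otherwise use the 2D path.
inline bool is_dense(std::size_t pitch, std::size_t width_bytes) {
    return pitch == width_bytes;
}
```

A dense host array copied into a pitched device array would then take the 2D path, with the source pitch equal to the row width and the destination pitch coming from the allocation.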

Pitched arrays #1208

maleadt opened on Oct 19, 2021
