
Pitched arrays #1208

Open

Description

The CUDA driver and runtime APIs provide cuMemAllocPitch and cudaMallocPitch, which return a pitch that can be larger than the size of a row in order to include padding. See https://stackoverflow.com/a/16119944/587034; the padding can improve performance by keeping every row aligned for memory loads. Although https://forums.developer.nvidia.com/t/what-is-the-stream-ordered-equivalent-of-cudamallocpitch/189574 suggests this may be less relevant on today's hardware, it would be an interesting experiment.
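For reference, a minimal host-side sketch of the pitch arithmetic, in plain Python with no GPU involved. The 512-byte `alignment` and the helper names `round_up`, `pitched_layout` and `element_offset` are assumptions for illustration; the real pitch is chosen by the driver and has to be read back from the allocation call.

```python
def round_up(value: int, multiple: int) -> int:
    """Round value up to the next multiple of `multiple`."""
    return ((value + multiple - 1) // multiple) * multiple


def pitched_layout(width_elems: int, height: int, elem_size: int, alignment: int = 512):
    """Return (pitch_bytes, total_bytes) for a padded 2D allocation.

    `alignment` is an assumed value; the actual pitch is decided by the
    driver and must not be computed by the caller in real code.
    """
    row_bytes = width_elems * elem_size
    pitch = round_up(row_bytes, alignment)
    return pitch, pitch * height


def element_offset(row: int, col: int, pitch: int, elem_size: int) -> int:
    """Byte offset of element (row, col): skip `row` padded rows, then `col` elements."""
    return row * pitch + col * elem_size


# 100 Float32 values per row occupy 400 bytes, padded here to 512.
pitch, total = pitched_layout(width_elems=100, height=4, elem_size=4)
print(pitch, total)                    # 512 2048
print(element_offset(2, 3, pitch, 4))  # 1036
```

The sketch uses the row-major convention of the CUDA documentation; with Julia's column-major layout the padding would presumably apply to the leading dimension instead.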

I'm not sure whether we should add a separate CuPitchedArray type, or whether we can generalize CuArray without penalizing every array access (can CuDeviceArray just contain the per-dimension strides instead of sizes + strides?). Either way, we should probably have a way to dispatch to the 2D/3D memcpy functions, if possible.
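To make the two ideas concrete, here is a rough sketch in plain Python, not CUDA.jl code. `linear_offset` and `copy_2d` are hypothetical helpers: the first indexes purely from per-dimension strides, the second emulates on the host what a pitched 2D copy does conceptually, row by row.

```python
def linear_offset(index, strides):
    """Offset of a multi-dimensional index given only per-dimension strides."""
    return sum(i * s for i, s in zip(index, strides))


def copy_2d(dst, dst_pitch, src, src_pitch, width_bytes, height):
    """Copy `height` rows of `width_bytes` each between buffers with different pitches."""
    for row in range(height):
        s = row * src_pitch
        d = row * dst_pitch
        dst[d:d + width_bytes] = src[s:s + width_bytes]


# Dense source: 3 rows of 4 bytes. Destination: same rows padded to a pitch of 8.
src = bytes(range(12))
dst = bytearray(3 * 8)
copy_2d(dst, 8, src, 4, 4, 3)

# Element (row=1, col=2) sits at a different offset in each layout,
# but the same stride-based formula finds it in both.
dense_strides = (4, 1)
pitched_strides = (8, 1)
print(src[linear_offset((1, 2), dense_strides)])    # 6
print(dst[linear_offset((1, 2), pitched_strides)])  # 6
```

One thing the sketch shows: strides alone are enough to compute offsets, but the row width (4) cannot be recovered from the padded stride (8), so sizes would still be needed wherever bounds checks or `size` queries matter.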


Metadata


    Labels

    cuda array (Stuff about CuArray.), enhancement (New feature or request), performance (How fast can we go?), speculative (Not sure about this one yet.)
