Conversation

@AboorvaDevarajan
Member

Use the pre-populated convertor flag `CONVERTOR_CUDA` to check the type of buffer
instead of calling `cuPointerGetAttribute` again, which incurs additional overhead
for CPU buffers.

Signed-off-by: Aboorva Devarajan <abodevar@in.ibm.com>
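
As a rough sketch of the idea in C (hypothetical helper names, not the actual OMPI diff): the convertor's flags are populated when the convertor is prepared, so the per-buffer driver query can be replaced by a cheap flag test.

```c
/* Hypothetical sketch of the optimization -- not the actual Open MPI code.
 * Assumes <cuda.h>, <stdint.h>, and opal/datatype/opal_convertor.h are available. */

/* Before: query the CUDA driver for every buffer, including host buffers. */
static int buffer_is_gpu_query(const void *buf)
{
    CUmemorytype mem_type = 0;
    CUresult rc = cuPointerGetAttribute(&mem_type,
                                        CU_POINTER_ATTRIBUTE_MEMORY_TYPE,
                                        (CUdeviceptr)(uintptr_t)buf);
    /* Plain host memory typically fails the query or reports a host memory type. */
    return (CUDA_SUCCESS == rc) && (CU_MEMORYTYPE_DEVICE == mem_type);
}

/* After: reuse the flag cached on the convertor at prepare time. */
static inline int buffer_is_gpu_cached(const opal_convertor_t *convertor)
{
    return (convertor->flags & CONVERTOR_CUDA) != 0;
}
```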
@AboorvaDevarajan
Member Author

There is one semantic difference introduced in this patch: when the `CONVERTOR_CUDA` flag is used, only the source buffer is checked for pack and only the receive buffer for unpack, whereas `opal_cuda_check_bufs` checks both the source and destination buffers for pack and unpack. As far as I have verified, this should not break anything.

But is there a specific reason why `opal_cuda_check_bufs` is used instead of `CONVERTOR_CUDA` in this case? Thanks.
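
To make the difference concrete, the pack path could be sketched like this (hypothetical wrapper names; the unpack path is symmetric with the receive buffer):

```c
/* Hypothetical illustration of the semantic difference -- not code from this PR. */

/* Old check on the pack path: both pointers are inspected, so a GPU
 * destination buffer would also trigger the CUDA path. */
int pack_uses_cuda_old(char *dest, char *src)
{
    return opal_cuda_check_bufs(dest, src);
}

/* New check on the pack path: only the buffer the convertor was prepared
 * with (the source/send buffer) is reflected in CONVERTOR_CUDA. */
int pack_uses_cuda_new(const opal_convertor_t *send_convertor)
{
    return (send_convertor->flags & CONVERTOR_CUDA) != 0;
}
```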

Contributor

@awlauria awlauria left a comment

Looks good to me.

I'd wait for @markalle's thoughts before merging

@markalle
Contributor

Thanks, this is a good improvement. I like the idea of having that data cached, and this seems to be how OMPI is trying to cache the data.

After looking at how OMPI sets the `CONVERTOR_CUDA` flag, I don't actually think OMPI's CUDA check is legitimate, so I agree with making this change. Then maybe separately we should argue about the correctness of the `CONVERTOR_CUDA` flag in general throughout OMPI.

@awlauria awlauria merged commit 5585b03 into open-mpi:master Mar 11, 2021
@awlauria
Contributor

@AboorvaDevarajan can you cherry-pick over to v5.0.x?
