Add MCA parameters to define the size of memcpy chunks. #6426
The IBM CI (GNU Compiler) build failed! Please review the log, linked below. Gist: https://gist.github.com/5727f2c905cac5b1988a2e9002bbb689
The IBM CI (XL Compiler) build failed! Please review the log, linked below. Gist: https://gist.github.com/193695e49cab6e5958b7faed31df0ad0
4e53104 to 934ba12
@bosilca Are these extra merge commits a mistake?
Add support for vector copy, allowing the upper level to define specialized/optimized vector copy functions. Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
1811ead to ff3b07f
The IBM CI (GNU/Scale) build failed! Please review the log, linked below. Gist: https://gist.github.com/fa5ed43a7024677b81f239e56865960a
@bosilca This seems like an important improvement. Do you think this will make v5.0?
Adding this to 5.0 milestone for tracking. @bosilca if you want this for 5.0, please target getting it in by end of April. Thanks.
Ping - @bosilca to make 5.0 can you have this merged by April 30th?
@bosilca Is this still desirable before v5.0 branches?
Can one of the admins verify this patch?
@bosilca This PR now has conflicts. If this PR is still desired, can you fix the conflicts? Thanks!
@bosilca Is this still desired for v5.0? If so, can you please rebase and retest?
No time in the near future to complete this work, plus the use case we had disappeared when we delegated all our communication support to UCX. I'll keep this PR around until I find some time to revisit the topic.
Delayed to v6.0 due to resources.
@Akshay-Venkatesh Can you please take a look at this? (From @jladd-mlnx)
@bosilca I removed the critical label, based on your comment
CUDA memcpys are divided into many very small chunks (a fixed size of 128k), leading to terrible performance. To improve performance we need to increase this limit (maybe even remove it), and add support for vector CUDA memcpy.
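To illustrate the chunking described above, here is a minimal host-side sketch, not the code from this PR. The name `chunked_copy` and its `chunk_size` argument are hypothetical, and plain `memcpy` stands in for the CUDA copy call. It only shows how the number of copy calls scales with the chunk limit, which is the quantity a tunable parameter would control.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not Open MPI code): copy `len` bytes in pieces of at
 * most `chunk_size` bytes. A chunk_size of 0 means "no limit", i.e. a single
 * copy. memcpy is a stand-in for the device copy. Returns the number of copy
 * calls issued. */
static size_t chunked_copy(void *dst, const void *src, size_t len,
                           size_t chunk_size)
{
    assert(len == 0 || (dst != NULL && src != NULL));

    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t calls = 0;

    if (chunk_size == 0) {
        chunk_size = len;   /* limit removed: one copy covers everything */
    }

    while (len > 0) {
        size_t n = len < chunk_size ? len : chunk_size;
        memcpy(d, s, n);    /* one call per chunk */
        d += n;
        s += n;
        len -= n;
        calls++;
    }
    return calls;
}
```

With a 128 KiB limit, a 1 MiB buffer takes 8 separate copy calls in this sketch; with the limit removed it takes one. On a real device each call carries a fixed launch cost, which is presumably why small fixed chunks hurt.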
Add support for vector copy, allowing the upper level to define
specialized/optimized vector copy functions.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
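One way an upper level could supply a specialized vector copy is through a replaceable function pointer. The sketch below is an assumption about the general shape only. The names `copy_seg_t`, `vector_copy_fn_t`, `default_vector_copy`, and `set_vector_copy` are all hypothetical and do not come from this PR or from Open MPI.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical segment descriptor: one (dst, src, len) triple per piece. */
typedef struct {
    void       *dst;
    const void *src;
    size_t      len;
} copy_seg_t;

/* Hypothetical hook type: copies `count` segments, returns bytes copied. */
typedef size_t (*vector_copy_fn_t)(const copy_seg_t *segs, size_t count);

/* Fallback: one memcpy per segment. */
static size_t default_vector_copy(const copy_seg_t *segs, size_t count)
{
    size_t total = 0;
    for (size_t i = 0; i < count; i++) {
        memcpy(segs[i].dst, segs[i].src, segs[i].len);
        total += segs[i].len;
    }
    return total;
}

/* Currently installed implementation; starts as the fallback. */
static vector_copy_fn_t vector_copy = default_vector_copy;

/* An upper level installs its optimized version; NULL restores the fallback. */
static void set_vector_copy(vector_copy_fn_t fn)
{
    vector_copy = (fn != NULL) ? fn : default_vector_copy;
}
```

A GPU-aware layer could then register a function that batches all segments into a single device operation, while callers keep invoking `vector_copy(segs, count)` unchanged.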