Building version v5.0.6 with CUDA fails #12924
SC24 travel day, can't do more than this:

```diff
diff --git a/ompi/mca/coll/cuda/coll_cuda.h b/ompi/mca/coll/cuda/coll_cuda.h
index afedc632ee..4b3ecc647e 100644
--- a/ompi/mca/coll/cuda/coll_cuda.h
+++ b/ompi/mca/coll/cuda/coll_cuda.h
@@ -54,7 +54,7 @@ int mca_coll_cuda_reduce(const void *sbuf, void *rbuf, int count,
struct ompi_communicator_t *comm,
mca_coll_base_module_t *module);
-int mca_coll_cuda_reduce_local(const void *sbuf, void *rbuf, size_t count,
+int mca_coll_cuda_reduce_local(const void *sbuf, void *rbuf, int count,
struct ompi_datatype_t *dtype,
struct ompi_op_t *op,
mca_coll_base_module_t *module);
diff --git a/ompi/mca/coll/cuda/coll_cuda_reduce.c b/ompi/mca/coll/cuda/coll_cuda_reduce.c
index 7743a07874..e165a1d9bb 100644
--- a/ompi/mca/coll/cuda/coll_cuda_reduce.c
+++ b/ompi/mca/coll/cuda/coll_cuda_reduce.c
@@ -85,7 +85,7 @@ mca_coll_cuda_reduce(const void *sbuf, void *rbuf, int count,
}
int
-mca_coll_cuda_reduce_local(const void *sbuf, void *rbuf, size_t count,
+mca_coll_cuda_reduce_local(const void *sbuf, void *rbuf, int count,
struct ompi_datatype_t *dtype,
struct ompi_op_t *op,
mca_coll_base_module_t *module)
```
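A minimal sketch of why this one-word change matters, assuming (as the patch implies) that the coll framework's reduce_local callback type on the 5.0.x branch takes an `int count`. A component's function is stored in a function pointer of that type, so its signature must match exactly. All names below (`reduce_local_fn_t`, `demo_module`, `demo_reduce_local`, `demo_module_init`) are hypothetical stand-ins, not Open MPI's actual API.

```c
#include <assert.h>

/* Hypothetical stand-in for the framework's callback type. The framework
 * declares the signature; every component must match it exactly. */
typedef int (*reduce_local_fn_t)(const void *sbuf, void *rbuf, int count);

/* Hypothetical module table holding the component's entry point. */
struct demo_module {
    reduce_local_fn_t reduce_local;
};

/* Component implementation. Declaring `count` as size_t here would make the
 * assignment in demo_module_init() an incompatible-pointer-types diagnostic:
 * only a warning on many compilers, but an error with -Werror (and, for C,
 * by default since GCC 14). */
static int demo_reduce_local(const void *sbuf, void *rbuf, int count)
{
    const int *in = sbuf;
    int *inout = rbuf;
    for (int i = 0; i < count; ++i) {
        inout[i] += in[i]; /* element-wise, MPI_SUM-style reduction */
    }
    return 0;
}

static void demo_module_init(struct demo_module *m)
{
    m->reduce_local = demo_reduce_local; /* types must match exactly */
}
```

Whether a `size_t`/`int` mismatch here stops the build therefore depends on the compiler version and warning flags, not on the code alone.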
Did we not backport the CUDA compilation check to the v5.0.x branch? That would have caught the issue.
@lahwaacz Back from SC. Does the above patch work for you? For now we have to consider v5.0.6 broken for CUDA builds; I'll push an rc1 ASAP, with a release a week after, hopefully.
@janjust The patch makes the build pass, but I have no idea whether the patched release actually works; obviously nobody has tested it with CUDA yet...
So that's the thing: I was about to ask on Slack/general, but for some odd reason v5.0.6 builds coll/cuda just fine for me, and this is using the source tarball.
@lahwaacz What compiler are you using? Just curious.
Never mind, I think I've got it: @bosilca pointed out that it's due to the missing -Werror flag.
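Relying on -Werror means a signature mismatch only stops builds that happen to enable it. One hedged alternative, not something this thread says Open MPI uses, is a compile-time assertion that fails regardless of warning flags. The sketch below combines C11 `_Static_assert` with the GCC/Clang builtin `__builtin_types_compatible_p` (a compiler extension, not standard C); `reduce_local_cb_t` and `guarded_reduce_local` are hypothetical names.

```c
#include <assert.h>

/* Hypothetical framework callback type (int count, as the patch implies). */
typedef int (*reduce_local_cb_t)(const void *sbuf, void *rbuf, int count);

/* Hypothetical component function whose signature we want to guard. */
static int guarded_reduce_local(const void *sbuf, void *rbuf, int count)
{
    const int *in = sbuf;
    int *inout = rbuf;
    for (int i = 0; i < count; ++i) {
        if (in[i] > inout[i]) {
            inout[i] = in[i]; /* element-wise, MPI_MAX-style reduction */
        }
    }
    return 0;
}

/* Compilation fails, with or without -Werror, if the component's signature
 * drifts from the callback type (e.g. `size_t count`).
 * GCC/Clang only: __builtin_types_compatible_p is not standard C. */
_Static_assert(__builtin_types_compatible_p(__typeof__(&guarded_reduce_local),
                                            reduce_local_cb_t),
               "guarded_reduce_local does not match reduce_local_cb_t");
```

The trade-off is portability: the guard depends on a compiler extension, so a real build system would need to skip it on compilers that lack the builtin.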
Fixed with #12934.
@janjust Thanks! Will you make a new release with the fix?
I'm hoping to cut an RC today or tomorrow; however, we're also investigating an osc issue that is currently blocking the release. I'm afraid I don't have an ETA.
Building Open MPI v5.0.6 with CUDA fails with the following error: