accelerator framework/cuda: still not entirely fixed #11354

Closed
@hppritcha

I checked out the head of the v5.0.x branch with high expectations that it would work well on our NVIDIA + HPE SS11 (aka libfabric) system. But alas, if my application doesn't use CUDA, yet is linked against an ompi v5.0.x that has all the recent accelerator/cuda changes in place and is configured for CUDA support, things don't work right.

Hello, world, I am 1 of 2, (Open MPI v5.0.0rc9, package: Open MPI hpp@ch-fe1 Distribution, ident: 5.0.0rc9, repo rev: v5.0.0rc9-287-g5d87f3e6, Unreleased developer copy, 141)
Hello, world, I am 0 of 2, (Open MPI v5.0.0rc9, package: Open MPI hpp@ch-fe1 Distribution, ident: 5.0.0rc9, repo rev: v5.0.0rc9-287-g5d87f3e6, Unreleased developer copy, 141)
--------------------------------------------------------------------------
The call to cuEventDestory failed. This is a unrecoverable error and will
cause the program to abort.
  cuEventDestory return value:   709
Check the cuda.h file for what the return value means.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
The call to cuEventDestory failed. This is a unrecoverable error and will
cause the program to abort.
  cuEventDestory return value:   709
Check the cuda.h file for what the return value means.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
The call to cuEventDestory failed. This is a unrecoverable error and will
cause the program to abort.
  cuEventDestory return value:   709
Check the cuda.h file for what the return value means.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
The call to cuEventDestory failed. This is a unrecoverable error and will
cause the program to abort.
  cuEventDestory return value:   709
Check the cuda.h file for what the return value means.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
The call to cuEventDestory failed. This is a unrecoverable error and will
cause the program to abort.
  cuEventDestory return value:   709
Check the cuda.h file for what the return value means.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
The call to cuEventDestory failed. This is a unrecoverable error and will
cause the program to abort.
  cuEventDestory return value:   709
Check the cuda.h file for what the return value means.

It looks like the holes may have been plugged for OB1 (if I set the PML to ob1 I don't see these messages), but apparently that is not the case when using other PMLs.
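
For reference, the return value 709 in the output above corresponds to `CUDA_ERROR_CONTEXT_IS_DESTROYED` in cuda.h, which would suggest the events are being destroyed after the CUDA context is already gone. Below is a minimal sketch for decoding such values on a machine without the CUDA headers. The helper name `cu_result_name` is hypothetical, and the table is a small hand-copied subset of the `CUresult` enum, not the full list. Where the CUDA driver API is available, `cuGetErrorName` is the proper way to do this.

```c
#include <stddef.h>

/* Hypothetical helper, not part of Open MPI or CUDA.
 * The table is a hand-copied subset of CUresult values from cuda.h. */
struct cu_code {
    int value;
    const char *name;
};

static const struct cu_code cu_codes[] = {
    {   0, "CUDA_SUCCESS" },
    {   3, "CUDA_ERROR_NOT_INITIALIZED" },
    {   4, "CUDA_ERROR_DEINITIALIZED" },
    { 201, "CUDA_ERROR_INVALID_CONTEXT" },
    { 400, "CUDA_ERROR_INVALID_HANDLE" },
    { 709, "CUDA_ERROR_CONTEXT_IS_DESTROYED" },
};

/* Return the symbolic name for a driver API return value,
 * or a fallback string if the value is not in the subset above. */
const char *cu_result_name(int value)
{
    size_t n = sizeof(cu_codes) / sizeof(cu_codes[0]);
    for (size_t i = 0; i < n; i++) {
        if (cu_codes[i].value == value) {
            return cu_codes[i].name;
        }
    }
    return "unknown (not in this subset)";
}
```

Calling `cu_result_name(709)` returns the string `CUDA_ERROR_CONTEXT_IS_DESTROYED`.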
