NCCL backend from UCC installed in NVHPCSDK #1249

@bellenlau

Description

Hello,

is the nccl backend of UCC available in the hpcx-mpi installation from nvhpcsdk?

The TL is available according to ucc_info; I load the libraries with

module load /leonardo/prod/opt/compilers/nvhpc/25.3/binary/modulefiles/nvhpc-hpcx-cuda12/25.3
source /leonardo/prod/opt/compilers/nvhpc/25.3/binary/Linux_x86_64/25.3/comm_libs/12.8/hpcx/hpcx-2.22.1/hpcx-init.sh
hpcx_load

This installation uses UCC/1.4.3. I checked the availability of the TL with

  • ucc_info -s
Loading /leonardo/prod/opt/compilers/nvhpc/25.3/binary/modulefiles/nvhpc-hpcx-cuda12/25.3
  Loading requirement: hpcx
Default CLs scores: basic=10 hier=50
Default TLs scores: cuda=40 mlx5=1 nccl=20 self=50 sharp=30 shm=100 ucp=10
  • ucc_info -b | grep "nccl"
#define UCC_CONFIGURE_FLAGS       "--with-ucx=/build-result/hpcx-v2.22.1-gcc-doca_ofed-redhat8-cuda12-x86_64/ucx --with-sharp=/build-result/hpcx-v2.22.1-gcc-doca_ofed-redhat8-cuda12-x86_64/sharp --with-rdmacm --with-tlcp=alltoall_block --with-cuda=/hpc/local/oss/cuda12.6.3/redhat8 --with-nccl --with-tls=cuda,nccl,self,sharp,shm,ucp,mlx5 --prefix=/build-result/hpcx-v2.22.1-gcc-doca_ofed-redhat8-cuda12-x86_64/ucc"

At runtime I set

export OMPI_MCA_coll_ucc_enable=1
export OMPI_MCA_coll_ucc_priority=100
export UCC_TL_NCCL_TUNE=allreduce:cuda:inf

However, the TL used for allreduce does not change. With --mca coll_ucc_verbose, UCP is always reported as the TL for the CUDA memory type:

[1766411628.231717] [lrdn1487:319887:0] ucc_coll_score_map.c:203  UCC  INFO  Allreduce:
[1766411628.231717] [lrdn1487:319887:0] ucc_coll_score_map.c:203  UCC  INFO     Host: {0..4095}:TL_SHM:10 {4K..8K}:TL_SHM:10 {8193..inf}:TL_UCP:10
[1766411628.231717] [lrdn1487:319887:0] ucc_coll_score_map.c:203  UCC  INFO     Cuda: {0..4095}:TL_UCP:10 {4K..inf}:TL_UCP:10
[1766411628.231717] [lrdn1487:319887:0] ucc_coll_score_map.c:203  UCC  INFO     CudaManaged: {0..4095}:TL_UCP:10 {4K..inf}:TL_UCP:10
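
To summarize score-map output like the lines above without reading every range by hand, a small filter can list which TLs appear for a given memory type. This is only a sketch: `ucc_tls_for_mem` is a hypothetical helper name, not part of UCC, and it assumes the log format shown above. It also does not distinguish between collectives, so it should be fed only the lines belonging to the collective of interest.

```shell
# Hypothetical helper (not part of UCC): read UCC score-map log lines on
# stdin and print the distinct TL names listed for one memory type.
# Assumes the "UCC  INFO     <MemType>: {range}:TL_NAME:score ..." layout
# shown in the log excerpt above.
ucc_tls_for_mem() {
    mem="$1"   # e.g. Host, Cuda, CudaManaged
    # Keep only lines for the requested memory type ("Cuda: " does not
    # match "CudaManaged: "), then pull out every TL_* token.
    grep -E "UCC +INFO +${mem}: " | grep -oE 'TL_[A-Z0-9_]+' | sort -u
}
```

For example, with the output saved to a file (here the assumed name `ucc.log`), `ucc_tls_for_mem Cuda < ucc.log` would print each TL that covers at least one message-size range for CUDA memory.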

I also see some failures during initialization, related to the CUDA TL:

[1766412746.460863] [lrdn0259:811666:0]         mc_cuda.c:78   cuda mc DEBUG cuCtxGetDevice() failed: invalid device context
...
[1766412746.461583] [lrdn0259:811667:0] tl_cuda_context.c:43   TL_CUDA DEBUG cannot create CUDA TL context without active CUDA context
[1766412746.461589] [lrdn0259:811667:0]     ucc_context.c:412  UCC  DEBUG failed to create tl context for cuda
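
To see at a glance which TL contexts could not be created across all ranks, the debug log can be filtered on the "failed to create tl context for" message shown above. This is a minimal sketch; `ucc_failed_tls` is a hypothetical helper name, and it assumes that message format stays as in the excerpt.

```shell
# Hypothetical helper (not part of UCC): read a UCC debug log on stdin and
# print the distinct TL names whose context creation was reported as failed.
ucc_failed_tls() {
    # Capture the word that follows the fixed message text, one per line.
    sed -n 's/.*failed to create tl context for \([A-Za-z0-9_]*\).*/\1/p' | sort -u
}
```

If `nccl` also shows up in its output, that would suggest the NCCL TL is dropped at context creation rather than at scoring time; the excerpt above only shows the message for `cuda`.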

Could you please tell me more about this error? Is it related to the NCCL TL being unavailable?

Thank you for your time,

Laura
