horovod installation fails in docker build CI #12209

@akihironitta

🐛 Bug

The following jobs are failing on master because horovod fails to install during the Docker image build:

build-CUDA (3.7, 1.8)
build-Conda (3.8, 1.7)
build-Conda (3.8, 1.8)
build-Conda (3.8, 1.9)
build-Conda (3.8, 1.10)

Error from the build log:

#8 331.1 × Encountered error while trying to install package.
#8 331.1 ╰─> horovod

Full log here: https://gist.github.com/akihironitta/ee8f5895d444e918ce825562e1e00402
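
When triaging several failing jobs, it can help to confirm that each log fails on the same package. The following is a minimal sketch, assuming the log contains pip's two-line failure marker exactly as in the excerpt above; the helper name `failed_packages` is hypothetical and not part of any project tooling.

```python
import re
from typing import List

# Matches pip's two-line failure marker as it appears in the excerpt:
#   "... × Encountered error while trying to install package."
#   "... ╰─> <package>"
# Any per-line prefix (such as the "#8 331.1" seen in the build output)
# is skipped rather than parsed.
FAILURE_RE = re.compile(
    r"Encountered error while trying to install package\.\s*\n"
    r"[^\n]*?╰─>\s*(?P<package>\S+)"
)


def failed_packages(log_text: str) -> List[str]:
    """Return the package names that pip reported as failing to install."""
    return [m.group("package") for m in FAILURE_RE.finditer(log_text)]


# Sample input copied from the excerpt in this issue.
sample = (
    "#8 331.1 × Encountered error while trying to install package.\n"
    "#8 331.1 ╰─> horovod\n"
)
print(failed_packages(sample))
```

This only identifies which package failed; the underlying compiler or dependency error still has to be read from the full log.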

To Reproduce

For example, build the base CUDA image locally:

docker buildx build --build-arg PYTHON_VERSION=3.7 --build-arg PYTORCH_VERSION=1.8 --file dockers/base-cuda/Dockerfile .
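
To check the other failing jobs locally, the same command can be generated for each Python/PyTorch pair listed above. This is a sketch only: the CUDA Dockerfile path comes from the command in this issue, while `dockers/base-conda/Dockerfile` is an assumed path for the build-Conda jobs, and the Conda image may expect different build arguments. The script prints the commands without running them.

```python
import shlex
from typing import List

# (dockerfile, python_version, pytorch_version) for each failing job.
# The conda Dockerfile path is an ASSUMPTION; only the CUDA path is
# given in the issue.
FAILING_JOBS = [
    ("dockers/base-cuda/Dockerfile", "3.7", "1.8"),
    ("dockers/base-conda/Dockerfile", "3.8", "1.7"),
    ("dockers/base-conda/Dockerfile", "3.8", "1.8"),
    ("dockers/base-conda/Dockerfile", "3.8", "1.9"),
    ("dockers/base-conda/Dockerfile", "3.8", "1.10"),
]


def build_command(dockerfile: str, python_version: str, pytorch_version: str) -> List[str]:
    """Build the argument list for one `docker buildx build` invocation."""
    return [
        "docker", "buildx", "build",
        "--build-arg", f"PYTHON_VERSION={python_version}",
        "--build-arg", f"PYTORCH_VERSION={pytorch_version}",
        "--file", dockerfile,
        ".",
    ]


# Print one shell-ready command per failing job.
for job in FAILING_JOBS:
    print(shlex.join(build_command(*job)))
```

Each printed line can be pasted into a shell at the repository root, or passed as a list to `subprocess.run` to execute it directly.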

Expected behavior

horovod is installed successfully.

Additional context

cc @tchaton @rohitgr7 @akihironitta @carmocca @Borda


Labels: bug, ci, priority: 0
