Segmentation fault in horovod.allreduce() and horovod.broadcast_parameters() functions #7

Open
@TonyTangYu

Description

I installed MXNet and Horovod from source, from here. When I run a simple program to test the Horovod environment, I get a "segmentation fault" error.

Environment info (Required)

----------Python Info----------
('Version      :', '2.7.13')
('Compiler     :', 'GCC 4.4.7 20120313 (Red Hat 4.4.7-1)')
('Build        :', ('default', 'Dec 20 2016 23:09:15'))
('Arch         :', ('64bit', 'ELF'))
------------Pip Info-----------
('Version      :', '9.0.1')
('Directory    :', '/home/anaconda2/lib/python2.7/site-packages/pip')
----------MXNet Info-----------
('Version      :', '1.5.0')
('Directory    :', '/home/horovod/mxnet/python/mxnet')
Hashtag not found. Not installed from pre-built package.
----------System Info----------
('Platform     :', 'Linux-3.10.0-327.el7.x86_64-x86_64-with-redhat-7.2-Maipo')
('system       :', 'Linux')
('release      :', '3.10.0-327.el7.x86_64')
('version      :', '#1 SMP Thu Oct 29 17:29:29 EDT 2015')
----------Hardware Info----------
('machine      :', 'x86_64')
('processor    :', 'x86_64')
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                40
On-line CPU(s) list:   0-39
Thread(s) per core:    2
Core(s) per socket:    10
Socket(s):             2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6

Package used (Python/R/Scala/Julia):
I'm using Python.

Build info (Required if built from source)

Compiler (gcc/clang/mingw/visual studio):gcc

MXNet commit hash:
No

Error Message:

(Paste the complete error message, including stack trace.)
No error output appeared other than "Segmentation fault".
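For reference, one way to get more than a bare "Segmentation fault" is Python's faulthandler module, which dumps the Python-level traceback when the interpreter receives SIGSEGV. The following is a minimal sketch, not something taken from the Horovod or MXNet docs. It assumes Python 3.3+, where faulthandler is part of the standard library (on Python 2.7, as used here, it would have to be installed separately). The helper name `enable_crash_traceback` and the log path are hypothetical.

```python
import faulthandler


def enable_crash_traceback(log_path):
    """Write a Python traceback to log_path if the process receives a fatal signal.

    Hypothetical helper for illustration. The returned file object must stay
    open for as long as the handler is enabled.
    """
    log_file = open(log_path, "w")
    # all_threads=True dumps the stack of every Python thread, not just the
    # one that received the signal.
    faulthandler.enable(file=log_file, all_threads=True)
    return log_file
```

Calling `enable_crash_traceback("crash.log")` at the top of test.py, before `hvd.init()`, should leave the Python stack in crash.log if the process segfaults. It only shows Python frames, so a native debugger would still be needed to see where inside the compiled library the crash happens.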

Minimum reproducible example

(If you are using your own code, please provide a short script that reproduces the error. Otherwise, please provide link to the existing example.)
I wrote a simple Horovod test program named 'test.py' and it is shown below.

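import numpy as np
import mxnet as mx
import horovod.mxnet as hvd

# Initialize Horovod and report this process's rank.
hvd.init()
r = int(hvd.rank())
print("r:", r)

# Build a float16 tensor of ones and all-reduce it across workers.
x = mx.nd.ones((2, 3, 4), dtype=np.float16)
print("x:", x)
y = hvd.allreduce(x)  # the segmentation fault occurs here
print("y", y)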

Steps to reproduce

(Paste the commands you ran that produced the error.)
Just one command in the terminal: python test.py

What have you tried to solve it?

I tracked down where the segmentation fault comes from: it is triggered by the allreduce function. The 'broadcast_parameters' function also causes a segmentation fault.
Could someone help me fix it? Thanks in advance!
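For anyone trying to reproduce this, a small wrapper can confirm that the script really dies from SIGSEGV rather than exiting with an ordinary error code. This is only a sketch using the Python standard library on a POSIX system. The function names `describe_exit` and `run_and_report` and the file name `check_crash.py` are hypothetical.

```python
import signal
import subprocess
import sys


def describe_exit(returncode):
    """Turn a subprocess return code into a readable description."""
    # On POSIX, a negative return code -N means the child was killed by signal N.
    if returncode == -signal.SIGSEGV:
        return "segmentation fault (SIGSEGV)"
    if returncode < 0:
        return "killed by signal %d" % -returncode
    return "exited with status %d" % returncode


def run_and_report(argv):
    """Run a command and describe how it terminated."""
    return describe_exit(subprocess.call(argv))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python check_crash.py python test.py
    print(run_and_report(sys.argv[1:]))
```

Running it once with the allreduce line in place and once with that line commented out would show whether the crash is tied to that call, as described above.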
