Description
I installed MXNet and Horovod from source from here. When I run a simple program to test the Horovod environment, I get a "segmentation fault" error.
Environment info (Required)
----------Python Info----------
('Version :', '2.7.13')
('Compiler :', 'GCC 4.4.7 20120313 (Red Hat 4.4.7-1)')
('Build :', ('default', 'Dec 20 2016 23:09:15'))
('Arch :', ('64bit', 'ELF'))
------------Pip Info-----------
('Version :', '9.0.1')
('Directory :', '/home/anaconda2/lib/python2.7/site-packages/pip')
----------MXNet Info-----------
('Version :', '1.5.0')
('Directory :', '/home/horovod/mxnet/python/mxnet')
Hashtag not found. Not installed from pre-built package.
----------System Info----------
('Platform :', 'Linux-3.10.0-327.el7.x86_64-x86_64-with-redhat-7.2-Maipo')
('system :', 'Linux')
('release :', '3.10.0-327.el7.x86_64')
('version :', '#1 SMP Thu Oct 29 17:29:29 EDT 2015')
----------Hardware Info----------
('machine :', 'x86_64')
('processor :', 'x86_64')
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 40
On-line CPU(s) list: 0-39
Thread(s) per core: 2
Core(s) per socket: 10
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Package used (Python/R/Scala/Julia):
I'm using Python.
Build info (Required if built from source)
Compiler (gcc/clang/mingw/visual studio):gcc
MXNet commit hash:
No
Error Message:
(Paste the complete error message, including stack trace.)
No error message or stack trace is printed, only "segmentation fault".
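Since the crash leaves no Python traceback, one way to at least confirm which signal ends the process is to launch the script from a small wrapper and inspect the return code. This is a minimal sketch, not part of the original report: `run_and_report` and `CRASHING_CHILD` are hypothetical names, and the child here is a stand-in that sends SIGSEGV to itself rather than the real test.py. It relies on the POSIX behavior that `subprocess` reports a negative return code when the child is killed by a signal.

```python
import signal
import subprocess
import sys


def run_and_report(argv):
    """Run argv as a child process and describe how it exited (hypothetical helper)."""
    returncode = subprocess.call(argv)
    if returncode < 0:
        # On POSIX, a negative return code means the child was killed by
        # the signal with that number.
        signum = -returncode
        if signum == signal.SIGSEGV:
            name = "SIGSEGV"
        else:
            name = "signal %d" % signum
        return "killed by " + name
    return "exited with status %d" % returncode


# Stand-in child that raises SIGSEGV on itself. For the real case the
# argument list would be [sys.executable, "test.py"] instead.
CRASHING_CHILD = "import os, signal; os.kill(os.getpid(), signal.SIGSEGV)"

if __name__ == "__main__":
    print(run_and_report([sys.executable, "-c", CRASHING_CHILD]))
    print(run_and_report([sys.executable, "-c", "pass"]))
```

The wrapper only shows that the process died from a signal. It does not show where in the native code the fault happened, so a core dump or a debugger is still needed to get a native stack trace.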
Minimum reproducible example
(If you are using your own code, please provide a short script that reproduces the error. Otherwise, please provide link to the existing example.)
I wrote a simple Horovod test program named 'test.py' and it is shown below.
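import numpy as np
import mxnet as mx
import horovod.mxnet as hvd

# Initialize Horovod and print this process's rank.
hvd.init()
r = int(hvd.rank())
print("r:", r)

# Build a small float16 tensor of ones.
x = mx.nd.ones((2, 3, 4), dtype=np.float16)
print("x:", x)

# The segmentation fault occurs in this allreduce call.
y = hvd.allreduce(x)
print("y", y)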
Steps to reproduce
(Paste the commands you ran that produced the error.)
Just one command in the terminal: python test.py
What have you tried to solve it?
I tracked the segmentation fault down to the allreduce function.
The 'broadcast_parameters' function also causes a segmentation fault.
Could someone help me fix it? Thanks in advance!