This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Seg Fault while using Randomized relu activation function #14447

Open
anirudhacharya opened this issue Mar 16, 2019 · 7 comments · May be fixed by #14582
Labels: Backend, Bug, Operator

Comments

@anirudhacharya
Member

import mxnet as mx
import numpy as np
from collections import namedtuple

Batch = namedtuple('Batch', ['data'])
data = mx.sym.Variable('data')
out = mx.sym.LeakyReLU(data=data, act_type='rrelu')
mod = mx.mod.Module(symbol=out, label_names=None)
mod.bind(data_shapes=[('data', (1, 10))])
mod.init_params()

data1 = [mx.nd.ones((1, 10))]
mod.forward(Batch(data1))
print(mod.get_outputs()[0].asnumpy())

Using the rrelu activation type of the LeakyReLU operator, I either get a segfault or the script errors out with the following stack trace:

Traceback (most recent call last):
  File "/Users/aanirud/Code/scripts/bug.py", line 15, in <module>
    print(mod.get_outputs()[0].asnumpy())
  File "/Users/aanirud/anaconda2/envs/mxnet2.7/lib/python2.7/site-packages/mxnet-1.5.0-py2.7.egg/mxnet/ndarray/ndarray.py", line 1995, in asnumpy
    ctypes.c_size_t(data.size)))
  File "/Users/aanirud/anaconda2/envs/mxnet2.7/lib/python2.7/site-packages/mxnet-1.5.0-py2.7.egg/mxnet/base.py", line 252, in check_call
    raise MXNetError(py_str(_LIB.MXGetLastError()))
mxnet.base.MXNetError: [21:18:55] include/mxnet/./resource.h:155: Check failed: req.type == ResourceRequest::kTempSpace (459100160 vs. 1) 

Stack trace returned 10 entries:
[bt] (0) 0   libmxnet.so                         0x00000001063f0034 dmlc::StackTrace() + 276
[bt] (1) 1   libmxnet.so                         0x00000001063efdef dmlc::LogMessageFatal::~LogMessageFatal() + 47
[bt] (2) 2   libmxnet.so                         0x0000000106855685 mshadow::Tensor<mshadow::cpu, 1, unsigned int> mxnet::Resource::get_space_typed<mshadow::cpu, 1, unsigned int>(mshadow::Shape<1>, mshadow::Stream<mshadow::cpu>*) const + 277
[bt] (3) 3   libmxnet.so                         0x0000000107aa667e mxnet::op::LeakyReLUOp<mshadow::cpu, float>::Forward(mxnet::OpContext const&, std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob> > const&, std::__1::vector<mxnet::OpReqType, std::__1::allocator<mxnet::OpReqType> > const&, std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob> > const&, std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob> > const&) + 894
[bt] (4) 4   libmxnet.so                         0x0000000107a16283 mxnet::op::OperatorState::Forward(mxnet::OpContext const&, std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob> > const&, std::__1::vector<mxnet::OpReqType, std::__1::allocator<mxnet::OpReqType> > const&, std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob> > const&) + 1795
[bt] (5) 5   libmxnet.so                         0x0000000107871cc7 mxnet::exec::StatefulComputeExecutor::Run(mxnet::RunContext, bool) + 87
[bt] (6) 6   libmxnet.so                         0x000000010789d105 std::__1::__function::__func<mxnet::exec::GraphExecutor::CreateCachedSegOpr(unsigned long, unsigned long)::$_7, std::__1::allocator<mxnet::exec::GraphExecutor::CreateCachedSegOpr(unsigned long, unsigned long)::$_7>, void (mxnet::RunContext, mxnet::engine::CallbackOnComplete)>::operator()(mxnet::RunContext&&, mxnet::engine::CallbackOnComplete&&) + 117
[bt] (7) 7   libmxnet.so                         0x0000000107865cdc mxnet::engine::ThreadedEngine::ExecuteOprBlock(mxnet::RunContext, mxnet::engine::OprBlock*) + 652
[bt] (8) 8   libmxnet.so                         0x0000000107869421 mxnet::engine::ThreadedEnginePerDevice::PushToExecute(mxnet::engine::OprBlock*, bool)::'lambda'()::operator()() const::'lambda'(std::__1::shared_ptr<dmlc::ManualEvent>)::operator()(std::__1::shared_ptr<dmlc::ManualEvent>) const + 129
[bt] (9) 9   libmxnet.so                         0x0000000107869337 std::__1::__function::__func<mxnet::engine::ThreadedEnginePerDevice::PushToExecute(mxnet::engine::OprBlock*, bool)::'lambda'()::operator()() const::'lambda'(std::__1::shared_ptr<dmlc::ManualEvent>), std::__1::allocator<mxnet::engine::ThreadedEnginePerDevice::PushToExecute(mxnet::engine::OprBlock*, bool)::'lambda'()::operator()() const::'lambda'(std::__1::shared_ptr<dmlc::ManualEvent>)>, void (std::__1::shared_ptr<dmlc::ManualEvent>)>::operator()(std::__1::shared_ptr<dmlc::ManualEvent>&&) + 39

Other activation types work fine.
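
For reference, here is a minimal pure-Python sketch of what rrelu is expected to compute, so the intended output of the script above is clear. It is not MXNet's implementation: the helper name `rrelu_reference` is hypothetical, and the bounds are assumed to be the documented LeakyReLU defaults (lower_bound=0.125, upper_bound=0.334).

```python
import random

# Assumed values: the documented LeakyReLU defaults for rrelu.
LOWER_BOUND = 0.125
UPPER_BOUND = 0.334


def rrelu_reference(values, training=False, rng=random):
    """Hypothetical reference helper, not MXNet's implementation.

    Positive inputs pass through unchanged. Non-positive inputs are scaled
    by a slope that is sampled uniformly from [LOWER_BOUND, UPPER_BOUND]
    when training, and fixed at the midpoint of that range otherwise.
    """
    out = []
    for x in values:
        if x > 0:
            out.append(x)
            continue
        if training:
            slope = rng.uniform(LOWER_BOUND, UPPER_BOUND)
        else:
            slope = (LOWER_BOUND + UPPER_BOUND) / 2.0
        out.append(slope * x)
    return out


# Same input as the reproduction script: ten ones.
expected = rrelu_reference([1.0] * 10)
```

Because every input in the reproduction is positive, the expected result is simply the input itself, whether or not the slope is randomized.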

@mxnet-label-bot
Contributor

Hey, this is the MXNet Label Bot.
Thank you for submitting the issue! I will try and suggest some labels so that the appropriate MXNet community members can help resolve it.
Here are my recommended labels: Bug

@zachgk
Contributor

zachgk commented Mar 19, 2019

@mxnet-label-bot add [Backend, Operator, Bug]

@marcoabreu added the Backend, Bug, and Operator labels on Mar 19, 2019
@Vikas-kum
Contributor

@anirudh2290, can we close this, since "Training crash SSD with LeakyReLU(rrelu)" (#12894) is tracking the same issue?

@anirudhacharya
Member Author

@Vikas89 I would prefer to keep this open, as it has a minimal reproducible example. Also, judging from its description, #12894 appears to be a broader issue, since it says "Replacing LeakyReLU with activations at other positions also causes the training to crash".

This issue tracks a specific bug in a specific operator, with an example that will need to be included as a test case once the fix is made.

@mseth10
Contributor

mseth10 commented May 15, 2019

@anirudhacharya, which MXNet version are you using? If you are using master, can you specify the build flags?

@anirudhacharya
Member Author

FYI, PR #14582 is trying to solve this issue.

I used the latest master at the time, but I cannot recall the compile flags I used. The error is also reproducible with the latest PyPI package.

@EmilPi

EmilPi commented Aug 26, 2019

Hello, I installed the latest build (2019-08-23) using sudo -H pip3 install mxnet-cu100==1.6.0b20190823, and the issue is still present there.
