This repository has been archived by the owner on Nov 17, 2023. It is now read-only.
Description
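test_gpu_memory_profiler_gluon fails intermittently for different cu* (CUDA) flavors in the nightly CD pipelines. The failure happens inside enable_profiler(..., continuous_dump=True): MXSetProcessProfilerConfig reaches Profiler::SetContinuousProfileDump, which calls dmlc::ThreadGroup::Thread::joinable() and trips "Check failed: auto_remove_ == false".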
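Because the failure is intermittent, a local stress loop may help narrow it down. The sketch assumes a CUDA build of MXNet exposing mx.profiler.set_config, mx.profiler.set_state and mx.context.num_gpus; the helper name stress_continuous_dump is hypothetical. It also assumes, without confirmation, that repeatedly re-applying a profiler config with continuous_dump=True in one process exercises the same set_config -> SetConfig -> SetContinuousProfileDump path shown in the backtrace. It is not guaranteed to reproduce the check failure.

```python
import os
import tempfile


def stress_continuous_dump(iterations=50):
    """Hypothetical stress helper: repeatedly enable the profiler with continuous_dump=True.

    Returns the number of iterations in which a profiler call raised MXNetError,
    or None when MXNet or a GPU is not available (nothing to exercise).
    """
    try:
        import mxnet as mx
    except ImportError:
        return None
    if mx.context.num_gpus() == 0:
        return None

    failures = 0
    with tempfile.TemporaryDirectory() as tmp:
        out = os.path.join(tmp, "stress_profiler.json")
        for _ in range(iterations):
            try:
                # Assumption: this hits the same SetContinuousProfileDump path as the failing test.
                mx.profiler.set_config(profile_all=True, filename=out, continuous_dump=True)
                mx.profiler.set_state('run')
                # Small GPU workload so the memory profiler records allocations.
                x = mx.nd.ones((256, 256), ctx=mx.gpu(0))
                (x * 2).wait_to_read()
                mx.profiler.set_state('stop')
            except mx.base.MXNetError:
                failures += 1
        # Leave the profiler stopped with continuous dumping turned off.
        mx.profiler.set_state('stop')
        mx.profiler.set_config(continuous_dump=False, filename=out)
    return failures
```

A non-zero return value from, for example, stress_continuous_dump(iterations=200) on a GPU machine would suggest the race can be triggered outside the CD pipeline.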
Occurrences
- http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/restricted-mxnet-cd%2Fmxnet-cd-release-job/detail/mxnet-cd-release-job/1257/pipeline
- http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/restricted-mxnet-cd%2Fmxnet-cd-release-job/detail/mxnet-cd-release-job/1245/pipeline
Error log
[2020-06-14T15:23:01.268Z] ________________________ test_gpu_memory_profiler_gluon ________________________
[2020-06-14T15:23:01.268Z] [gw1] linux -- Python 3.6.9 /opt/rh/rh-python36/root/usr/bin/python3
[2020-06-14T15:23:01.268Z]
[2020-06-14T15:23:01.268Z] @pytest.mark.skipif(mx.context.num_gpus() == 0, reason="GPU memory profiler records allocation on GPUs only")
[2020-06-14T15:23:01.268Z] def test_gpu_memory_profiler_gluon():
[2020-06-14T15:23:01.268Z] enable_profiler(profile_filename='test_profiler.json',
[2020-06-14T15:23:01.268Z] > run=True, continuous_dump=True)
[2020-06-14T15:23:01.268Z]
[2020-06-14T15:23:01.268Z] tests/python/unittest/test_profiler.py:537:
[2020-06-14T15:23:01.268Z] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
[2020-06-14T15:23:01.268Z] tests/python/unittest/test_profiler.py:40: in enable_profiler
[2020-06-14T15:23:01.268Z] aggregate_stats=aggregate_stats)
[2020-06-14T15:23:01.268Z] python/mxnet/profiler.py:69: in set_config
[2020-06-14T15:23:01.268Z] profiler_kvstore_handle))
[2020-06-14T15:23:01.268Z] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
[2020-06-14T15:23:01.268Z]
[2020-06-14T15:23:01.268Z] ret = -1
[2020-06-14T15:23:01.268Z]
[2020-06-14T15:23:01.268Z] def check_call(ret):
[2020-06-14T15:23:01.268Z] """Check the return value of C API call.
[2020-06-14T15:23:01.268Z]
[2020-06-14T15:23:01.268Z] This function will raise an exception when an error occurs.
[2020-06-14T15:23:01.268Z] Wrap every API call with this function.
[2020-06-14T15:23:01.268Z]
[2020-06-14T15:23:01.268Z] Parameters
[2020-06-14T15:23:01.268Z] ----------
[2020-06-14T15:23:01.268Z] ret : int
[2020-06-14T15:23:01.268Z] return value from API calls.
[2020-06-14T15:23:01.268Z] """
[2020-06-14T15:23:01.268Z] if ret != 0:
[2020-06-14T15:23:01.268Z] > raise get_last_ffi_error()
[2020-06-14T15:23:01.268Z] E mxnet.base.MXNetError: Traceback (most recent call last):
[2020-06-14T15:23:01.268Z] E [bt] (4) /work/mxnet/python/mxnet/../../lib/libmxnet.so(MXSetProcessProfilerConfig+0x1bb) [0x7f937ce083eb]
[2020-06-14T15:23:01.268Z] E [bt] (3) /work/mxnet/python/mxnet/../../lib/libmxnet.so(mxnet::profiler::Profiler::SetConfig(int, std::string, bool, float, bool)+0x85) [0x7f93824c4ba5]
[2020-06-14T15:23:01.268Z] E [bt] (2) /work/mxnet/python/mxnet/../../lib/libmxnet.so(mxnet::profiler::Profiler::SetContinuousProfileDump(bool, float)+0x8b8) [0x7f93824c4428]
[2020-06-14T15:23:01.268Z] E [bt] (1) /work/mxnet/python/mxnet/../../lib/libmxnet.so(dmlc::ThreadGroup::Thread::joinable() const+0xbf) [0x7f93824c637f]
[2020-06-14T15:23:01.268Z] E [bt] (0) /work/mxnet/python/mxnet/../../lib/libmxnet.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x6d) [0x7f937cc7df3d]
[2020-06-14T15:23:01.268Z] E File "../include/dmlc/thread_group.h", line 226
[2020-06-14T15:23:01.268Z] E MXNetError: Check failed: auto_remove_ == false (1 vs. 0) :
[2020-06-14T15:23:01.268Z]
[2020-06-14T15:23:01.268Z] python/mxnet/base.py:246: MXNetError
[2020-06-14T15:23:01.268Z] ---------------------------- Captured stderr setup -----------------------------
[2020-06-14T15:23:01.268Z] DEBUG:root:np/mx/python random seeds are set to 794738585, use MXNET_TEST_SEED=794738585 to reproduce.
[2020-06-14T15:23:01.268Z] ------------------------------ Captured log setup ------------------------------
[2020-06-14T15:23:01.268Z] DEBUG root:conftest.py:193 np/mx/python random seeds are set to 794738585, use MXNET_TEST_SEED=794738585 to reproduce.
[2020-06-14T15:23:01.268Z] --------------------------- Captured stderr teardown ---------------------------
[2020-06-14T15:23:01.268Z] INFO:root:np/mx/python random seeds are set to 794738585, use MXNET_TEST_SEED=794738585 to reproduce.
[2020-06-14T15:23:01.268Z] ---------------------------- Captured log teardown -----------------------------
[2020-06-14T15:23:01.268Z] INFO root:conftest.py:210 np/mx/python random seeds are set to 794738585, use MXNET_TEST_SEED=794738585 to reproduce.
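The captured output reports MXNET_TEST_SEED=794738585 as the seed for reproducing this run. A small sketch that pins that seed and reruns only this test until it fails is below; the helper names extract_seed, build_rerun and rerun are hypothetical, and it assumes it is launched from the repository root with pytest installed. Since the check failure comes from the profiler's thread handling rather than from random inputs, pinning the seed alone may not be enough, which is why the run is repeated.

```python
import os
import re
import subprocess
import sys

# Matches the seed hint printed by the test conftest, e.g. "use MXNET_TEST_SEED=794738585 to reproduce."
SEED_PATTERN = re.compile(r"MXNET_TEST_SEED=(\d+)")
TEST_ID = "tests/python/unittest/test_profiler.py::test_gpu_memory_profiler_gluon"


def extract_seed(log_text):
    """Return the first MXNET_TEST_SEED value found in log_text, or None."""
    match = SEED_PATTERN.search(log_text)
    return int(match.group(1)) if match else None


def build_rerun(seed, test_id=TEST_ID):
    """Build the pytest command and an environment with MXNET_TEST_SEED pinned."""
    env = dict(os.environ, MXNET_TEST_SEED=str(seed))
    cmd = [sys.executable, "-m", "pytest", "-q", test_id]
    return cmd, env


def rerun(seed, attempts=20, test_id=TEST_ID):
    """Run the test up to `attempts` times; return the first failing attempt number, or None."""
    cmd, env = build_rerun(seed, test_id)
    for attempt in range(1, attempts + 1):
        # Any non-zero exit code counts, including collection errors.
        if subprocess.run(cmd, env=env).returncode != 0:
            return attempt
    return None
```

For example, rerun(794738585, attempts=50) returns the attempt number of the first failure, or None if all 50 runs pass.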