This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

[Flaky Test] Segmentation fault in memory profiler tests #18564

Open
@mseth10

Description

test_gpu_memory_profiler_gluon fails intermittently across different CUDA (cu*) build flavors in the nightly CD pipelines.
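Because the failure is intermittent, one way to estimate how often it occurs is to rerun the test many times in fresh processes. The helper below is a minimal sketch under stated assumptions: it is run from the repository root with a GPU-enabled MXNet build installed, it takes the test path from the traceback in this issue, and it sets `MXNET_TEST_SEED`, which the captured log says reproduces the np/mx/python random seeds. `build_repro_command` and `count_failures` are hypothetical names, not part of the MXNet test suite.

```python
import os
import subprocess
import sys

# Pytest node id built from the file and test name in the traceback.
TEST_ID = "tests/python/unittest/test_profiler.py::test_gpu_memory_profiler_gluon"


def build_repro_command(test_id=TEST_ID, seed=None):
    """Return (argv, env) for one isolated pytest run of the flaky test."""
    argv = [sys.executable, "-m", "pytest", test_id]
    env = dict(os.environ)
    if seed is not None:
        # Pin the random seeds the way the test log suggests.
        env["MXNET_TEST_SEED"] = str(seed)
    return argv, env


def count_failures(runs, seed=None):
    """Run the test `runs` times, each in a new process, and count non-zero exits."""
    failures = 0
    for _ in range(runs):
        argv, env = build_repro_command(seed=seed)
        if subprocess.run(argv, env=env).returncode != 0:
            failures += 1
    return failures
```

For example, `count_failures(runs=20, seed=794738585)` returns how many of 20 runs failed. Running the test on its own may not recreate state left behind by earlier tests in the same pytest-xdist worker (the log shows worker `gw1`), so running the whole `test_profiler.py` file in the loop may be needed to hit the failure.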

Occurrences

  1. http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/restricted-mxnet-cd%2Fmxnet-cd-release-job/detail/mxnet-cd-release-job/1257/pipeline
  2. http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/restricted-mxnet-cd%2Fmxnet-cd-release-job/detail/mxnet-cd-release-job/1245/pipeline

Error log

[2020-06-14T15:23:01.268Z] ________________________ test_gpu_memory_profiler_gluon ________________________
[2020-06-14T15:23:01.268Z] [gw1] linux -- Python 3.6.9 /opt/rh/rh-python36/root/usr/bin/python3
[2020-06-14T15:23:01.268Z] 
[2020-06-14T15:23:01.268Z]     @pytest.mark.skipif(mx.context.num_gpus() == 0, reason="GPU memory profiler records allocation on GPUs only")
[2020-06-14T15:23:01.268Z]     def test_gpu_memory_profiler_gluon():
[2020-06-14T15:23:01.268Z]         enable_profiler(profile_filename='test_profiler.json',
[2020-06-14T15:23:01.268Z] >                       run=True, continuous_dump=True)
[2020-06-14T15:23:01.268Z] 
[2020-06-14T15:23:01.268Z] tests/python/unittest/test_profiler.py:537: 
[2020-06-14T15:23:01.268Z] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
[2020-06-14T15:23:01.268Z] tests/python/unittest/test_profiler.py:40: in enable_profiler
[2020-06-14T15:23:01.268Z]     aggregate_stats=aggregate_stats)
[2020-06-14T15:23:01.268Z] python/mxnet/profiler.py:69: in set_config
[2020-06-14T15:23:01.268Z]     profiler_kvstore_handle))
[2020-06-14T15:23:01.268Z] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
[2020-06-14T15:23:01.268Z] 
[2020-06-14T15:23:01.268Z] ret = -1
[2020-06-14T15:23:01.268Z] 
[2020-06-14T15:23:01.268Z]     def check_call(ret):
[2020-06-14T15:23:01.268Z]         """Check the return value of C API call.
[2020-06-14T15:23:01.268Z]     
[2020-06-14T15:23:01.268Z]         This function will raise an exception when an error occurs.
[2020-06-14T15:23:01.268Z]         Wrap every API call with this function.
[2020-06-14T15:23:01.268Z]     
[2020-06-14T15:23:01.268Z]         Parameters
[2020-06-14T15:23:01.268Z]         ----------
[2020-06-14T15:23:01.268Z]         ret : int
[2020-06-14T15:23:01.268Z]             return value from API calls.
[2020-06-14T15:23:01.268Z]         """
[2020-06-14T15:23:01.268Z]         if ret != 0:
[2020-06-14T15:23:01.268Z] >           raise get_last_ffi_error()
[2020-06-14T15:23:01.268Z] E           mxnet.base.MXNetError: Traceback (most recent call last):
[2020-06-14T15:23:01.268Z] E             [bt] (4) /work/mxnet/python/mxnet/../../lib/libmxnet.so(MXSetProcessProfilerConfig+0x1bb) [0x7f937ce083eb]
[2020-06-14T15:23:01.268Z] E             [bt] (3) /work/mxnet/python/mxnet/../../lib/libmxnet.so(mxnet::profiler::Profiler::SetConfig(int, std::string, bool, float, bool)+0x85) [0x7f93824c4ba5]
[2020-06-14T15:23:01.268Z] E             [bt] (2) /work/mxnet/python/mxnet/../../lib/libmxnet.so(mxnet::profiler::Profiler::SetContinuousProfileDump(bool, float)+0x8b8) [0x7f93824c4428]
[2020-06-14T15:23:01.268Z] E             [bt] (1) /work/mxnet/python/mxnet/../../lib/libmxnet.so(dmlc::ThreadGroup::Thread::joinable() const+0xbf) [0x7f93824c637f]
[2020-06-14T15:23:01.268Z] E             [bt] (0) /work/mxnet/python/mxnet/../../lib/libmxnet.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x6d) [0x7f937cc7df3d]
[2020-06-14T15:23:01.268Z] E             File "../include/dmlc/thread_group.h", line 226
[2020-06-14T15:23:01.268Z] E           MXNetError: Check failed: auto_remove_ == false (1 vs. 0) :
[2020-06-14T15:23:01.268Z] 
[2020-06-14T15:23:01.268Z] python/mxnet/base.py:246: MXNetError
[2020-06-14T15:23:01.268Z] ---------------------------- Captured stderr setup -----------------------------
[2020-06-14T15:23:01.268Z] DEBUG:root:np/mx/python random seeds are set to 794738585, use MXNET_TEST_SEED=794738585 to reproduce.
[2020-06-14T15:23:01.268Z] ------------------------------ Captured log setup ------------------------------
[2020-06-14T15:23:01.268Z] DEBUG    root:conftest.py:193 np/mx/python random seeds are set to 794738585, use MXNET_TEST_SEED=794738585 to reproduce.
[2020-06-14T15:23:01.268Z] --------------------------- Captured stderr teardown ---------------------------
[2020-06-14T15:23:01.268Z] INFO:root:np/mx/python random seeds are set to 794738585, use MXNET_TEST_SEED=794738585 to reproduce.
[2020-06-14T15:23:01.268Z] ---------------------------- Captured log teardown -----------------------------
[2020-06-14T15:23:01.268Z] INFO     root:conftest.py:210 np/mx/python random seeds are set to 794738585, use MXNET_TEST_SEED=794738585 to reproduce.
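The backtrace shows `Profiler::SetContinuousProfileDump` calling `dmlc::ThreadGroup::Thread::joinable()`, which then fails `Check failed: auto_remove_ == false (1 vs. 0)`, i.e. the thread it queried had `auto_remove_` set. The sketch below is a simplified, hypothetical Python model of that invariant, not the dmlc-core implementation: it assumes a thread created with auto-remove may delete itself from its group on exit, so asking whether it is joinable is treated as a programming error. `ModelThread` and `CheckFailed` are illustrative names only.

```python
class CheckFailed(RuntimeError):
    """Stand-in for the error raised by a failed dmlc CHECK."""


class ModelThread:
    """Hypothetical model of a thread-group member; not the dmlc-core class."""

    def __init__(self, name, auto_remove):
        self.name = name
        # Assumption: an auto-remove thread removes itself from its group when
        # it exits, so callers must not query its join state afterwards.
        self.auto_remove = auto_remove
        self.started = False
        self.joined = False

    def start(self):
        self.started = True

    def joinable(self):
        # Same invariant as the failing check in the log:
        # "Check failed: auto_remove_ == false (1 vs. 0)".
        if self.started and self.auto_remove:
            raise CheckFailed("Check failed: auto_remove_ == false (1 vs. 0)")
        return self.started and not self.joined

    def join(self):
        if not self.joinable():
            raise RuntimeError("thread is not joinable")
        self.joined = True
```

Under this model the check fires only when a started auto-remove thread is asked about its join state. One hypothesis worth checking, not confirmed by the log, is that a continuous-dump thread from an earlier profiler test in the same worker was still registered when this test called `set_config(..., continuous_dump=True)`.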

Metadata

Assignees

No one assigned