Disable test parallelization for ML.Tests assembly to avoid crash #4896

Merged
merged 2 commits into from
Feb 27, 2020

@frank-dong-ms-zz frank-dong-ms-zz commented Feb 26, 2020

In the ML.Tests assembly, when some tests run in parallel, there is a chance that the host process will crash with the following exception:
The thread tried to read from or write to a virtual address for which it does not have the appropriate access.
Unhandled exception at 0x00007FFA70E7B049 (ntdll.dll) in dotnet.exe.12884.dmp: 0xC0000374: A heap has been corrupted (parameters: 0x00007FFA70EE27F0).

I looked into these errors in detail: they come from the LightGBM and OnnxRuntime DLLs we reference, and appear to be null pointer errors during object finalization.

These crashes can be mitigated by disabling test parallelization. In the meantime, I'm contacting the LightGBM and OnnxRuntime teams to take a deeper look; they may need to add a null pointer check on their end.

Below are combinations of tests that are likely to cause a crash when run in parallel; there may be more:
LightGBMBinaryEstimatorUnbalanced and BinaryClassificationTrainersOnnxConversionTest
LightGBMRegressorEstimator and BinaryClassificationTrainersOnnxConversionTest
LightGBMBinaryEstimatorUnbalanced and TestSGDBinary
LightGBMBinaryEstimatorUnbalanced and CommandLineOnnxConversionTest
LightGBMBinaryEstimatorCorrectSigmoid and MulticlassConfusionMatrixSlotNames
IrisVectorLightGbmWithLoadColumnName and PlattCalibratorOnnxConversionTest
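
For reference, a minimal sketch of how the mitigation looks in an xUnit test project. The file name is hypothetical (any source file compiled into the test assembly should work), and the `using Xunit;` directive is assumed to be needed to resolve the attribute:

```csharp
// AssemblyInfo.cs (hypothetical file name, not taken from this PR)
using Xunit;

// Assembly-level setting: xUnit runs the test collections in this
// assembly one after another instead of in parallel.
[assembly: CollectionBehavior(DisableTestParallelization = true)]
```

This only affects the assembly that contains the attribute; other test assemblies keep their own parallelization settings.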

@frank-dong-ms-zz frank-dong-ms-zz marked this pull request as ready for review February 27, 2020 01:28
@frank-dong-ms-zz frank-dong-ms-zz requested a review from a team as a code owner February 27, 2020 01:28

// TODO: disable test parallelization for this assembly, as running tests in parallel sometimes causes the test host process to crash
[assembly: CollectionBehavior(DisableTestParallelization = true)]

@harishsk harishsk Feb 27, 2020

Can you please add an additional tag after the TODO to help us search for tags associated with this work item? Something like // TODO: TEST_STABILITY #Resolved

@harishsk harishsk left a comment

:shipit:

@frank-dong-ms-zz frank-dong-ms-zz merged commit f0a8a76 into dotnet:master Feb 27, 2020
@frank-dong-ms-zz frank-dong-ms-zz deleted the ML.Tests-crash branch April 7, 2020 04:29
@ghost ghost locked as resolved and limited conversation to collaborators Mar 19, 2022