
LightGBM Crash issue #4918


Merged · frank-dong-ms-zz merged 5 commits into dotnet:master on Mar 5, 2020

Conversation

@frank-dong-ms-zz (Contributor) commented on Mar 4, 2020

microsoft/LightGBM#2820

LightGBM depends on the OpenMP multi-threading library. In our tests we set the number of threads to 1 for LightGBM, and LightGBM in turn configures OpenMP to use only 1 thread. However, OpenMP is also used by other native libraries, which set their own OpenMP thread counts (by default the number of cores, which is 2 in our case since we run on the DSv2 series: https://docs.microsoft.com/en-us/azure/virtual-machines/dv2-dsv2-series#dsv2-series). Because the thread-count setting is global to the process, LightGBM can start out with OpenMP in single-threaded mode and then see it switch to multiple threads. LightGBM uses the number of threads for indexing, so the switch causes an index-out-of-range error in native code and crashes the process.

With this fix, we no longer force single threading when running the LightGBM-related tests; the default behavior applies and all libraries use the same number of OpenMP threads (see the sketch below).

Ideally, LightGBM should not rely on the thread count for indexing, since this setting is global and can easily be overridden by another library.
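
As a minimal sketch of the change (illustrative only, assuming the Microsoft.ML and Microsoft.ML.LightGbm packages; the class and method names below are made up and the real test code lives in TreeEstimators.cs), the trainer options simply stop forcing a thread count:

    using Microsoft.ML;
    using Microsoft.ML.Trainers.LightGbm;

    public static class LightGbmThreadingSketch
    {
        // Illustrative only: builds a LightGBM binary trainer without pinning the thread count.
        public static void BuildTrainer(MLContext ml)
        {
            var trainer = ml.BinaryClassification.Trainers.LightGbm(new LightGbmBinaryTrainer.Options
            {
                NumberOfLeaves = 10,
                // NumberOfThreads = 1,  // intentionally omitted: LightGBM forwards this value to
                //                       // OpenMP, whose thread count is global to the process and
                //                       // can be reset by other native libraries, which previously
                //                       // caused out-of-range indexing inside LightGBM.
                MinimumExampleCountPerLeaf = 2,
            });
        }
    }

Leaving NumberOfThreads unset means LightGBM inherits whatever OpenMP thread count the process already uses, which matches the other native libraries.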

@frank-dong-ms-zz changed the title from "Crash test" to "LightGBM Crash issue" on Mar 5, 2020
@frank-dong-ms-zz marked this pull request as ready for review on March 5, 2020 06:17
@frank-dong-ms-zz requested a review from a team as a code owner on March 5, 2020 06:17
@harishsk (Contributor) commented on Mar 5, 2020

    public void LightGBMBinaryEstimator()

Were these the crashing tests?
Weren't they disabled?


Refers to: test/Microsoft.ML.Tests/TrainerEstimators/TreeEstimators.cs:51 in cf16c7d.

@@ -76,7 +75,6 @@ public void LightGBMBinaryEstimatorUnbalanced()
 var trainer = ML.BinaryClassification.Trainers.LightGbm(new LightGbmBinaryTrainer.Options
 {
     NumberOfLeaves = 10,
-    NumberOfThreads = 1,
     MinimumExampleCountPerLeaf = 2,
Contributor


Can we add a detailed note somewhere visible in the code about why this is done?
Can we also add a cross-reference to that comment wherever we have removed NumberOfThreads = 1?

Contributor Author


Yes, I can add comments with // Attention tags and put the explanation there (a sketch of what such a note might look like follows below).


In reply to: 388442580
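
For illustration only, such a note could read along these lines (the exact wording added in the PR may differ):

    // Attention: do not set NumberOfThreads = 1 on LightGBM options in these tests.
    // LightGBM passes this value on to OpenMP, and the OpenMP thread count is global to
    // the process. Other native libraries reset it to their own defaults (typically the
    // core count), and because LightGBM uses the thread count for indexing, the mismatch
    // causes an index-out-of-range error in native code and crashes the process.
    // See microsoft/LightGBM#2820.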

@frank-dong-ms-zz (Contributor Author)

    public void LightGBMBinaryEstimator()

Yes, these are the crashing tests. They were not disabled because I had previously checked in a mitigation (disabling test parallelization). This time I have removed that mitigation by removing the AssemblyInfo.cs from the ML.Tests assembly (a sketch of that mitigation appears below).


In reply to: 595344964


Refers to: test/Microsoft.ML.Tests/TrainerEstimators/TreeEstimators.cs:51 in cf16c7d.
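
For context, here is a hedged sketch of the kind of mitigation being removed: an AssemblyInfo.cs in the test project that disabled xUnit test parallelization for the whole assembly (the actual file in the repository may have differed).

    // AssemblyInfo.cs -- illustrative contents only; removed by this PR.
    using Xunit;

    // Run the assembly's tests sequentially so that only one test at a time
    // touches the process-global OpenMP thread count.
    [assembly: CollectionBehavior(DisableTestParallelization = true)]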

@harishsk (Contributor) left a comment

:shipit:

@frank-dong-ms-zz merged commit ae1b709 into dotnet:master on Mar 5, 2020
@frank-dong-ms-zz deleted the crash-test branch on April 7, 2020
@ghost locked this conversation as resolved and limited it to collaborators on Mar 19, 2022