Added baselines for the TestDatasets.breastCancerPipeMissing dataset, and their unit tests #4785
mstfbl wants to merge 3 commits into dotnet:master
Added baselines for TestDatasets.breastCancerPipeMissing for the LightGBMClassificationTest, GossLightGBMTest, and DartLightGBMTest.
Hi, @mstfbl. Can you please explain here why these baselines are needed, if the ones with missing values already exist? Especially since, as you've stated, these baselines won't actually test the changes introduced in your other PR.
Hi @antoniovs1029, I wasn't as clear as I could have been in the original post. There I meant that the change made in PR #4695 with In addition, they are a second set of baselines that are tested in the already-existing set of LightGbm tests, and I think having extra cases to test isn't a bad thing at all.
This is not the right way to go. Add a unit test in your original PR #4695. Take any already generated model file and add it to your PR. Also add just one established baseline (for one dataset). Load the model in the unit test, run it with the new LightGbm code, and compare the baselines.
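The "compare the baselines" step suggested above can be illustrated with a small, language-agnostic sketch. It is not ML.NET's test infrastructure; the function names, the tolerance value, and the idea of a line-oriented text baseline are all assumptions made for illustration. The sketch checks freshly produced output against a stored baseline, requiring the text to match exactly while allowing numbers to differ by a small relative tolerance.

```python
import math
import re
from typing import List

# Matches integer, decimal, and scientific-notation tokens in a line.
_NUMBER = re.compile(r"[-+]?(?:\d+\.\d*|\.\d+|\d+)(?:[eE][-+]?\d+)?")


def lines_match(expected: str, actual: str, rel_tol: float = 1e-6) -> bool:
    """Non-numeric text must be identical; numbers may differ within rel_tol."""
    exp_nums = _NUMBER.findall(expected)
    act_nums = _NUMBER.findall(actual)
    if len(exp_nums) != len(act_nums):
        return False
    # Replace every number with a placeholder so the surrounding text
    # can be compared exactly.
    if _NUMBER.sub("#", expected) != _NUMBER.sub("#", actual):
        return False
    return all(
        math.isclose(float(e), float(a), rel_tol=rel_tol, abs_tol=1e-12)
        for e, a in zip(exp_nums, act_nums)
    )


def compare_to_baseline(
    baseline_text: str, output_text: str, rel_tol: float = 1e-6
) -> List[int]:
    """Return the 1-based line numbers where the output deviates from the baseline."""
    baseline = baseline_text.splitlines()
    output = output_text.splitlines()
    mismatches = [
        i
        for i, (b, o) in enumerate(zip(baseline, output), start=1)
        if not lines_match(b, o, rel_tol)
    ]
    # Extra or missing trailing lines also count as mismatches.
    shorter, longer = sorted((len(baseline), len(output)))
    mismatches.extend(range(shorter + 1, longer + 1))
    return mismatches
```

In a test following the reviewer's outline, the output text would come from scoring the checked-in model with the new code, and the test would assert that `compare_to_baseline` returns an empty list.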
Related to Issue #4681 and PR #4695.
This PR adds the baseline figures for the TestDatasets.breastCancerPipeMissing dataset (which is the same dataset as TestDatasets.breastCancerPipe but without missing values due to the NAHandle transformer), and adds these baseline figures to the required tests in TestPredictors.cs.
The tests that these baselines satisfy are: LightGBMClassificationTest, GossLightGBMTest, DartLightGBMTest, FastTreeBinaryClassificationTest, FastTreeHighMinDocsTest, MulticlassNaiveBayes.
These tests are also originally the only unit tests that utilize TestDatasets.breastCancerPipe.
Since the TestDatasets.breastCancerPipeMissing dataset does not contain missing values, the change in PR #4695 of allowing CursOpt.AllFeatures will not impact the test cases mentioned above.
**Edit:** In the last sentence above, I mean that since there are no missing values in TestDatasets.breastCancerPipeMissing, these datasets are better suited to test these LightGbm tests, as CursOpt.AllFeatures combined with the HandleMissingValues flag in LightGbmTrainerBase.cs introduce multiple modifiers to the case of handling missing values, which this TestDatasets.breastCancerPipeMissing dataset is not impacted by.