Closed
Opened on May 3, 2019
Issue
version: 0.11
- What did you do?
- I trained a LightGBM multi-class classifier with `UseCat` set to `true`.
- I added a `SelectFeaturesBasedOnCount` on the final `Features`.
If it matters: I also use early stopping (does this prune some trees?).
- What happened?
I got an exception after training had completed successfully, at the point where ML.NET tries to construct the `InternalRegressionTree`:
System.InvalidOperationException: 'Categorical split features is zero length'
Stack
> Microsoft.ML.Core.dll!Microsoft.ML.Contracts.Check(bool f, string msg) Line 491 C#
Microsoft.ML.FastTree.dll!Microsoft.ML.Trainers.FastTree.InternalRegressionTree.CheckValid(System.Action<bool, string> checker) Line 471 C#
Microsoft.ML.FastTree.dll!Microsoft.ML.Trainers.FastTree.InternalRegressionTree.InternalRegressionTree(int[] splitFeatures, double[] splitGain, double[] gainPValue, float[] rawThresholds, float[] defaultValueForMissing, int[] lteChild, int[] gtChild, double[] leafValues, int[][] categoricalSplitFeatures, bool[] categoricalSplit) Line 224 C#
Microsoft.ML.FastTree.dll!Microsoft.ML.Trainers.FastTree.InternalRegressionTree.Create(int numLeaves, int[] splitFeatures, double[] splitGain, float[] rawThresholds, float[] defaultValueForMissing, int[] lteChild, int[] gtChild, double[] leafValues, int[][] categoricalSplitFeatures, bool[] categoricalSplit) Line 188 C#
Microsoft.ML.LightGBM.dll!Microsoft.ML.LightGBM.Booster.GetModel(int[] categoricalFeatureBoudaries) Line 257 C#
Microsoft.ML.LightGBM.dll!Microsoft.ML.LightGBM.LightGbmTrainerBase<Microsoft.ML.Data.VBuffer<float>, Microsoft.ML.Data.MulticlassPredictionTransformer<Microsoft.ML.Trainers.OvaModelParameters>, Microsoft.ML.Trainers.OvaModelParameters>.TrainCore(Microsoft.ML.IChannel ch, Microsoft.ML.IProgressChannel pch, Microsoft.ML.LightGBM.Dataset dtrain, Microsoft.ML.LightGBM.LightGbmTrainerBase<Microsoft.ML.Data.VBuffer<float>, Microsoft.ML.Data.MulticlassPredictionTransformer<Microsoft.ML.Trainers.OvaModelParameters>, Microsoft.ML.Trainers.OvaModelParameters>.CategoricalMetaData catMetaData, Microsoft.ML.LightGBM.Dataset dvalid) Line 375 C#
Microsoft.ML.LightGBM.dll!Microsoft.ML.LightGBM.LightGbmTrainerBase<Microsoft.ML.Data.VBuffer<float>, Microsoft.ML.Data.MulticlassPredictionTransformer<Microsoft.ML.Trainers.OvaModelParameters>, Microsoft.ML.Trainers.OvaModelParameters>.TrainModelCore(Microsoft.ML.TrainContext context) Line 117 C#
Microsoft.ML.Data.dll!Microsoft.ML.Trainers.TrainerEstimatorBase<Microsoft.ML.Data.MulticlassPredictionTransformer<Microsoft.ML.Trainers.OvaModelParameters>, Microsoft.ML.Trainers.OvaModelParameters>.TrainTransformer(Microsoft.Data.DataView.IDataView trainSet, Microsoft.Data.DataView.IDataView validationSet, Microsoft.ML.IPredictor initPredictor) Line 148 C#
MlnEval.exe!ConsoleApp1.MlNetSpecific.MlNetLightGbmMultiClassTrainer.TrainAndEval(ConsoleApp1.Dev.AppState app) Line 107 C#
MlnEval.exe!ConsoleApp1.Program.Main(string[] args) Line 116 C#
Unfortunately, I failed to come up with a minimal reproducible example. It seems the problem only shows up with a particular data setup.
Partial analysis
It looks like in machinelearning/src/Microsoft.ML.LightGbm/WrappedLightGbmBooster.cs (lines 236 to 251 in c5aab77) the `cats` array is actually zero-length, but `categoricalSplit[node]` is still set to `true`. That later trips the validity check that throws (`InternalRegressionTree.CheckValid` in the stack above).
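To make the suspected inconsistency concrete, here is a minimal, self-contained sketch of the state described in the partial analysis: a node flagged as a categorical split whose list of categorical features is empty. The class and method names (`CategoricalSplitSketch`, `FindInconsistentNodes`, `CheckValid`) are hypothetical and only imitate the check implied by the exception message; this is not ML.NET's actual code.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, simplified stand-in for the per-node tree arrays.
// It does not use or reproduce ML.NET's real types.
public static class CategoricalSplitSketch
{
    // Returns the indices of nodes that are flagged as categorical splits
    // but carry no categorical feature indices (null or zero-length array).
    public static List<int> FindInconsistentNodes(bool[] categoricalSplit, int[][] categoricalSplitFeatures)
    {
        if (categoricalSplit.Length != categoricalSplitFeatures.Length)
            throw new ArgumentException("Both arrays must have one entry per node.");

        var inconsistent = new List<int>();
        for (int node = 0; node < categoricalSplit.Length; node++)
        {
            if (!categoricalSplit[node])
                continue; // numerical split: nothing to check

            int[] cats = categoricalSplitFeatures[node];
            if (cats == null || cats.Length == 0)
                inconsistent.Add(node);
        }
        return inconsistent;
    }

    // Assumed shape of the validation: throw with the message from the report
    // as soon as one inconsistent node exists.
    public static void CheckValid(bool[] categoricalSplit, int[][] categoricalSplitFeatures)
    {
        if (FindInconsistentNodes(categoricalSplit, categoricalSplitFeatures).Count > 0)
            throw new InvalidOperationException("Categorical split features is zero length");
    }
}
```

Calling `CheckValid(new[] { true, false }, new[] { new int[0], new int[0] })` throws with the message from the report, because node 0 is flagged as categorical while its `cats` array is empty. Running something like `FindInconsistentNodes` on the arrays just before the tree is constructed could help narrow down which nodes, and therefore which data setup, trigger the problem.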