LightGbm ranking with Options does not set GroupId column even when specified in Options #2652

Closed
@najeeb-kazmi

Description

While fixing #2530 in PR #2650, I discovered that when a LightGbmRankingTrainer is created with MLContext.Ranking.Trainers.LightGbm(Options options), the GroupId column is never set, even though it is specified in the Options.

Digging a bit deeper, I found this constructor in LightGbmTrainerBase.cs:

        private protected LightGbmTrainerBase(IHostEnvironment env, string name, Options options, SchemaShape.Column label)
           : base(Contracts.CheckRef(env, nameof(env)).Register(name), TrainerUtils.MakeR4VecFeature(options.FeatureColumn), label, TrainerUtils.MakeR4ScalarWeightColumn(options.WeightColumn))

This constructor does not use options.GroupIdColumn to create a SchemaShape.Column for GroupId and pass it to the base constructor of TrainerEstimatorBaseWithGroupId. Nor is there a method in TrainerUtils that creates such a SchemaShape.Column from options.GroupIdColumn for key-typed columns such as GroupId.
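The pattern described above can be sketched in isolation. The following is a minimal, self-contained illustration and not the ML.NET code: ColumnSpec, RankingOptions, RankingTrainerBase, MakeOptional, and both trainer classes are hypothetical stand-ins, and the assumption that the base constructor treats the weight and group columns as optional (defaulting to null) is mine. It only shows how omitting one argument in the base-constructor call silently drops the group column, and how forwarding it avoids the failure at fit time.

```csharp
using System;

// Hypothetical stand-ins for illustration only; these are NOT the ML.NET types.
public sealed class ColumnSpec
{
    public string Name { get; }
    public ColumnSpec(string name) { Name = name; }
}

public sealed class RankingOptions
{
    public string FeatureColumn { get; set; } = "Features";
    public string LabelColumn { get; set; } = "Label";
    public string WeightColumn { get; set; }   // optional
    public string GroupIdColumn { get; set; }  // needed for ranking
}

public abstract class RankingTrainerBase
{
    public ColumnSpec Feature { get; }
    public ColumnSpec Label { get; }
    public ColumnSpec Weight { get; }
    public ColumnSpec GroupId { get; }

    // Assumption: weight and groupId are optional and default to null.
    protected RankingTrainerBase(ColumnSpec feature, ColumnSpec label,
        ColumnSpec weight = null, ColumnSpec groupId = null)
    {
        Feature = feature;
        Label = label;
        Weight = weight;
        GroupId = groupId;
    }

    // Hypothetical helper: turns an optional column name into a column spec.
    protected static ColumnSpec MakeOptional(string name)
        => name == null ? null : new ColumnSpec(name);

    // Mirrors the kind of validation that fails in the reported stack trace.
    public void Fit()
    {
        if (GroupId == null)
            throw new ArgumentOutOfRangeException("data", "Need a group column.");
    }
}

// Reproduces the reported pattern: GroupIdColumn is never forwarded.
public sealed class BuggyRankingTrainer : RankingTrainerBase
{
    public BuggyRankingTrainer(RankingOptions o)
        : base(new ColumnSpec(o.FeatureColumn), new ColumnSpec(o.LabelColumn),
               MakeOptional(o.WeightColumn))
    { }
}

// One possible shape of a fix: forward the group column to the base constructor.
public sealed class FixedRankingTrainer : RankingTrainerBase
{
    public FixedRankingTrainer(RankingOptions o)
        : base(new ColumnSpec(o.FeatureColumn), new ColumnSpec(o.LabelColumn),
               MakeOptional(o.WeightColumn), MakeOptional(o.GroupIdColumn))
    { }
}
```

With `new RankingOptions { GroupIdColumn = "GroupId" }`, calling `Fit()` on the buggy variant throws even though the option was set, while the fixed variant passes the check. Whether the actual fix should live in a new TrainerUtils helper or directly in the constructor is left open here.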

Repro:
Run the sample in Microsoft.ML.Samples.Dynamic.LightGbmRankingWithOptions.Example().

Unhandled Exception: System.ArgumentOutOfRangeException: Need a group column.
Parameter name: data
at Microsoft.ML.Contracts.CheckParam(IExceptionContext ctx, Boolean f, String paramName, String msg) in C:\najeeb-kazmi\machinelearning\src\Microsoft.ML.Core\Utilities\Contracts.cs:line 543
at Microsoft.ML.LightGBM.LightGbmRankingTrainer.CheckDataValid(IChannel ch, RoleMappedData data) in C:\najeeb-kazmi\machinelearning\src\Microsoft.ML.LightGBM\LightGbmRankingTrainer.cs:line 127
at Microsoft.ML.LightGBM.LightGbmTrainerBase`3.LoadTrainingData(IChannel ch, RoleMappedData trainData, CategoricalMetaData& catMetaData) in C:\najeeb-kazmi\machinelearning\src\Microsoft.ML.LightGBM\LightGbmTrainerBase.cs:line 313
at Microsoft.ML.LightGBM.LightGbmTrainerBase`3.TrainModelCore(TrainContext context) in C:\najeeb-kazmi\machinelearning\src\Microsoft.ML.LightGBM\LightGbmTrainerBase.cs:line 113
at Microsoft.ML.Training.TrainerEstimatorBase`2.TrainTransformer(IDataView trainSet, IDataView validationSet, IPredictor initPredictor) in C:\najeeb-kazmi\machinelearning\src\Microsoft.ML.Data\Training\TrainerEstimatorBase.cs:line 148
at Microsoft.ML.Training.TrainerEstimatorBase`2.Fit(IDataView input) in C:\najeeb-kazmi\machinelearning\src\Microsoft.ML.Data\Training\TrainerEstimatorBase.cs:line 75
at Microsoft.ML.Samples.Dynamic.LightGbmRankingWithOptions.Example() in C:\najeeb-kazmi\machinelearning\docs\samples\Microsoft.ML.Samples\Dynamic\Trainers\Ranking\LightGBMRankingWithOptions.cs:line 31
at Microsoft.ML.Samples.Program.Main(String[] args) in C:\najeeb-kazmi\machinelearning\docs\samples\Microsoft.ML.Samples\Program.cs:line 9

cc: @abgoswam @zeahmed @TomFinley @eerhardt

Labels

API (Issues pertaining the friendly API), bug (Something isn't working)
