LightGBM option MaximumCategoricalSplitPointCount has wrong doc #3687

Closed

Description

See:

/// <summary>
/// When the number of categories of one feature is smaller than or equal to <see cref="MaximumCategoricalSplitPointCount"/>,
/// one-vs-other split algorithm will be used.
/// </summary>
[Argument(ArgumentType.AtMostOnce, HelpText = "Max number of categorical thresholds.", ShortName = "maxcat")]
[TlcModule.Range(Inf = 0, Max = int.MaxValue)]
[TlcModule.SweepableDiscreteParam("MaxCatThreshold", new object[] { 8, 16, 32, 64 })]
public int MaximumCategoricalSplitPointCount = 32;

and

private protected static Dictionary<string, string> NameMapping = new Dictionary<string, string>()
{
    {nameof(MinimumExampleCountPerLeaf), "min_data_per_leaf"},
    {nameof(NumberOfLeaves), "num_leaves"},
    {nameof(MaximumBinCountPerFeature), "max_bin" },
    {nameof(MinimumExampleCountPerGroup), "min_data_per_group" },
    {nameof(MaximumCategoricalSplitPointCount), "max_cat_threshold" },
    {nameof(CategoricalSmoothing), "cat_smooth" },
    {nameof(L2CategoricalRegularization), "cat_l2" },
    {nameof(HandleMissingValue), "use_missing" }
};

and
https://lightgbm.readthedocs.io/en/latest/Parameters.html#max_cat_threshold :

max_cat_threshold, default = 32, type = int, constraints: max_cat_threshold > 0
- used for the categorical features
- limit the max threshold points in categorical features

The summary text got mixed up with the description of a different parameter, max_cat_to_onehot:

https://lightgbm.readthedocs.io/en/latest/Parameters.html#max_cat_to_onehot

when number of categories of one feature smaller than or equal to max_cat_to_onehot, one-vs-other split algorithm will be used
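
A possible corrected summary, as a minimal sketch only: the wording is paraphrased from the LightGBM description of max_cat_threshold quoted above, and the containing class name is hypothetical. The attributes from the original declaration are left out so the snippet compiles on its own.

```csharp
// Hypothetical, trimmed-down container; only the XML doc comment is the point here.
// The [Argument] and [TlcModule.*] attributes from the real declaration are omitted.
public sealed class CategoricalOptionsSketch
{
    /// <summary>
    /// Maximum number of threshold (split) points considered for a categorical feature.
    /// Mapped to the LightGBM parameter <c>max_cat_threshold</c>.
    /// </summary>
    public int MaximumCategoricalSplitPointCount = 32;
}
```

The one-vs-other wording would belong on an option mapped to max_cat_to_onehot instead; no such option appears in the NameMapping excerpt above.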
Labels: documentation (Related to documentation of ML.NET)