Description
opened on Mar 20, 2024
System Information (please complete the following information):
- OS & Version: Windows 10
- ML.NET Version: 0.21.1
- .NET Version: .NET 8.0
Describe the bug
The SearchSpace is not being fully explored for a SweepableEstimator.
I have a SweepableEstimator whose search space covers 'k', the number of clusters for KMeans.
The range is a uniform int with Min=3, Max=20, and Default=10.
I am logging the selected k parameter when the SweepableEstimator is called.
The logs show that k hovers around the default value (i.e. 8, 9, 10, 11).
The full space is not explored.
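To put a number on "not fully explored", the logged k values can be tallied against the declared range. This is only a minimal sketch: `loggedKs` is a hypothetical sample standing in for whatever the trial log actually recorded, and `kMin`/`kMax` mirror the Min and Max given above.

```fsharp
// Hypothetical sample of logged k values; replace with the values from the real log.
let loggedKs = [ 10; 9; 11; 8; 10; 9; 10; 11; 8; 10 ]

// Bounds of the declared search space (Min=3, Max=20).
let kMin, kMax = 3, 20

// How often each k was proposed, sorted by k.
let histogram = loggedKs |> List.countBy id |> List.sortBy fst

// Fraction of the integer range [kMin, kMax] that was visited at least once.
let coverage =
    let visited =
        loggedKs
        |> List.filter (fun k -> k >= kMin && k <= kMax)
        |> List.distinct
        |> List.length
    float visited / float (kMax - kMin + 1)

for (k, n) in histogram do
    printfn "k=%2d  %s" k (String.replicate n "*")
printfn "coverage: %.2f" coverage
```

With the sample above, only 4 of the 18 possible values are ever visited, which is the kind of clustering around the default that the logs show.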
The script that showcases this problem is here
https://github.com/fwaris/MLNetGEOpt/blob/master/MLNetGEOpt/scripts/custering.fsx
Expected behavior
The search space should be explored more fully
Screenshots, Code, Sample Projects
Project: https://github.com/fwaris/MLNetGEOpt
Additional context
Background
The referenced project is an auto-ML layer built on top of ML.NET's AutoML. This higher layer is called 'MLNetGEOpt'.
AutoML finds optimal parameters given a SweepablePipeline.
MLNetGEOpt proposes new SweepablePipelines for AutoML to optimize.
It uses a method called "Grammatical Evolution" (GE). The pipelines are constructed according to a given 'grammar'. Each pipeline is a valid 'sentence' constructed from the grammar.
The grammar ensures that the pipelines are reasonable. This greatly reduces the search space compared to randomly constructed pipelines, such as those produced by a Genetic Algorithm.
Note: I solved for the optimal number of clusters by building a grammar that allows for one-of-many SweepableEstimators, each tied to a particular k.
Here is an example of the grammar (prefix 'se' stands for SweepableEstimator; 'Alt'=select 1 from available options; 'Opt'=optional term):
let g =
    [
        Estimator seBase
        Opt(Estimator (E.Def.seFtrSelCount 3))
        Alt [
            Alt ([(1,10); (11,20); (21,30); (31,100)] |> List.map(E.Def.seNorm>>Estimator))
            Estimator E.Def.seNormLpNorm
            Estimator E.Def.seNormLogMeanVar
            Estimator E.Def.seNormMeanVar
            Alt([0.1f .. 0.5f .. 4.0f] |> List.pairwise |> List.map(fun (a,b) -> a, b - 0.001f) |> List.map(E.Def.seGlobalContrast>>Estimator))
            Estimator E.Def.seNormMinMax
            Estimator E.Def.seNormRobustScaling
        ]
        Alt [for i in 3 .. 20 -> Estimator (seCluster i)] // this works
        //Estimator seClusterWithSS // this does not work
    ]
For reference, a specific grammar can be constructed from this simple 'meta-grammar':
type Term =
    | Opt of Term
    | Pipeline of (unit -> SweepablePipeline)
    | Estimator of (unit -> SweepableEstimator)
    | Alt of Term list
    | Union of Term list
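To illustrate why the `Alt [for i in 3 .. 20 -> ...]` workaround makes every k reachable, here is a rough sketch that counts the distinct pipelines a grammar admits. It rests on assumptions that are not part of MLNetGEOpt: `SketchTerm` is a stand-in type whose estimators are plain labels, `Pipeline` and `Union` are left out, and a grammar is assumed to be a list of terms applied in order.

```fsharp
// Stand-in for the meta-grammar (assumption for this sketch only):
// estimators are labels, and Pipeline/Union are omitted.
type SketchTerm =
    | Opt of SketchTerm
    | Estimator of string
    | Alt of SketchTerm list

// Number of distinct choices a single term contributes.
let rec choices term =
    match term with
    | Estimator _ -> 1
    | Opt t -> 1 + choices t             // either skip the term or use it
    | Alt ts -> ts |> List.sumBy choices // pick exactly one alternative

// Terms are assumed to be applied in sequence, so the counts multiply.
let sentenceCount (grammar: SketchTerm list) =
    grammar |> List.fold (fun acc t -> acc * choices t) 1

// Reduced version of the grammar above: base, optional feature selection, one estimator per k.
let sketchGrammar =
    [ Estimator "seBase"
      Opt (Estimator "seFtrSelCount 3")
      Alt [ for i in 3 .. 20 -> Estimator (sprintf "seCluster %d" i) ] ]

printfn "%d" (sentenceCount sketchGrammar)
```

In this reduced grammar each k from 3 to 20 is its own alternative, so the choice of k is made by the grammar layer as a discrete selection rather than by the tuner sampling the 3..20 range.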