SweepableEstimator SearchSpace not being fully explored #7085

Open

Description

System Information:

  • OS & Version: Windows 10
  • ML.NET Version: 0.21.1
  • .NET Version: .NET 8.0

Describe the bug
The SearchSpace of a SweepableEstimator is not being fully explored.
I have a SweepableEstimator whose search space covers 'k', the number of clusters for KMeans.
The range is Min = 3, Max = 20, Default = 10 (uniform int).
I log the selected k each time the SweepableEstimator is called.
The logs show that k hovers around the default value (e.g. 8, 9, 10, 11).
Values near the ends of the range are never tried.
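
A quick way to quantify the gap from the logged values is sketched below. It is plain F# with no ML.NET dependency. The names `loggedK`, `minK`, `maxK` and `unexplored` are made up for illustration, and the four logged values simply repeat the ones quoted above rather than a full log.

```fsharp
// Values of k captured by the logging hook. These four repeat the values
// quoted in this report; substitute the real log when checking a run.
let loggedK = [ 8; 9; 10; 11 ]

// Bounds of the declared search space (Min = 3, Max = 20).
let minK, maxK = 3, 20

// Values inside the declared range that never show up in the log.
let unexplored =
    set [ minK .. maxK ] - set loggedK |> Set.toList

printfn "tried %d of %d values; never tried: %A"
    (loggedK |> List.distinct |> List.length)
    (maxK - minK + 1)
    unexplored
```

With the values above, only 4 of the 18 possible settings of k are covered.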

A script that reproduces the problem is here:

https://github.com/fwaris/MLNetGEOpt/blob/master/MLNetGEOpt/scripts/custering.fsx

Expected behavior
The sweeper should sample k across the whole range (3 to 20), not only values close to the default.

Screenshots, Code, Sample Projects
Project: https://github.com/fwaris/MLNetGEOpt

Additional context

Background

The referenced project, 'MLNetGEOpt', is an automated ML layer that sits on top of ML.NET's AutoML.

AutoML finds optimal parameters given a SweepablePipeline.

MLNetGEOpt proposes new SweepablePipelines for AutoML to optimize.

It uses a method called "Grammatical Evolution" (GE). The pipelines are constructed according to a given 'grammar'. Each pipeline is a valid 'sentence' constructed from the grammar.

The grammar ensures that the pipelines are reasonable. This greatly reduces the search space compared to randomly constructed pipelines, for example those produced by a Genetic Algorithm.

Note: As a workaround, I solved for the optimal number of clusters by building a grammar that allows one-of-many SweepableEstimators, each tied to a particular k.

Here is an example of the grammar (prefix 'se' stands for SweepableEstimator; 'Alt'=select 1 from available options; 'Opt'=optional term):

let g = 
    [
        Estimator seBase
        Opt(Estimator (E.Def.seFtrSelCount 3))        
        Alt [
            Alt ([(1,10); (11,20); (21,30); (31,100)] |> List.map(E.Def.seNorm>>Estimator))
            Estimator E.Def.seNormLpNorm
            Estimator E.Def.seNormLogMeanVar
            Estimator E.Def.seNormMeanVar
            Alt([0.1f .. 0.5f .. 4.0f] |> List.pairwise |> List.map(fun (a,b) -> a, b - 0.001f)  |> List.map(E.Def.seGlobalContrast>>Estimator))
            Estimator E.Def.seNormMinMax
            Estimator E.Def.seNormRobustScaling
        ]
        Alt [for i in 3 .. 20 -> Estimator (seCluster i)]  // this works 
        //Estimator seClusterWithSS                        // this does not work
    ]

For reference, a specific grammar can be constructed from this simple 'meta-grammar':

type Term = 
    | Opt of Term 
    | Pipeline of (unit -> SweepablePipeline)
    | Estimator of (unit -> SweepableEstimator)
    | Alt of Term list
    | Union of Term list
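
To make the 'Opt' and 'Alt' semantics concrete, here is a minimal sketch that enumerates every pipeline a small grammar can produce. It is not the MLNetGEOpt implementation: it uses a simplified stand-in type `SimpleTerm` in which estimators are plain strings instead of `unit -> SweepableEstimator`, it leaves out `Pipeline` and `Union`, and the helpers `expand` and `sentences` are hypothetical names. It also lists all sentences exhaustively, whereas GE would pick one sentence per genome.

```fsharp
// Simplified stand-in for the Term type above (an assumption for this sketch):
// estimators are named by strings; Pipeline and Union are omitted.
type SimpleTerm =
    | Opt of SimpleTerm
    | Estimator of string
    | Alt of SimpleTerm list

// All estimator sequences that a single term can expand to.
let rec expand (term: SimpleTerm) : string list list =
    match term with
    | Estimator name -> [ [ name ] ]
    | Opt t -> [] :: expand t             // either skip the term or use it
    | Alt ts -> ts |> List.collect expand // select exactly one of the options

// All pipelines ("sentences") that a grammar, read left to right, can produce.
let sentences (grammar: SimpleTerm list) : string list list =
    List.foldBack
        (fun term tails ->
            [ for head in expand term do
                for tail in tails do
                    yield head @ tail ])
        grammar
        [ [] ]

// Same shape as the workaround: one fixed-k estimator per value of k.
let grammar =
    [ Estimator "base"
      Opt (Estimator "featureSelection")
      Alt [ for k in 3 .. 20 -> Estimator (sprintf "kmeans k=%d" k) ] ]

let all = sentences grammar
printfn "%d candidate pipelines" all.Length // 1 * 2 * 18 = 36
```

In this form every k from 3 to 20 is a separate grammar choice, so coverage of k depends on the grammar search rather than on how a single estimator's SearchSpace is sampled.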