
instantiate sklearn model or get_params from Configuration object #886

Closed
@chrisbarber

Description


I was looking through the code for a way to instantiate an sklearn model from a Configuration object. My use case is implementing a generalized way of getting standard metadata about a completed auto-sklearn run. For example, I call autosklearn_model.get_models_with_weights(), and the result contains some Configuration objects. These may simply describe an sklearn model, although I understand it is also possible to extend and register other model types. In either case I would like access to an instance of the model with the matching configuration, so that I can call get_params() on it to check whether it supports that interface. Maybe this is accessible somewhere else already, but my idea was to re-instantiate a dummy model from the Configuration and then call get_params() on that. Ideally I could dynamically instantiate whatever model is described by __choice__ (even if it is not from sklearn), the same way auto-sklearn does it internally.
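For concreteness, this is what I mean by probing the interface; it is just duck typing, and try_get_params is a name I made up here, not an auto-sklearn API:

```python
from sklearn.ensemble import RandomForestClassifier

def try_get_params(model):
    # Duck-typing probe: sklearn estimators expose get_params();
    # anything without that method falls through to None.
    if hasattr(model, "get_params") and callable(model.get_params):
        return model.get_params()
    return None

params = try_get_params(RandomForestClassifier())
print("n_estimators" in params)  # True
print(try_get_params(object()))  # None
```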

I was poking around in the code and found, e.g.:

from sklearn.ensemble import RandomForestClassifier

I was expecting to find some place where this import is dynamically selected based on the choice, but maybe this is just a wrapper class that is itself chosen dynamically?

Then bits like this:

classifier = self.module
config = configuration_space.sample_configuration()
classifier = classifier(
    random_state=np.random.RandomState(1),
    **{hp_name: config[hp_name]
       for hp_name in config
       if config[hp_name] is not None},
)

Could you point me in the right direction? Or advise if I am missing some fundamental point about how this should work.

To make sure I am clear, I'll also include an example. I have a Configuration object like this:

Configuration:
  balancing:strategy, Value: 'none'
  classifier:__choice__, Value: 'random_forest'
  classifier:random_forest:bootstrap, Value: 'True'
  classifier:random_forest:criterion, Value: 'gini'
  classifier:random_forest:max_depth, Constant: 'None'
  classifier:random_forest:max_features, Value: 0.5
  classifier:random_forest:max_leaf_nodes, Constant: 'None'
  classifier:random_forest:min_impurity_decrease, Constant: 0.0
  classifier:random_forest:min_samples_leaf, Value: 1
  classifier:random_forest:min_samples_split, Value: 2
  classifier:random_forest:min_weight_fraction_leaf, Constant: 0.0
  data_preprocessing:categorical_transformer:categorical_encoding:__choice__, Value: 'one_hot_encoding'
  data_preprocessing:categorical_transformer:category_coalescence:__choice__, Value: 'minority_coalescer'
  data_preprocessing:categorical_transformer:category_coalescence:minority_coalescer:minimum_fraction, Value: 0.01
  data_preprocessing:numerical_transformer:imputation:strategy, Value: 'mean'
  data_preprocessing:numerical_transformer:rescaling:__choice__, Value: 'standardize'
  feature_preprocessor:__choice__, Value: 'no_preprocessing'
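To illustrate what I mean about the key structure, here is how I would pick out the chosen classifier's hyperparameters by hand. `config` is a plain dict standing in for the Configuration object above (abridged), since a Configuration iterates over the same flat keys:

```python
# Plain-dict stand-in for the Configuration shown above (abridged).
config = {
    "balancing:strategy": "none",
    "classifier:__choice__": "random_forest",
    "classifier:random_forest:bootstrap": "True",
    "classifier:random_forest:criterion": "gini",
    "classifier:random_forest:max_features": 0.5,
    "classifier:random_forest:min_samples_leaf": 1,
    "feature_preprocessor:__choice__": "no_preprocessing",
}

choice = config["classifier:__choice__"]
prefix = f"classifier:{choice}:"

# Keep only the hyperparameters of the chosen classifier and strip the
# prefix so the names line up with the estimator's keyword arguments.
hps = {k[len(prefix):]: v for k, v in config.items() if k.startswith(prefix)}

print(choice)       # random_forest
print(sorted(hps))  # ['bootstrap', 'criterion', 'max_features', 'min_samples_leaf']
```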

And I want something that produces the equivalent of this, but without hard-coding the model choice, and without manually removing the parameters that sklearn doesn't accept, etc.:

hps = {hp_name.rsplit(':')[-1]: config[hp_name]
       for hp_name in config if config[hp_name] is not None}
hps = {k: v for k, v in hps.items()
       if k not in ['strategy', '__choice__', 'minimum_fraction']}
from sklearn.ensemble import RandomForestClassifier
return to_mls_sklearn(RandomForestClassifier(**hps))
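A sketch of the generalization I am after, assuming a hand-maintained registry mapping __choice__ values to estimator classes (auto-sklearn keeps its own component registry internally, which this does not use) and a coercion step for the string-encoded constants like 'None' and 'True' seen in the Configuration dump:

```python
from sklearn.ensemble import RandomForestClassifier

# Hand-maintained registry; this mapping is an illustrative assumption,
# not auto-sklearn's internal component registry.
ESTIMATORS = {"random_forest": RandomForestClassifier}

def coerce(value):
    # The Configuration encodes some constants as strings; map them
    # back to the Python values the estimator expects.
    return {"None": None, "True": True, "False": False}.get(value, value)

def instantiate(config):
    """Build an estimator from a flat {key: value} view of a Configuration."""
    choice = config["classifier:__choice__"]
    prefix = f"classifier:{choice}:"
    hps = {
        key[len(prefix):]: coerce(value)
        for key, value in config.items()
        if key.startswith(prefix)
    }
    return ESTIMATORS[choice](**hps)

clf = instantiate({
    "classifier:__choice__": "random_forest",
    "classifier:random_forest:bootstrap": "True",
    "classifier:random_forest:criterion": "gini",
    "classifier:random_forest:max_depth": "None",
    "classifier:random_forest:max_features": 0.5,
    "classifier:random_forest:min_samples_leaf": 1,
    "classifier:random_forest:min_samples_split": 2,
})
print(type(clf).__name__)             # RandomForestClassifier
print(clf.get_params()["criterion"])  # gini
```

This still hard-codes which keys belong to the classifier subspace, but at least the model class itself is picked from __choice__ rather than imported by name.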

Metadata

Labels

enhancement (A new improvement or feature)
