
instantiate sklearn model or get_params from Configuration object #886

Closed
@chrisbarber

Description


I was looking through the code for a way to instantiate an sklearn model from a Configuration object. My use case is implementing a generalized way of getting standard metadata about a completed auto-sklearn run. For example, I call autosklearn_model.get_models_with_weights(), and the result contains some Configuration objects. These may simply describe an sklearn model, although I understand it is also possible to extend and register other model types. In either case I would like access to an instance of the model with the matching configuration, so that I can call get_params() on it to check whether it supports that interface. Maybe this is accessible somewhere else already, but my idea was to re-instantiate a dummy model from the Configuration and then call get_params() on that. Ideally I could dynamically instantiate whatever model is described by __choice__ (even if it is not from sklearn), the same way auto-sklearn does it internally.
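For concreteness, this is what I mean by probing the interface; it is just duck typing, and try_get_params is a name I made up here, not an auto-sklearn API:

```python
from sklearn.ensemble import RandomForestClassifier

def try_get_params(model):
    # Duck-typing probe: sklearn estimators expose get_params();
    # anything without that method falls through to None.
    if hasattr(model, "get_params") and callable(model.get_params):
        return model.get_params()
    return None

params = try_get_params(RandomForestClassifier())
print("n_estimators" in params)  # True
print(try_get_params(object()))  # None
```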

I was poking around in the code and found, e.g.:

from sklearn.ensemble import RandomForestClassifier

I was expecting to find some place where this import is dynamically selected based on the choice, but maybe this is just a wrapper class that is itself chosen dynamically?

Then bits like this:

classifier = self.module
config = configuration_space.sample_configuration()
classifier = classifier(
    random_state=np.random.RandomState(1),
    **{hp_name: config[hp_name]
       for hp_name in config
       if config[hp_name] is not None},
)

Could you point me in the right direction? Or advise if I am missing some fundamental point about how this should work.

To make sure I am clear, I'll also include an example. I have a Configuration object like this:

Configuration:
  balancing:strategy, Value: 'none'
  classifier:__choice__, Value: 'random_forest'
  classifier:random_forest:bootstrap, Value: 'True'
  classifier:random_forest:criterion, Value: 'gini'
  classifier:random_forest:max_depth, Constant: 'None'
  classifier:random_forest:max_features, Value: 0.5
  classifier:random_forest:max_leaf_nodes, Constant: 'None'
  classifier:random_forest:min_impurity_decrease, Constant: 0.0
  classifier:random_forest:min_samples_leaf, Value: 1
  classifier:random_forest:min_samples_split, Value: 2
  classifier:random_forest:min_weight_fraction_leaf, Constant: 0.0
  data_preprocessing:categorical_transformer:categorical_encoding:__choice__, Value: 'one_hot_encoding'
  data_preprocessing:categorical_transformer:category_coalescence:__choice__, Value: 'minority_coalescer'
  data_preprocessing:categorical_transformer:category_coalescence:minority_coalescer:minimum_fraction, Value: 0.01
  data_preprocessing:numerical_transformer:imputation:strategy, Value: 'mean'
  data_preprocessing:numerical_transformer:rescaling:__choice__, Value: 'standardize'
  feature_preprocessor:__choice__, Value: 'no_preprocessing'
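To illustrate what I mean about the key structure, here is how I would pick out the chosen classifier's hyperparameters by hand. `config` is a plain dict standing in for the Configuration object above (abridged), since a Configuration iterates over the same flat keys:

```python
# Plain-dict stand-in for the Configuration shown above (abridged).
config = {
    "balancing:strategy": "none",
    "classifier:__choice__": "random_forest",
    "classifier:random_forest:bootstrap": "True",
    "classifier:random_forest:criterion": "gini",
    "classifier:random_forest:max_features": 0.5,
    "classifier:random_forest:min_samples_leaf": 1,
    "feature_preprocessor:__choice__": "no_preprocessing",
}

choice = config["classifier:__choice__"]
prefix = f"classifier:{choice}:"

# Keep only the hyperparameters of the chosen classifier and strip the
# prefix so the names line up with the estimator's keyword arguments.
hps = {k[len(prefix):]: v for k, v in config.items() if k.startswith(prefix)}

print(choice)       # random_forest
print(sorted(hps))  # ['bootstrap', 'criterion', 'max_features', 'min_samples_leaf']
```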

And I want something that produces the equivalent of this, but without hard-coding the model choice, and without manually removing the parameters that sklearn doesn't accept, etc.:

hps = {hp_name.rsplit(':')[-1]: config[hp_name]
       for hp_name in config if config[hp_name] is not None}
hps = {k: v for k, v in hps.items()
       if k not in ['strategy', '__choice__', 'minimum_fraction']}
from sklearn.ensemble import RandomForestClassifier
return to_mls_sklearn(RandomForestClassifier(**hps))
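A sketch of the generalization I am after, assuming a hand-maintained registry mapping __choice__ values to estimator classes (auto-sklearn keeps its own component registry internally, which this does not use) and a coercion step for the string-encoded constants like 'None' and 'True' seen in the Configuration dump:

```python
from sklearn.ensemble import RandomForestClassifier

# Hand-maintained registry; this mapping is an illustrative assumption,
# not auto-sklearn's internal component registry.
ESTIMATORS = {"random_forest": RandomForestClassifier}

def coerce(value):
    # The Configuration encodes some constants as strings; map them
    # back to the Python values the estimator expects.
    return {"None": None, "True": True, "False": False}.get(value, value)

def instantiate(config):
    """Build an estimator from a flat {key: value} view of a Configuration."""
    choice = config["classifier:__choice__"]
    prefix = f"classifier:{choice}:"
    hps = {
        key[len(prefix):]: coerce(value)
        for key, value in config.items()
        if key.startswith(prefix)
    }
    return ESTIMATORS[choice](**hps)

clf = instantiate({
    "classifier:__choice__": "random_forest",
    "classifier:random_forest:bootstrap": "True",
    "classifier:random_forest:criterion": "gini",
    "classifier:random_forest:max_depth": "None",
    "classifier:random_forest:max_features": 0.5,
    "classifier:random_forest:min_samples_leaf": 1,
    "classifier:random_forest:min_samples_split": 2,
})
print(type(clf).__name__)             # RandomForestClassifier
print(clf.get_params()["criterion"])  # gini
```

This still hard-codes which keys belong to the classifier subspace, but at least the model class itself is picked from __choice__ rather than imported by name.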

Metadata

Labels

enhancement (A new improvement or feature)
