
Update {'include': {'data_preprocessor': [...] }} #1266

Open
@eddiebergman


It appears that the code still accepts data_preprocessor as a valid entry in the include: Dict[str, Any] argument of the estimators.
It states that the only valid entry is 'feature_type' when passing data_preprocessor: []

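from autosklearn.classification import AutoSklearnClassifier

AutoSklearnClassifier(
    time_left_for_this_task=30,
    include={
        # 'feature_type' is the only accepted data_preprocessor choice
        'data_preprocessor': ['feature_type']
    },
)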

While this works fine, we also document here that it can't be turned off, and provide a broken example here of how to turn it off. The example is broken, as seen in the sprint statistics, which show only a single DummyModel returned.

This is also confusing because InputValidator also seems to do OrdinalEncoding, which is a possible choice of the DataPreprocessor step in the pipeline.

In the short term I can think of two possibilities:

  • Allow a boolean switch, include: {'data_preprocessing' : True/False}, removing the pipeline step entirely
  • Include a string like include: {'data_preprocessing' : 'no_preprocessing'}, having the pipeline step perform a no-op on the data.
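
To make the difference between the two options concrete, here is a rough sketch in plain Python. It is not auto-sklearn code: `build_steps`, `run`, `no_op`, `feature_type`, `classifier` and the step names are all hypothetical stand-ins, and the `data_preprocessing` key only exists as the proposal above.

```python
from typing import Any, Callable, Dict, List, Tuple

Data = List[float]
Step = Tuple[str, Callable[[Data], Data]]


def feature_type(X: Data) -> Data:
    """Hypothetical stand-in for the real data preprocessing step."""
    return [x * 2 for x in X]


def no_op(X: Data) -> Data:
    """Return the data unchanged."""
    return X


def classifier(X: Data) -> Data:
    """Hypothetical stand-in for whatever follows in the pipeline."""
    return X


def build_steps(include: Dict[str, Any]) -> List[Step]:
    """Build the pipeline steps from `include` (hypothetical helper)."""
    choice = include.get("data_preprocessing", True)
    steps: List[Step] = []
    if choice is False:
        # Option 1: boolean switch, the step is removed entirely.
        pass
    elif choice == "no_preprocessing":
        # Option 2: the step stays in place but performs a no-op.
        steps.append(("data_preprocessor", no_op))
    else:
        # Default: keep the regular preprocessing step.
        steps.append(("data_preprocessor", feature_type))
    steps.append(("classifier", classifier))
    return steps


def run(steps: List[Step], X: Data) -> Data:
    """Apply each step to the data in order."""
    for _, fn in steps:
        X = fn(X)
    return X
```

With the boolean switch the pipeline ends up one step shorter, so anything that looks the step up by name has to cope with it being absent. With the 'no_preprocessing' string the pipeline keeps its shape and only the behaviour of the step changes.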
