Update `{'include': {'data_preprocessor': [...] }}`

It appears that the code still accepts `data_preprocessor` as a valid key in the `include: Dict[str, Any]` argument of the estimators.
When passing `data_preprocessor: []`, it reports that the only valid entry is `'feature_type'`:
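```python
from autosklearn.classification import AutoSklearnClassifier

# Restrict the data preprocessing step to its only accepted choice
AutoSklearnClassifier(
    time_left_for_this_task=30,
    include={
        'data_preprocessor': ['feature_type']
    }
)
```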
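The behaviour described above can be modelled with a small validator. This is a hypothetical sketch, not auto-sklearn's actual implementation: the name `validate_include` and the error messages are assumptions. Only the key `data_preprocessor` and the choice `'feature_type'` come from the observed behaviour.

```python
from typing import Any, Dict, List

# Assumed set of valid choices for the data preprocessing step, mirroring
# the observation that only 'feature_type' is accepted.
VALID_DATA_PREPROCESSORS = {"feature_type"}


def validate_include(include: Dict[str, Any]) -> List[str]:
    """Hypothetical check of include['data_preprocessor'].

    Returns the accepted list of choices, or raises ValueError.
    """
    choices = include.get("data_preprocessor", ["feature_type"])
    if not choices:
        # An empty list would leave the step with nothing to choose from
        raise ValueError(
            "include['data_preprocessor'] must not be empty; "
            f"valid entries: {sorted(VALID_DATA_PREPROCESSORS)}"
        )
    invalid = [c for c in choices if c not in VALID_DATA_PREPROCESSORS]
    if invalid:
        raise ValueError(
            f"Invalid data_preprocessor entries {invalid}; "
            f"valid entries: {sorted(VALID_DATA_PREPROCESSORS)}"
        )
    return list(choices)


print(validate_include({"data_preprocessor": ["feature_type"]}))  # ['feature_type']
```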
While this works fine, we also document [here](https://automl.github.io/auto-sklearn/master/manual.html#turning-off-preprocessing) that this step can't be turned off, and we provide a broken example of how to turn it off [here](https://automl.github.io/auto-sklearn/master/examples/80_extending/example_extending_data_preprocessor.html#sphx-glr-examples-80-extending-example-extending-data-preprocessor-py). The example is broken, as seen in its sprint statistics, which show that only a single DummyModel was returned.

This is also confusing because `InputValidator` seems to perform ordinal encoding as well, which is one of the possible choices for the DataPreprocessor step in the pipeline.
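To illustrate why this overlap is confusing, here is a minimal sketch assuming both `InputValidator` and the pipeline step use a sorted-category ordinal encoding (the default behaviour of scikit-learn's `OrdinalEncoder`). The helper `ordinal_encode` is hypothetical and is not taken from auto-sklearn. On the same data, the first pass produces the codes `0..k-1`, so a second pass maps every code to itself and the pipeline step does no additional work.

```python
from typing import Dict, Hashable, List, Tuple


def ordinal_encode(values: List[Hashable]) -> Tuple[List[int], Dict[Hashable, int]]:
    """Map each category to its index in the sorted list of unique categories.

    Hypothetical helper; assumes the values are mutually sortable.
    """
    categories = sorted(set(values))
    mapping = {cat: idx for idx, cat in enumerate(categories)}
    return [mapping[v] for v in values], mapping


column = ["red", "green", "blue", "green"]
once, _ = ordinal_encode(column)  # first pass, e.g. in an input validator
twice, _ = ordinal_encode(once)   # second pass, e.g. in a pipeline step
print(once)   # [2, 1, 0, 1]
print(twice)  # [2, 1, 0, 1]
```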

In the short term, I can think of two possibilities:
* Allow a boolean switch, `include: {'data_preprocessor': True/False}`, where `False` removes the pipeline step entirely.
* Accept a string such as `include: {'data_preprocessor': 'no_preprocessing'}`, where the pipeline step performs a _no-op_ on the data.
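Both proposals could be handled by one normalisation function. This is a hypothetical sketch, assuming the existing `data_preprocessor` key is reused; the names `resolve_data_preprocessor` and `NoOpPreprocessor` are illustrative and are not part of auto-sklearn.

```python
from typing import Any, Dict, List, Optional, Union

IncludeValue = Union[bool, str, List[str]]


class NoOpPreprocessor:
    """Hypothetical pipeline step that returns its input unchanged."""

    def fit(self, X: Any, y: Any = None) -> "NoOpPreprocessor":
        return self

    def transform(self, X: Any) -> Any:
        return X


def resolve_data_preprocessor(include: Dict[str, IncludeValue]) -> Optional[List[str]]:
    """Normalise the two proposed forms of include['data_preprocessor'].

    Returns None if the step should be removed from the pipeline entirely,
    otherwise the list of choices the step may use.
    """
    value = include.get("data_preprocessor", True)
    if value is False:
        return None                  # option 1: drop the step
    if value is True:
        return ["feature_type"]      # keep the current default behaviour
    if value == "no_preprocessing":
        return ["no_preprocessing"]  # option 2: keep the step, make it a no-op
    if isinstance(value, list):
        return value
    raise ValueError(f"Unsupported data_preprocessor value: {value!r}")


print(resolve_data_preprocessor({"data_preprocessor": False}))               # None
print(resolve_data_preprocessor({"data_preprocessor": "no_preprocessing"}))  # ['no_preprocessing']
```

Option 1 yields a shorter pipeline whose shape depends on the configuration, while option 2 keeps the pipeline shape fixed at the cost of one trivial pass-through step.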

Update `{'include': {'data_preprocessor': [...] }}` #1266
