Text preprocessing V2 TODOs #1373

This is a list of follow-up tasks to #1300.

# General implementation
* [x] Improve the text example to use a more meaningful dataset
* [x] Improve the text example to link to further material describing how we handle text data, for example [the scikit-learn tutorial on working with text data](https://scikit-learn.org/stable/tutorial/text_analytics/working_with_text_data.html)
* [x] Rename hyperparameters following [this comment](https://github.com/automl/auto-sklearn/pull/1300#discussion_r787510494)
* [x] Potentially move the text feature reduction to a different module
* [x] Discuss handling of the pandas dtype `object`: can we default it to string or categorical?
* [x] Add a parameter to allow for text processing (default to True)
* [x] Discuss text feature support in the [manual](https://automl.github.io/auto-sklearn/master/manual.html)
* [x] Improve the way feature types are passed to the meta-feature computation (search for the following todo: `Todo make this more cohesive to the overall structure (quick bug fix)`)
* [x] Fix https://github.com/automl/auto-sklearn/issues/741
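
To make the `object` dtype question above concrete, here is a minimal sketch of one possible default. It is not what auto-sklearn does: the helper name `coerce_object_columns` and the `max_categories` threshold are hypothetical, and the cardinality heuristic is only an assumption for illustration.

```python
import pandas as pd


def coerce_object_columns(df: pd.DataFrame, max_categories: int = 10) -> pd.DataFrame:
    """Return a copy where every object column is either categorical or string.

    Hypothetical heuristic: few distinct values -> categorical, otherwise string.
    """
    out = df.copy()
    for col in out.columns:
        if pd.api.types.is_object_dtype(out[col]):
            n_unique = out[col].nunique(dropna=True)
            if n_unique <= max_categories:
                out[col] = out[col].astype("category")
            else:
                out[col] = out[col].astype("string")
    return out


# Build the columns with an explicit object dtype so the example does not
# depend on how a given pandas version infers string data.
df = pd.DataFrame(
    {
        "color": pd.Series(["red", "blue", "red", "blue"], dtype=object),
        "review": pd.Series(
            ["great product", "arrived late", "would buy again", "too expensive"],
            dtype=object,
        ),
    }
)

converted = coerce_object_columns(df, max_categories=3)
print(converted.dtypes)
```

With the threshold set to 3, `color` (two distinct values) becomes categorical and `review` (four distinct values) becomes a string column. Any real default would need a threshold chosen from data rather than by hand.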

# Hyperparameter space
* [ ] Benchmark whether TF-IDF should be applied on a per-sample or per-feature level (see https://github.com/automl/auto-sklearn/pull/1300#discussion_r787863150)
* [ ] Improve the upper and lower bounds of the text feature reduction
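
As a starting point for the TF-IDF benchmark above, the sketch below contrasts two ways of encoding several text columns. It assumes one reading of the question: fitting a separate vectorizer per text column versus fitting a single vectorizer on each sample's concatenated text. The linked discussion may intend something different. The toy data and the function names `tfidf_per_column` and `tfidf_joint` are hypothetical.

```python
from scipy.sparse import hstack
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy data: two text columns, three samples (illustrative only).
columns = {
    "title": ["cheap flights", "great pasta recipe", "cheap pasta"],
    "body": ["book flights today", "boil the pasta", "pasta on sale today"],
}


def tfidf_per_column(columns):
    """Fit one TF-IDF vectorizer per text column and stack the results side by side."""
    blocks = [TfidfVectorizer().fit_transform(texts) for texts in columns.values()]
    return hstack(blocks).tocsr()


def tfidf_joint(columns):
    """Concatenate all text columns of each sample, then fit a single vectorizer."""
    joined = [" ".join(parts) for parts in zip(*columns.values())]
    return TfidfVectorizer().fit_transform(joined)


X_per_column = tfidf_per_column(columns)
X_joint = tfidf_joint(columns)
print(X_per_column.shape, X_joint.shape)
```

The per-column variant keeps a separate vocabulary and separate IDF weights for each column, so a word shared between columns yields one feature per column. The joint variant shares a single vocabulary. A benchmark would then compare downstream model scores for both encodings on real datasets.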

