Open
Description
This is a list of follow-up tasks to #1300.
General implementation
- Improve the text example to use a more meaningful dataset
- Improve the text example to link to further material that describes how we handle text data, for example this
- Rename hyperparameters following this comment
- Potentially move the text feature reduction to a different module
- Discuss handling of the pandas dtype object: can we default it to string or categorical?
- Add a parameter to allow for text processing (default to True)
- Discuss text feature support in the manual
- Improve the way feature types are passed to the meta-feature computation (search for the following todo: "Todo make this more cohesive to the overall structure (quick bug fix)")
- Fix "Unused hyperparameters remain active when datasets are purely categorical or purely numerical" #741
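
For the pandas dtype object item in the list above, here is a minimal sketch of what one possible default rule could look like. The function names `infer_object_column_type` and `convert_object_columns` and the `max_unique_ratio` threshold are hypothetical and not part of the code base; the sketch is only meant to make the string-vs-categorical question concrete.

```python
import pandas as pd


def infer_object_column_type(column: pd.Series, max_unique_ratio: float = 0.5) -> str:
    """Guess whether an object-dtype column is 'categorical' or 'string'.

    Hypothetical heuristic: few distinct values relative to the number of
    non-missing rows suggests a categorical column, otherwise free text.
    """
    values = column.dropna()
    if len(values) == 0:
        # Nothing to inspect, fall back to categorical (an arbitrary choice).
        return "categorical"
    unique_ratio = values.nunique() / len(values)
    return "categorical" if unique_ratio <= max_unique_ratio else "string"


def convert_object_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy in which every object column has an explicit dtype."""
    result = df.copy()
    for name in result.columns:
        if pd.api.types.is_object_dtype(result[name]):
            kind = infer_object_column_type(result[name])
            target = "category" if kind == "categorical" else "string"
            result[name] = result[name].astype(target)
    return result


# Toy data, built with dtype=object explicitly so the rule applies.
example = pd.DataFrame(
    {
        "color": pd.Series(["red", "blue", "red", "blue"], dtype=object),
        "review": pd.Series(
            ["great product", "arrived late", "would buy again", "too expensive"],
            dtype=object,
        ),
    }
)
converted = convert_object_columns(example)
```

Whether such a heuristic is acceptable, or whether users should always have to state the type explicitly, is exactly the open question of that item.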
Hyperparameter space
- Benchmark whether TF/IDF should be applied on a per-sample or per-feature level (see Text Processing #1300 (comment))
- Improve the upper and lower bounds of the text feature reduction
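
To make the TF/IDF benchmark item above more concrete, here is a rough sketch of two variants that could be compared. It assumes one particular reading of the item: fitting a separate vectorizer for each text feature versus joining all text features of a sample into one document and fitting a single vectorizer. The helper names `tfidf_per_column` and `tfidf_combined` and the toy data are hypothetical; only `TfidfVectorizer` and `scipy.sparse.hstack` are real library calls.

```python
from scipy.sparse import hstack
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy data: each row is one sample with two text features (title, body).
rows = [
    ("cheap flights", "book cheap flights to rome"),
    ("hotel deals", "rome hotel deals this week"),
    ("cheap hotel", "cheap hotel near the station"),
]


def tfidf_per_column(rows):
    """Fit one TfidfVectorizer per text feature and stack the outputs."""
    columns = list(zip(*rows))  # transpose rows into per-feature tuples
    blocks = [TfidfVectorizer().fit_transform(column) for column in columns]
    return hstack(blocks).tocsr()


def tfidf_combined(rows):
    """Join all text features of a sample into one document, then vectorize."""
    documents = [" ".join(row) for row in rows]
    return TfidfVectorizer().fit_transform(documents)


per_column = tfidf_per_column(rows)
combined = tfidf_combined(rows)
```

In the per-feature variant the same word gets a separate output column for every text feature it occurs in, so the resulting matrix is wider; the combined variant shares one vocabulary across features. A benchmark would compare downstream performance of both on real datasets.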