Text preprocessing V2 TODOs #1373

Open
@mfeurer

Description

This is a list of follow-up tasks to #1300.

General implementation

  • Improve the text example to use a more meaningful dataset
  • Improve the text example to link to further material that describes how we handle text data, for example this
  • Rename hyperparameters following this comment
  • Potentially move the text feature reduction to a different module
  • Discuss handling of the pandas dtype object: can we default it to string or categorical?
  • Add a parameter to enable or disable text processing (defaulting to True)
  • Discuss text feature support in the manual
  • Improve the way feature types are passed to the meta-feature computation (search for the following todo: Todo make this more cohesive to the overall structure (quick bug fix))
  • Fix #741 (Unused hyperparameters remain active when datasets are purely categorical or purely numerical)
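
To make the dtype question above concrete, here is a minimal sketch of one possible default for pandas object columns. It is only an illustration for discussion, not the current behaviour: the function name `coerce_object_columns` and the `max_unique_ratio` threshold are hypothetical, and the cardinality heuristic is an assumption.

```python
import pandas as pd


def coerce_object_columns(df: pd.DataFrame, max_unique_ratio: float = 0.5) -> pd.DataFrame:
    """Return a copy in which every object column is cast to 'category' or 'string'.

    Hypothetical heuristic: a column whose share of distinct values is at most
    `max_unique_ratio` is treated as categorical, anything else as free text.
    """
    out = df.copy()
    for col in out.columns:
        if not pd.api.types.is_object_dtype(out[col]):
            continue  # leave numeric and already-typed columns untouched
        ratio = out[col].nunique(dropna=True) / max(len(out), 1)
        out[col] = out[col].astype("category" if ratio <= max_unique_ratio else "string")
    return out


# Small example frame; the text columns are forced to object dtype explicitly.
df = pd.DataFrame(
    {
        "color": ["red", "blue", "red", "blue"],
        "review": ["great product", "broke after a day", "works fine", "would buy again"],
        "price": [1.0, 2.0, 3.0, 4.0],
    }
).astype({"color": object, "review": object})

converted = coerce_object_columns(df)
print(converted.dtypes)
```

A fixed ratio like this misclassifies short datasets and ID-like columns, which is part of why the default needs discussion rather than a silent rule.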

Hyperparameter space

  • Benchmark whether TF/IDF should be applied on a per-sample or per-feature level (see Text Processing #1300 (comment))
  • Improve the upper and lower bounds of the text feature reduction hyperparameter
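
As a starting point for the TF/IDF benchmark, the sketch below contrasts the two variants under one reading of the terms: "per-feature" fits a separate vectorizer on each text column, while "per-sample" concatenates all text columns of a row into one document and fits a single vectorizer. This interpretation, the toy data, and the variable names are assumptions. It uses scikit-learn's `TfidfVectorizer` with default settings rather than the pipeline's actual components.

```python
from scipy.sparse import hstack
from sklearn.feature_extraction.text import TfidfVectorizer

# Two hypothetical text columns of a three-row dataset.
titles = ["cheap flights", "hotel deals", "cheap hotel"]
bodies = ["book cheap flights now", "best hotel deals today", "cheap hotel rooms"]

# Per-feature: one vectorizer per text column, resulting blocks stacked side by side.
# The same word gets a separate output column for each input column it appears in.
per_feature = hstack(
    [
        TfidfVectorizer().fit_transform(titles),
        TfidfVectorizer().fit_transform(bodies),
    ]
).tocsr()

# Per-sample: join the text columns of each row into one document and fit a
# single vectorizer, so the vocabulary and IDF weights are shared.
joined = [f"{title} {body}" for title, body in zip(titles, bodies)]
per_sample = TfidfVectorizer().fit_transform(joined)

print(per_feature.shape, per_sample.shape)
```

The per-feature variant keeps column identity at the cost of a wider matrix, which also interacts with the bounds of the subsequent feature reduction step.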

Metadata

Labels

enhancement: A new improvement or feature
