Description
I would like to extend auto-sklearn to handle datasets with both numerical and textual features. In particular, I want to implement a custom preprocessor that takes a textual feature and applies a TF-IDF transformation to it.
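For reference, the kind of preprocessing described above can be sketched in plain scikit-learn (this is not auto-sklearn's component API) with a `ColumnTransformer`. It assumes a pandas DataFrame with a hypothetical text column `review` and hypothetical numeric columns `price` and `rating`. Only `review` goes through `TfidfVectorizer`; the numeric columns are passed through untouched.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy data: one free-text column and two numeric columns.
X = pd.DataFrame({
    "review": ["great product", "terrible support", "great support"],
    "price": [10.0, 20.0, 15.0],
    "rating": [5, 1, 4],
})

# Route only "review" to TF-IDF and pass the numeric columns through as-is.
# A single column name (a string, not a list) gives TfidfVectorizer the
# 1-D sequence of documents it expects.
preprocessor = ColumnTransformer(
    transformers=[
        ("text_tfidf", TfidfVectorizer(), "review"),
        ("numeric", "passthrough", ["price", "rating"]),
    ]
)

Xt = preprocessor.fit_transform(X)
print(Xt.shape)  # 4 TF-IDF term columns followed by the 2 numeric columns
```

Because the column selection is explicit, the TF-IDF step can never touch the numeric features. The open question is how to express the same restriction inside auto-sklearn's own pipeline.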
This has raised a few concerns/questions in my head:
- Since auto-sklearn does not accept features of type `object`, I will have to cast my text feature to type `category`, but I do not want the standard categorical preprocessors (e.g., one-hot encoding) to be applied to this text feature by accident. Is there a way to achieve this?
- How can I be sure that my custom preprocessor is only executed on my textual feature, and not on the other (numeric) features?
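To illustrate the concern in the first question: casting a free-text column to `category` in pandas makes every distinct string its own category, so a one-hot encoding step would produce one indicator column per distinct document. The Series below is hypothetical toy data, and `pd.get_dummies` is used only as a stand-in for auto-sklearn's one-hot encoder.

```python
import pandas as pd

# Hypothetical free-text column: each value is a distinct string.
texts = pd.Series(["great product", "terrible support", "great support", "ok"])

# Casting to "category" turns every distinct string into its own category.
as_category = texts.astype("category")
print(as_category.dtype)                # category
print(len(as_category.cat.categories))  # 4

# One-hot encoding (pd.get_dummies here, as a stand-in) then yields one
# indicator column per distinct document, ignoring the words inside it.
dummies = pd.get_dummies(as_category)
print(dummies.shape)                    # (4, 4)
```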
If the above is simply not possible with the current auto-sklearn architecture, would you be interested in a pull request that extends auto-sklearn to handle textual features?