Feature Engineering for Categorical Data
In this research project, we propose a statistical approach for automatically detecting whether each feature in any dataset is nominal, ordinal, or numerical, by applying simple statistical functions to the values of each variable. Building on this categorical data detection system, we benchmark most of the existing categorical data encoders (e.g. TargetEncoding, HelmertEncoding, OneHotEncoding) to find the encoder that yields the highest model accuracy while taking into account the resulting dimensionality and the execution time.
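To make the idea of detecting a feature's type from simple statistics concrete, here is a minimal sketch in Python. It is not the project's actual detection rule: the function name `infer_feature_type`, the `max_levels` threshold, and the decision rules themselves (non-numeric values are treated as nominal, a small set of integer codes as ordinal, everything else as numerical) are assumptions chosen only for illustration.

```python
from typing import Any, Iterable


def _is_number(value: Any) -> bool:
    """Return True if the value is numeric or a string that parses as a float."""
    if isinstance(value, bool):
        # Booleans are labels, not quantities, in this sketch.
        return False
    if isinstance(value, (int, float)):
        return True
    try:
        float(value)
        return True
    except (TypeError, ValueError):
        return False


def infer_feature_type(values: Iterable[Any], max_levels: int = 10) -> str:
    """Guess whether a column is 'nominal', 'ordinal' or 'numerical'.

    Hypothetical heuristic (an assumption, not the project's rule):
      * any non-numeric value                        -> nominal
      * all integers with at most `max_levels`
        distinct values                              -> ordinal
      * otherwise                                    -> numerical
    """
    # Ignore missing entries represented as None.
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("column has no observed values")

    if not all(_is_number(v) for v in observed):
        return "nominal"

    distinct = {float(v) for v in observed}
    all_integers = all(x.is_integer() for x in distinct)
    if all_integers and len(distinct) <= max_levels:
        return "ordinal"
    return "numerical"
```

Under these assumed rules, a column such as `["red", "green", "blue"]` is classified as nominal and a column of ratings `[1, 2, 3, 3, 2]` as ordinal. The sketch cannot recognise ordinal features stored as text labels (for example "low", "medium", "high"), which a fuller statistical approach would need to handle.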