Description
Problem
When using DateToUnitCircleTransformer, null dates are replaced with (0, 0), which is not on the unit circle.
For example, with DateToUnitCircleTransformer and TimePeriod HourOfDay, dates in MM-DD-YYYY format carry no time component, so they are converted to MM-DD-YYYY 00h00m00s and get the circular representation (1, 0).
We would expect null values to be mapped to (1, 0) as well.
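To make the mismatch concrete, here is a minimal Python sketch of an hour-of-day circular encoding. It assumes the usual (cos, sin) mapping with a 24-hour period, which matches the (1, 0) value at midnight described above; `hour_to_unit_circle` is a hypothetical helper, not the transformer's actual code.

```python
import math

def hour_to_unit_circle(hour, period=24):
    """Map an hour in [0, period) to a point (cos, sin) on the unit circle.

    Assumed encoding for illustration only.
    """
    angle = 2 * math.pi * hour / period
    return (math.cos(angle), math.sin(angle))

# A date without a time component is read as midnight, i.e. hour 0.
midnight = hour_to_unit_circle(0)
null_default = (0.0, 0.0)  # current default for null dates

print(midnight)                    # (1.0, 0.0)
print(math.hypot(*midnight))       # 1.0, on the unit circle
print(math.hypot(*null_default))   # 0.0, the origin, not on the circle
```

Every point produced by the encoding has norm 1, so (0, 0) can never be the image of a real date.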
Solution
Use (1, 0) instead of (0, 0) as the default value for null dates.
Alternatives
These alternatives concern not only this transformer but also the other vectorizers that can use the mode as an imputation technique.
Instead of imputing the mode, randomly select an existing non-null value so that the distribution of the feature is not changed.
However, this remains difficult:
- DateToUnitCircleTransformer is not an estimator.
- As an estimator, it would have to store all the distinct non-null values of the dataset as a fitted parameter.
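The random-selection alternative could look roughly like the Python sketch below. `RandomValueImputer` is a hypothetical class, not an existing API. As an assumption beyond what is described above, it stores a count next to each distinct non-null value so that sampling follows the observed frequencies.

```python
import random
from collections import Counter

class RandomValueImputer:
    """Hypothetical estimator-style imputer, for illustration only."""

    def __init__(self, seed=None):
        self._rng = random.Random(seed)
        self.values_ = []   # distinct non-null values seen during fit
        self.weights_ = []  # how often each value occurred (assumption)

    def fit(self, column):
        counts = Counter(v for v in column if v is not None)
        if not counts:
            raise ValueError("column has no non-null values to sample from")
        self.values_ = list(counts.keys())
        self.weights_ = list(counts.values())
        return self

    def transform(self, column):
        # Keep non-null values; replace each null with a sampled observed value.
        return [
            v if v is not None
            else self._rng.choices(self.values_, weights=self.weights_)[0]
            for v in column
        ]

hours = [0, 0, 13, None, 0, None]
imputed = RandomValueImputer(seed=0).fit(hours).transform(hours)
print(imputed)
```

The fitted state grows with the number of distinct values, which is the storage cost mentioned in the second bullet.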
Additional context
This came up because the HourOfDay circular representation of MM-DD-YYYY 00h00m00s dates is not dropped by SanityChecker: null dates map to (0, 0) while every non-null date maps to (1, 0), so the variance is not 0.
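The effect on the variance can be illustrated with a small Python sketch. It does not reproduce SanityChecker's actual logic; it only computes the population variance of the first coordinate for a made-up column of four midnight dates and two nulls.

```python
from statistics import pvariance

# Every non-null date is at midnight, so it encodes to (1, 0).
non_null = [(1.0, 0.0)] * 4

def x_variance(null_default, n_nulls=2):
    """Population variance of the first coordinate after null imputation."""
    rows = non_null + [null_default] * n_nulls
    return pvariance([x for x, _ in rows])

print(x_variance((0.0, 0.0)))  # positive: the feature looks informative
print(x_variance((1.0, 0.0)))  # 0.0: the feature is constant
```

With (1, 0) as the null default, the column is constant, so a zero-variance check would be able to drop it.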