Changing imputation for nulls in DateToUnitCircleTransformer #556

Open
@michaelweilsalesforce

Description

Problem
When using DateToUnitCircleTransformer, null dates are replaced with (0, 0), which is not on the unit circle.
For example, with DateToUnitCircleTransformer and TimePeriod HourOfDay, dates in MM-DD-YYYY format are converted to MM-DD-YYYY 00h00m00s and therefore have the circular representation (1, 0).
We would expect null values to map to (1, 0) as well.
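
To make the problem concrete, here is a minimal Python sketch of an hour-of-day unit-circle encoding. It is not the TransmogrifAI implementation: the function name `hour_of_day_to_unit_circle`, the `null_default` parameter, and the (cos, sin) ordering are assumptions chosen so that midnight maps to (1, 0), as described above.

```python
import math
from datetime import datetime
from typing import Optional, Tuple

Point = Tuple[float, float]


def hour_of_day_to_unit_circle(
    ts: Optional[datetime],
    null_default: Point = (0.0, 0.0),  # assumed to mirror the current null behavior
) -> Point:
    """Map the hour of day onto the unit circle as (cos, sin)."""
    if ts is None:
        return null_default
    angle = 2.0 * math.pi * ts.hour / 24.0
    return (math.cos(angle), math.sin(angle))


def radius(p: Point) -> float:
    """Distance of a point from the origin; 1.0 means it lies on the unit circle."""
    return math.hypot(p[0], p[1])


# A date without a time component is read as midnight.
midnight = hour_of_day_to_unit_circle(datetime(2020, 1, 15))
current_null = hour_of_day_to_unit_circle(None)
proposed_null = hour_of_day_to_unit_circle(None, null_default=(1.0, 0.0))

print(midnight, radius(midnight))
print(current_null, radius(current_null))
print(proposed_null, radius(proposed_null))
```

Under these assumptions the current null default has radius 0, so it cannot correspond to any hour, while the proposed default coincides with the midnight encoding.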

Solution
Use (1, 0) instead of (0, 0) as the default value for nulls.

Alternatives
The alternatives concern not only this transformer but also the other vectorizers that can use the mode as an imputation technique.
Instead of imputing the mode, randomly select an existing non-null value so that the distribution of the feature is not changed.
However, this remains difficult:

  • DateToUnitCircleTransformer is not an estimator.
  • As an estimator, it would have to store all the distinct non-null values of the dataset as a fitted param.
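
The random-selection alternative could look roughly like the Python sketch below. The class `RandomSampleImputer` and its `fit`/`transform` methods are hypothetical and not part of TransmogrifAI. The sketch also assumes that a count is stored next to each distinct value, so that draws follow the observed frequencies rather than being uniform over the distinct values.

```python
import random
from collections import Counter
from typing import Hashable, List, Optional, Sequence


class RandomSampleImputer:
    """Hypothetical estimator: fills nulls by sampling from the observed non-null values."""

    def __init__(self, seed: int = 0) -> None:
        self._rng = random.Random(seed)
        self.values_: List[Hashable] = []   # fitted param: distinct non-null values
        self.counts_: List[int] = []        # fitted param: how often each value occurred

    def fit(self, column: Sequence[Optional[Hashable]]) -> "RandomSampleImputer":
        counts = Counter(v for v in column if v is not None)
        self.values_ = list(counts.keys())
        self.counts_ = list(counts.values())
        return self

    def transform(self, column: Sequence[Optional[Hashable]]) -> List[Hashable]:
        if not self.values_:
            raise ValueError("fit() saw no non-null values to sample from")
        return [
            v if v is not None
            else self._rng.choices(self.values_, weights=self.counts_, k=1)[0]
            for v in column
        ]


imputer = RandomSampleImputer(seed=42).fit([1, None, 1, 3])
filled = imputer.transform([None, 2, None])
print(filled)
```

The fitted state grows with the number of distinct values, which is the storage cost the second bullet points at.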

Additional context
The context is that the HourOfDay circular representation of a MM-DD-YYYY 00h00m00s date is not thrown out by SanityChecker, because its variance is not 0.
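
The variance effect can be illustrated with a small Python sketch. It assumes a column where every non-null date encodes to midnight, that is (1, 0), plus one null row, and that a variance-based filter only drops columns whose variance is 0. The helper `column_variances` is hypothetical, and the actual SanityChecker logic is not reproduced here.

```python
import statistics
from typing import List, Tuple

Point = Tuple[float, float]

MIDNIGHT: Point = (1.0, 0.0)


def column_variances(points: List[Point]) -> Tuple[float, float]:
    """Population variance of the x and y coordinates, computed separately."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return statistics.pvariance(xs), statistics.pvariance(ys)


# Three midnight rows plus one null row, under each null default.
rows_current = [MIDNIGHT, MIDNIGHT, MIDNIGHT, (0.0, 0.0)]
rows_proposed = [MIDNIGHT, MIDNIGHT, MIDNIGHT, (1.0, 0.0)]

var_current = column_variances(rows_current)
var_proposed = column_variances(rows_proposed)

print(var_current)
print(var_proposed)
```

With the (0, 0) default the x coordinate has non-zero variance even though the column carries no hour information; with the (1, 0) default both coordinates are constant.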
