Commit d5211a7

[ML] Improve robustness w.r.t. outliers of detection and initialisation of seasonal components (#90)
This makes two principal changes: 1) It iteratively reweights outliers w.r.t. the seasonal component under test and for initialisation. Outliers are defined as the fraction of values with the highest residual w.r.t. the component's predictions. 2) It switches marginal test decisions for decomposition components to use logistic regression on top of the various factors, i.e. variance reduction, autocorrelation, number of periods of data observed, etc.
1 parent 37b8734 commit d5211a7

17 files changed: 908 additions and 340 deletions

docs/CHANGELOG.asciidoc

Lines changed: 1 addition & 0 deletions
@@ -28,6 +28,7 @@
 === Enhancements

 Improve and use periodic boundary condition for seasonal component modeling ({pull}84[#84])
+Improve robustness w.r.t. outliers of detection and initialisation of seasonal components ({pull}90[#90])

 === Bug Fixes

include/maths/CPeriodicityHypothesisTests.h

Lines changed: 3 additions & 3 deletions
@@ -225,6 +225,8 @@ class MATHS_EXPORT CPeriodicityHypothesisTests {
 CPeriodicityHypothesisTestsResult s_H0;
 //! The variance estimate of H0.
 double s_V0;
+//! The autocorrelation estimate of H0.
+double s_R0;
 //! The degrees of freedom in the variance estimate of H0.
 double s_DF0;
 //! The trend for the null hypothesis.
@@ -361,9 +363,7 @@ class MATHS_EXPORT CPeriodicityHypothesisTests {
 //! Condition \p buckets assuming the null hypothesis is true.
 //!
 //! This removes any trend associated with the null hypothesis.
-void conditionOnHypothesis(const TTimeTimePr2Vec& windows,
-    const STestStats& stats,
-    TFloatMeanAccumulatorVec& buckets) const;
+void conditionOnHypothesis(const STestStats& stats, TFloatMeanAccumulatorVec& buckets) const;

 //! Test to see if there is significant evidence for a component
 //! with period \p period.
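The new s_R0 field holds an autocorrelation estimate for the null hypothesis, but this hunk does not show how it is computed. Purely as an illustration of the quantity involved, the sketch below computes a sample autocorrelation at a given lag using the common biased estimator. The function name autocorrelationSketch and the choice of estimator are assumptions, not the library's code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sample autocorrelation of values at the given lag (measured in buckets).
// Uses the biased estimator: lagged covariance sum divided by the lag-zero
// sum of squares. Returns zero if the series has no variance.
// NOTE: hypothetical helper for illustration only.
double autocorrelationSketch(const std::vector<double>& values, std::size_t lag) {
    assert(!values.empty() && lag < values.size());

    double mean = 0.0;
    for (double value : values) {
        mean += value;
    }
    mean /= static_cast<double>(values.size());

    double covariance = 0.0;
    double variance = 0.0;
    for (std::size_t i = 0; i < values.size(); ++i) {
        variance += (values[i] - mean) * (values[i] - mean);
        if (i + lag < values.size()) {
            covariance += (values[i] - mean) * (values[i + lag] - mean);
        }
    }
    return variance > 0.0 ? covariance / variance : 0.0;
}
```

For a series which repeats every two buckets, the value at lag two is strongly positive and the value at lag one is strongly negative, which is the kind of signal a periodicity test can feed into its decision.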

include/maths/CTimeSeriesDecompositionDetail.h

Lines changed: 14 additions & 0 deletions
@@ -609,6 +609,20 @@ class MATHS_EXPORT CTimeSeriesDecompositionDetail {
     maths_t::TCalendarComponentVec& components,
     TComponentErrorsVec& errors) const;

+//! Reweight the outlier values in \p values.
+//!
+//! These are the values with largest error w.r.t. \p predictor.
+void reweightOutliers(core_t::TTime startTime,
+    core_t::TTime dt,
+    TPredictor predictor,
+    TFloatMeanAccumulatorVec& values) const;
+
+//! Fit the trend component \p component to \p values.
+void fit(core_t::TTime startTime,
+    core_t::TTime dt,
+    const TFloatMeanAccumulatorVec& values,
+    CTrendComponent& trend) const;
+
 //! Clear all component error statistics.
 void clearComponentErrors();
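Only the declaration of reweightOutliers appears in this hunk. A minimal sketch of the idea described in the commit message, assuming each bucket is a mean with a weight, might look as follows. SBucket, reweightOutliersSketch and the two trailing parameters are hypothetical stand-ins (the defaults mirror SEASONAL_OUTLIER_FRACTION and SEASONAL_OUTLIER_WEIGHT from Constants.h), and the sketch performs a single pass where the commit describes an iterative scheme.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical stand-in for one entry of TFloatMeanAccumulatorVec: a bucket
// mean and the weight (count) behind it.
struct SBucket {
    double s_Mean;
    double s_Weight;
};

// Down-weight the fraction of non-empty buckets with the largest absolute
// residual w.r.t. predictor. This is an illustrative single pass, not the
// library's implementation.
void reweightOutliersSketch(long startTime,
                            long dt,
                            const std::function<double(long)>& predictor,
                            std::vector<SBucket>& values,
                            double outlierFraction = 0.1,
                            double outlierWeight = 0.1) {
    assert(outlierFraction >= 0.0 && outlierFraction <= 1.0);

    // Only buckets which contain data can be outliers.
    std::vector<std::size_t> indices;
    for (std::size_t i = 0; i < values.size(); ++i) {
        if (values[i].s_Weight > 0.0) {
            indices.push_back(i);
        }
    }
    auto numberOutliers = static_cast<std::ptrdiff_t>(
        outlierFraction * static_cast<double>(indices.size()));
    if (numberOutliers == 0) {
        return;
    }

    auto residual = [&](std::size_t i) {
        long time = startTime + static_cast<long>(i) * dt;
        return std::fabs(values[i].s_Mean - predictor(time));
    };

    // Move the indices of the numberOutliers largest residuals to the front.
    std::partial_sort(indices.begin(), indices.begin() + numberOutliers, indices.end(),
                      [&](std::size_t lhs, std::size_t rhs) {
                          return residual(lhs) > residual(rhs);
                      });
    for (std::ptrdiff_t j = 0; j < numberOutliers; ++j) {
        values[indices[j]].s_Weight *= outlierWeight;
    }
}
```

Scaling the weight rather than discarding the bucket keeps some information from the outlying values, so a later fit can still recover if the predictor used for reweighting was poor.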
614628

include/maths/CTools.h

Lines changed: 5 additions & 8 deletions
@@ -665,19 +665,16 @@ class MATHS_EXPORT CTools : private core::CNonInstantiatable {
 //! Sigmoid function of \p p.
 static double sigmoid(double p) { return 1.0 / (1.0 + 1.0 / p); }

-//! A smooth Heaviside function centred at one.
+//! The logistic function.
 //!
-//! This is a smooth version of the Heaviside function implemented
-//! as \f$sigmoid\left(\frac{sign (x - 1)}{wb}\right)\f$ normalized
-//! to the range [0, 1], where \f$b\f$ is \p boundary and \f$w\f$
-//! is \p width. Note, if \p sign is one this is a step up and if
-//! it is -1 it is a step down.
+//! i.e. \f$sigmoid\left(\frac{sign (x - x0)}{width}\right)\f$.
 //!
 //! \param[in] x The argument.
 //! \param[in] width The step width.
+//! \param[in] x0 The centre of the step.
 //! \param[in] sign Determines whether it's a step up or down.
-static double smoothHeaviside(double x, double width, double sign = 1.0) {
-    return sigmoid(std::exp(sign * (x - 1.0) / width)) / sigmoid(std::exp(1.0 / width));
+static double logisticFunction(double x, double width, double x0 = 0.0, double sign = 1.0) {
+    return sigmoid(std::exp(std::copysign(1.0, sign) * (x - x0) / width));
 }
 };
 }
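To see what the new function evaluates, note that sigmoid(p) = p / (1 + p), so sigmoid(exp(z)) = 1 / (1 + exp(-z)). The standalone sketch below copies the two function bodies from the hunk as free functions (the assert on width is an addition) and compares them with the textbook closed form. logisticClosedForm is a hypothetical helper added only for the comparison.

```cpp
#include <cassert>
#include <cmath>

// Body copied from CTools::sigmoid in the diff: equals p / (1 + p) for p > 0.
inline double sigmoid(double p) {
    return 1.0 / (1.0 + 1.0 / p);
}

// Body copied from the new CTools::logisticFunction in the diff. The
// precondition check is an addition for this sketch.
inline double logisticFunction(double x, double width, double x0 = 0.0, double sign = 1.0) {
    assert(width > 0.0);
    return sigmoid(std::exp(std::copysign(1.0, sign) * (x - x0) / width));
}

// Textbook form 1 / (1 + exp(-s (x - x0) / width)) with s = +1 or -1.
// Hypothetical helper, used only to check the identity above.
inline double logisticClosedForm(double x, double width, double x0, double sign) {
    double s = std::copysign(1.0, sign);
    return 1.0 / (1.0 + std::exp(-s * (x - x0) / width));
}
```

Two differences from the removed smoothHeaviside follow directly from the code: the step is centred at x0 (default zero) rather than fixed at one, and because of std::copysign only the sign of the last argument matters, so passing 0.5 or 1.0 gives the same curve. The value at x = x0 is exactly 0.5, and a smaller width gives a sharper step.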

include/maths/Constants.h

Lines changed: 24 additions & 5 deletions
@@ -12,6 +12,8 @@
 #include <maths/ImportExport.h>
 #include <maths/MathsTypes.h>

+#include <cmath>
+
 namespace ml {
 namespace maths {

@@ -50,22 +52,39 @@ const double LOG_NORMAL_OFFSET_MARGIN{1.0};
 //! reduce the prediction error variance and still be worthwhile
 //! modeling. We have different thresholds because we have inductive
 //! bias for particular types of components.
-const double SIGNIFICANT_VARIANCE_REDUCTION[]{0.7, 0.5};
+const double COMPONENT_SIGNIFICANT_VARIANCE_REDUCTION[]{0.6, 0.4};

 //! The minimum repeated amplitude of a seasonal component, as a
 //! multiple of error standard deviation, to be worthwhile modeling.
 //! We have different thresholds because we have inductive bias for
 //! particular types of components.
-const double SIGNIFICANT_AMPLITUDE[]{1.0, 2.0};
+const double SEASONAL_SIGNIFICANT_AMPLITUDE[]{1.0, 2.0};

 //! The minimum autocorrelation of a seasonal component to be
 //! worthwhile modeling. We have different thresholds because we
 //! have inductive bias for particular types of components.
-const double SIGNIFICANT_AUTOCORRELATION[]{0.5, 0.7};
+const double SEASONAL_SIGNIFICANT_AUTOCORRELATION[]{0.5, 0.6};
+
+//! The fraction of values which are treated as outliers when testing
+//! for and initializing a seasonal component.
+const double SEASONAL_OUTLIER_FRACTION{0.1};
+
+//! The minimum multiplier of the mean inlier fraction difference
+//! (from a periodic pattern) to constitute an outlier when testing
+//! for and initializing a seasonal component.
+const double SEASONAL_OUTLIER_DIFFERENCE_THRESHOLD{3.0};

-//! The maximum significance of a test statistic to choose to model
+//! The weight to assign outliers when testing for and initializing
+//! a seasonal component.
+const double SEASONAL_OUTLIER_WEIGHT{0.1};
+
+//! The significance of a test statistic to choose to model
 //! a trend decomposition component.
-const double MAXIMUM_SIGNIFICANCE{0.001};
+const double COMPONENT_STATISTICALLY_SIGNIFICANT{0.001};
+
+//! The log of COMPONENT_STATISTICALLY_SIGNIFICANT.
+const double LOG_COMPONENT_STATISTICALLY_SIGNIFICANCE{
+    std::log(COMPONENT_STATISTICALLY_SIGNIFICANT)};

 //! The minimum variance scale for which the likelihood function
 //! can be accurately adjusted. For smaller scales errors are
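The commit message says marginal decisions now combine several factors through logistic regression, but the files shown here do not include that code. The sketch below is one plausible shape for such a soft decision, using the constant values from this hunk as the centres of logistic steps. Everything else is an assumption: the function names, the step widths, the 0.125 cut-off, reading the variance threshold as an upper bound on the ratio of residual to original variance, and using index 0 of each array.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Values copied from the Constants.h hunk.
const double COMPONENT_SIGNIFICANT_VARIANCE_REDUCTION[]{0.6, 0.4};
const double SEASONAL_SIGNIFICANT_AUTOCORRELATION[]{0.5, 0.6};
const double COMPONENT_STATISTICALLY_SIGNIFICANT{0.001};
const double LOG_COMPONENT_STATISTICALLY_SIGNIFICANCE{
    std::log(COMPONENT_STATISTICALLY_SIGNIFICANT)};

// Logistic step centred at x0; sign > 0 steps up, sign < 0 steps down.
double logistic(double x, double width, double x0, double sign) {
    assert(width > 0.0);
    return 1.0 / (1.0 + std::exp(-std::copysign(1.0, sign) * (x - x0) / width));
}

// Hypothetical soft decision: each factor contributes a value in (0, 1)
// which is 0.5 exactly at its threshold. The product is compared with
// 0.5^3, i.e. the score when every factor is exactly marginal.
bool acceptSeasonalComponentSketch(double varianceRatio,
                                   double autocorrelation,
                                   double logSignificance,
                                   std::size_t bias = 0) {
    assert(bias < 2);
    double score =
        logistic(varianceRatio, 0.1, COMPONENT_SIGNIFICANT_VARIANCE_REDUCTION[bias], -1.0) *
        logistic(autocorrelation, 0.1, SEASONAL_SIGNIFICANT_AUTOCORRELATION[bias], +1.0) *
        logistic(logSignificance, 1.0, LOG_COMPONENT_STATISTICALLY_SIGNIFICANCE, -1.0);
    return score > 0.125;
}
```

The point of a soft combination is that strong evidence on one factor can offset a slightly sub-threshold value on another. With an autocorrelation of 0.45, just under the 0.5 threshold, a hard test would reject the component, whereas this sketch accepts it when the variance ratio and significance are both clearly favourable.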
