[ML] Improve robustness w.r.t. outliers of detection and initialisation of seasonal components #90

Merged
1 change: 1 addition & 0 deletions docs/CHANGELOG.asciidoc
@@ -28,6 +28,7 @@
=== Enhancements

Improve and use periodic boundary condition for seasonal component modeling ({pull}84[#84])
+ Improve robustness w.r.t. outliers of detection and initialisation of seasonal components ({pull}90[#90])

=== Bug Fixes

6 changes: 3 additions & 3 deletions include/maths/CPeriodicityHypothesisTests.h
@@ -225,6 +225,8 @@ class MATHS_EXPORT CPeriodicityHypothesisTests {
CPeriodicityHypothesisTestsResult s_H0;
//! The variance estimate of H0.
double s_V0;
+ //! The autocorrelation estimate of H0.
+ double s_R0;
//! The degrees of freedom in the variance estimate of H0.
double s_DF0;
//! The trend for the null hypothesis.
@@ -361,9 +363,7 @@ class MATHS_EXPORT CPeriodicityHypothesisTests {
//! Condition \p buckets assuming the null hypothesis is true.
//!
//! This removes any trend associated with the null hypothesis.
- void conditionOnHypothesis(const TTimeTimePr2Vec& windows,
- const STestStats& stats,
- TFloatMeanAccumulatorVec& buckets) const;
+ void conditionOnHypothesis(const STestStats& stats, TFloatMeanAccumulatorVec& buckets) const;

//! Test to see if there is significant evidence for a component
//! with period \p period.
14 changes: 14 additions & 0 deletions include/maths/CTimeSeriesDecompositionDetail.h
@@ -609,6 +609,20 @@ class MATHS_EXPORT CTimeSeriesDecompositionDetail {
maths_t::TCalendarComponentVec& components,
TComponentErrorsVec& errors) const;

+ //! Reweight the outlier values in \p values.
+ //!
+ //! These are the values with largest error w.r.t. \p predictor.
+ void reweightOutliers(core_t::TTime startTime,
+ core_t::TTime dt,
+ TPredictor predictor,
+ TFloatMeanAccumulatorVec& values) const;

+ //! Fit the trend component \p component to \p values.
+ void fit(core_t::TTime startTime,
+ core_t::TTime dt,
+ const TFloatMeanAccumulatorVec& values,
+ CTrendComponent& trend) const;

//! Clear all component error statistics.
void clearComponentErrors();

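The `reweightOutliers` declaration above only says that the values with the largest error with respect to the predictor are reweighted. A minimal sketch of one way this could work is shown below. It is not the PR's implementation: `SWeightedValue`, `reweightOutliersSketch` and the index-based predictor are hypothetical stand-ins for `TFloatMeanAccumulatorVec` and `TPredictor`, the `startTime` and `dt` arguments are dropped, and `SEASONAL_OUTLIER_DIFFERENCE_THRESHOLD` is ignored. Only the two constant values are taken from the Constants.h hunk of this PR.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <numeric>
#include <vector>

// Values copied from the Constants.h hunk of this PR.
const double SEASONAL_OUTLIER_FRACTION{0.1};
const double SEASONAL_OUTLIER_WEIGHT{0.1};

// Hypothetical stand-in for one entry of TFloatMeanAccumulatorVec.
struct SWeightedValue {
    double s_Weight;
    double s_Mean;
};

// Hypothetical sketch: down-weight the SEASONAL_OUTLIER_FRACTION of values
// whose absolute error w.r.t. predictor(i) is largest.
void reweightOutliersSketch(const std::function<double(std::size_t)>& predictor,
                            std::vector<SWeightedValue>& values) {
    assert(predictor);
    auto numberOutliers =
        static_cast<std::size_t>(SEASONAL_OUTLIER_FRACTION * values.size());
    if (numberOutliers == 0) {
        return;
    }

    // Order the indices so the largest errors come first.
    std::vector<std::size_t> order(values.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    auto error = [&](std::size_t i) {
        return std::fabs(values[i].s_Mean - predictor(i));
    };
    std::partial_sort(order.begin(), order.begin() + numberOutliers, order.end(),
                      [&](std::size_t lhs, std::size_t rhs) {
                          return error(lhs) > error(rhs);
                      });

    // Scale down the weight of each outlier rather than discarding it.
    for (std::size_t i = 0; i < numberOutliers; ++i) {
        values[order[i]].s_Weight *= SEASONAL_OUTLIER_WEIGHT;
    }
}
```

Scaling the weight, instead of dropping the value, keeps the bucket in the data set while limiting how far a single spike can pull a subsequent fit.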
13 changes: 5 additions & 8 deletions include/maths/CTools.h
@@ -665,19 +665,16 @@ class MATHS_EXPORT CTools : private core::CNonInstantiatable {
//! Sigmoid function of \p p.
static double sigmoid(double p) { return 1.0 / (1.0 + 1.0 / p); }

- //! A smooth Heaviside function centred at one.
+ //! The logistic function.
//!
- //! This is a smooth version of the Heaviside function implemented
- //! as \f$sigmoid\left(\frac{sign (x - 1)}{wb}\right)\f$ normalized
- //! to the range [0, 1], where \f$b\f$ is \p boundary and \f$w\f$
- //! is \p width. Note, if \p sign is one this is a step up and if
- //! it is -1 it is a step down.
+ //! i.e. \f$sigmoid\left(\frac{sign (x - x0)}{width}\right)\f$.
//!
//! \param[in] x The argument.
//! \param[in] width The step width.
+ //! \param[in] x0 The centre of the step.
//! \param[in] sign Determines whether it's a step up or down.
- static double smoothHeaviside(double x, double width, double sign = 1.0) {
- return sigmoid(std::exp(sign * (x - 1.0) / width)) / sigmoid(std::exp(1.0 / width));
+ static double logisticFunction(double x, double width, double x0 = 0.0, double sign = 1.0) {
+ return sigmoid(std::exp(std::copysign(1.0, sign) * (x - x0) / width));
}
}
};
}
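To see why the new function is called the logistic function, note that `sigmoid(p) = 1 / (1 + 1 / p)`, so `sigmoid(exp(z)) = 1 / (1 + exp(-z))`. The standalone sketch below copies the two function bodies from the hunk above as free functions (in the PR they are static members of `CTools`) so the behaviour can be checked in isolation. The `assert` on `width` is an addition for illustration and is not part of the PR.

```cpp
#include <cassert>
#include <cmath>

// Copied from CTools::sigmoid in the hunk above.
double sigmoid(double p) {
    return 1.0 / (1.0 + 1.0 / p);
}

// Copied from CTools::logisticFunction in the hunk above. Only the sign of
// the last argument matters because it is passed through std::copysign.
double logisticFunction(double x, double width, double x0 = 0.0, double sign = 1.0) {
    assert(width > 0.0); // Illustrative precondition, not in the PR.
    return sigmoid(std::exp(std::copysign(1.0, sign) * (x - x0) / width));
}
```

With these definitions `logisticFunction(x0, width, x0)` is 0.5 for any positive width, the step rises for positive `sign` and falls for negative `sign`, and a smaller `width` gives a sharper transition. Unlike the removed `smoothHeaviside`, the result is not divided by a normalising constant and the centre is a parameter rather than fixed at one.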
29 changes: 24 additions & 5 deletions include/maths/Constants.h
@@ -12,6 +12,8 @@
#include <maths/ImportExport.h>
#include <maths/MathsTypes.h>

+ #include <cmath>
+
namespace ml {
namespace maths {

@@ -50,22 +52,39 @@ const double LOG_NORMAL_OFFSET_MARGIN{1.0};
//! reduce the prediction error variance and still be worthwhile
//! modeling. We have different thresholds because we have inductive
//! bias for particular types of components.
- const double SIGNIFICANT_VARIANCE_REDUCTION[]{0.7, 0.5};
+ const double COMPONENT_SIGNIFICANT_VARIANCE_REDUCTION[]{0.6, 0.4};

//! The minimum repeated amplitude of a seasonal component, as a
//! multiple of error standard deviation, to be worthwhile modeling.
//! We have different thresholds because we have inductive bias for
//! particular types of components.
- const double SIGNIFICANT_AMPLITUDE[]{1.0, 2.0};
+ const double SEASONAL_SIGNIFICANT_AMPLITUDE[]{1.0, 2.0};

//! The minimum autocorrelation of a seasonal component to be
//! worthwhile modeling. We have different thresholds because we
//! have inductive bias for particular types of components.
- const double SIGNIFICANT_AUTOCORRELATION[]{0.5, 0.7};
+ const double SEASONAL_SIGNIFICANT_AUTOCORRELATION[]{0.5, 0.6};

+ //! The fraction of values which are treated as outliers when testing
+ //! for and initializing a seasonal component.
+ const double SEASONAL_OUTLIER_FRACTION{0.1};

+ //! The minimum multiplier of the mean inlier fraction difference
+ //! (from a periodic pattern) to constitute an outlier when testing
+ //! for and initializing a seasonal component.
+ const double SEASONAL_OUTLIER_DIFFERENCE_THRESHOLD{3.0};

- //! The maximum significance of a test statistic to choose to model
+ //! The weight to assign outliers when testing for and initializing
+ //! a seasonal component.
+ const double SEASONAL_OUTLIER_WEIGHT{0.1};

+ //! The significance of a test statistic to choose to model
//! a trend decomposition component.
- const double MAXIMUM_SIGNIFICANCE{0.001};
+ const double COMPONENT_STATISTICALLY_SIGNIFICANT{0.001};

+ //! The log of COMPONENT_STATISTICALLY_SIGNIFICANT.
+ const double LOG_COMPONENT_STATISTICALLY_SIGNIFICANCE{
+ std::log(COMPONENT_STATISTICALLY_SIGNIFICANT)};

//! The minimum variance scale for which the likelihood function
//! can be accurately adjusted. For smaller scales errors are
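The diff does not show where `LOG_COMPONENT_STATISTICALLY_SIGNIFICANCE` is used. One plausible use, sketched below as an assumption, is to compare significances in log space, which stays well defined for probabilities too small to represent as a `double`. The helper `isStatisticallySignificant` is hypothetical and does not appear in the PR. The two constants are copied from the hunk above.

```cpp
#include <cassert>
#include <cmath>

// Copied from the Constants.h hunk of this PR.
const double COMPONENT_STATISTICALLY_SIGNIFICANT{0.001};
const double LOG_COMPONENT_STATISTICALLY_SIGNIFICANCE{
    std::log(COMPONENT_STATISTICALLY_SIGNIFICANT)};

// Hypothetical helper: decide whether to model a component given the
// natural log of the significance of its test statistic.
bool isStatisticallySignificant(double logSignificance) {
    assert(logSignificance <= 0.0); // The log of a probability is non-positive.
    return logSignificance <= LOG_COMPONENT_STATISTICALLY_SIGNIFICANCE;
}
```

For example, a log significance of -1000 corresponds to a probability that underflows to zero as a `double`, yet it still compares cleanly against the log threshold.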