[ML] Improve time series decomposition in the presence of change points #198

Merged
29 commits
77f16a2
Work on segmentation of expanding window
tveasey Jul 23, 2018
ced0e1e
Bug fixes
tveasey Jul 23, 2018
5670ffa
Merge branch 'master' into enhancement/improve-timeseries-decompositi…
tveasey Jul 23, 2018
6fdf20f
Finish up segmentation
tveasey Jul 23, 2018
c2bd6fe
Wire in piecewise constant scaling segmentation into the periodicity …
tveasey Jul 30, 2018
84ee3e8
Wire piecewise linear trend in to periodicity hypothesis testing
tveasey Aug 1, 2018
05e5d2d
Bug fixes
tveasey Aug 2, 2018
b0c85c5
Update test threshold
tveasey Aug 2, 2018
1c47455
Speed up segmentation using linear models
tveasey Aug 3, 2018
e96eae3
Penalise trend segmentation in hypothesis selection. Fix remaining un…
tveasey Aug 3, 2018
12a95a9
A couple of bug fixes and knock improvements given new test behaviour
tveasey Aug 4, 2018
29a7573
Merge branch 'master' into enhancement/improve-timeseries-decompositi…
tveasey Aug 6, 2018
294f1ad
Revert experiment
tveasey Aug 6, 2018
350b9db
Merge branch 'master' into enhancement/improve-timeseries-decompositi…
tveasey Aug 17, 2018
b6cb46d
Compiler warnings and incorrect argument type
tveasey Aug 17, 2018
b9b2b86
Suppress verbose test logging
tveasey Aug 17, 2018
5f6ce4c
Numerical hardening
tveasey Aug 17, 2018
83786e6
Some bug fixes
tveasey Aug 24, 2018
14ff203
Merge branch 'master' into enhancement/improve-timeseries-decompositi…
tveasey Sep 4, 2018
5a04b22
Unused include
tveasey Sep 4, 2018
78c7cd3
Merge branch 'master' into enhancement/improve-timeseries-decompositi…
tveasey Sep 14, 2018
cc2fb5e
Merge branch 'master' into enhancement/improve-timeseries-decompositi…
tveasey Oct 2, 2018
8499551
Fix merge
tveasey Oct 2, 2018
3bbd244
More descriptive test stats member names as per review comment
tveasey Oct 2, 2018
a8b78dd
Correct comment
tveasey Oct 2, 2018
f979a64
Remove unused variable
tveasey Oct 2, 2018
f6aa320
Docs
tveasey Oct 2, 2018
e13828b
Merge branch 'master' into enhancement/improve-timeseries-decompositi…
tveasey Oct 2, 2018
1ca85fd
Some tidy ups and fix unit tests after merge
tveasey Oct 3, 2018
12 changes: 8 additions & 4 deletions docs/CHANGELOG.asciidoc
@@ -40,16 +40,20 @@

Perform anomaly detection on features derived from multiple bucket values to improve robustness
of detection with respect to misconfigured bucket lengths and improve detection of long lasting
anomalies. (See {pull}175[#175].)
anomalies. (See {ml-pull}175[#175].)

Increased independence of anomaly scores across partitions ({pull}182[182])
Support decomposing a time series into a piecewise linear trend and with piecewise constant
scaling of the periodic components. This extends our decomposition functionality to handle the
same types of change points that our modelling capabilities do. (See {ml-pull}198[198].)

Increased independence of anomaly scores across partitions (See {ml-pull}182[182].)

Avoid potential false positives at model start up when first detecting new components of the time
series decomposition. (See {pull}218[218].)
series decomposition. (See {ml-pull}218[218].)

=== Bug Fixes

Fix cause of "Bad density value..." log errors whilst forecasting. ({pull}207[207])
Fix cause of "Bad density value..." log errors whilst forecasting. ({ml-pull}207[207])

Fix incorrectly missing influencers when the influence field is one of the detector's partitioning
fields and the bucket is empty. ({pull}219[#219])
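
The changelog entry for #198 above describes a decomposition with a piecewise linear trend and piecewise constant scaling of the periodic components. A minimal sketch of what such a model predicts might look like the following. All type and function names here (`SLinearSegment`, `SScaleSegment`, `segmentAt`, `predict`) are hypothetical and chosen for illustration; they are not taken from ml-cpp, and time is simplified to a bucket index.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical types for illustration only; none of these names come from ml-cpp.
struct SLinearSegment {
    double s_Start;     // first bucket covered by the segment
    double s_Intercept; // trend value at s_Start
    double s_Slope;     // trend slope on the segment
};

struct SScaleSegment {
    double s_Start; // first bucket covered by the segment
    double s_Scale; // constant multiplier applied to the seasonal pattern
};

// Index of the last segment whose start is less than or equal to time.
template<typename SEGMENT>
std::size_t segmentAt(const std::vector<SEGMENT>& segments, double time) {
    assert(!segments.empty());
    std::size_t i = 0;
    while (i + 1 < segments.size() && segments[i + 1].s_Start <= time) {
        ++i;
    }
    return i;
}

// Predict trend(t) + scale(t) * seasonal[t mod period] for a bucket index t.
double predict(const std::vector<SLinearSegment>& trend,
               const std::vector<SScaleSegment>& scales,
               const std::vector<double>& seasonal,
               std::size_t bucket) {
    assert(!seasonal.empty());
    double t = static_cast<double>(bucket);
    const SLinearSegment& line = trend[segmentAt(trend, t)];
    const SScaleSegment& scale = scales[segmentAt(scales, t)];
    return line.s_Intercept + line.s_Slope * (t - line.s_Start) +
           scale.s_Scale * seasonal[bucket % seasonal.size()];
}
```

The trend and the scaling carry separate segment boundaries in this sketch, so a change in slope and a change in seasonal amplitude need not happen at the same time. Whether the PR couples them is not visible from the header diff.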
126 changes: 91 additions & 35 deletions include/maths/CPeriodicityHypothesisTests.h
@@ -27,19 +27,18 @@ class CSeasonalTime;

//! \brief Represents the result of running the periodicity
//! hypothesis tests.
// clang-format off
class MATHS_EXPORT CPeriodicityHypothesisTestsResult : boost::equality_comparable<CPeriodicityHypothesisTestsResult,
boost::addable<CPeriodicityHypothesisTestsResult> > {
// clang-format on
class MATHS_EXPORT CPeriodicityHypothesisTestsResult
: boost::equality_comparable<CPeriodicityHypothesisTestsResult> {
public:
using TTimeTimePr = std::pair<core_t::TTime, core_t::TTime>;
using TSizeVec = std::vector<std::size_t>;

public:
//! \brief Component data.
struct MATHS_EXPORT SComponent {
SComponent();
SComponent() = default;
SComponent(const std::string& description,
bool diurnal,
bool piecewiseScaled,
core_t::TTime startOfPartition,
core_t::TTime period,
const TTimeTimePr& window,
@@ -56,41 +55,45 @@ class MATHS_EXPORT CPeriodicityHypothesisTestsResult : boost::equality_comparabl
//! An identifier for the component used by the test.
std::string s_Description;
//! True if this is a diurnal component false otherwise.
bool s_Diurnal;
bool s_Diurnal = false;
//! The segmentation of the window into intervals of constant
//! scaling.
bool s_PiecewiseScaled = false;
//! The start of the partition.
core_t::TTime s_StartOfPartition;
core_t::TTime s_StartOfPartition = 0;
//! The period of the component.
core_t::TTime s_Period;
core_t::TTime s_Period = 0;
//! The component window.
TTimeTimePr s_Window;
//! The precedence to apply to this component when
//! deciding which to keep.
double s_Precedence;
//! The precedence to apply to this component when deciding
//! which to keep.
double s_Precedence = 0.0;
};

using TComponent5Vec = core::CSmallVector<SComponent, 5>;
using TRemoveCondition = std::function<bool(const SComponent&)>;

public:
//! Check if this is equal to \p other.
bool operator==(const CPeriodicityHypothesisTestsResult& other) const;

//! Sets to the union of the periodic components present.
//!
//! \warning This only makes sense if the this and the
//! other result share the start of the partition time.
const CPeriodicityHypothesisTestsResult&
operator+=(const CPeriodicityHypothesisTestsResult& other);

//! Add a component.
void add(const std::string& description,
bool diurnal,
bool piecewiseScaled,
core_t::TTime startOfWeek,
core_t::TTime period,
const TTimeTimePr& window,
double precedence = 1.0);

//! Remove the component with \p description.
void remove(const std::string& description);
void remove(const TRemoveCondition& condition);

//! Set if this is a piecewise linear trend.
void piecewiseLinearTrend(bool value);

//! Check if this is a piecewise linear trend.
bool piecewiseLinearTrend() const;

//! Check if there are any periodic components.
bool periodic() const;
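
The hunk above replaces `remove(const std::string& description)` with `remove(const TRemoveCondition& condition)`, where `TRemoveCondition` is a `std::function<bool(const SComponent&)>`. The sketch below shows one way a predicate-based removal could work, using the standard erase-remove idiom. `SComponent` is cut down to three fields, and `CResult` and `hasDescription` are hypothetical stand-ins; the actual implementation in the PR is not shown in this header diff.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Simplified stand-in for the SComponent struct in the diff.
struct SComponent {
    std::string s_Description;
    bool s_Diurnal = false;
    bool s_PiecewiseScaled = false;
};

using TRemoveCondition = std::function<bool(const SComponent&)>;

// Hypothetical container standing in for CPeriodicityHypothesisTestsResult.
class CResult {
public:
    void add(const SComponent& component) { m_Components.push_back(component); }

    // Remove every component for which condition returns true.
    void remove(const TRemoveCondition& condition) {
        m_Components.erase(std::remove_if(m_Components.begin(), m_Components.end(), condition),
                           m_Components.end());
    }

    std::size_t size() const { return m_Components.size(); }

    const SComponent& operator[](std::size_t i) const {
        assert(i < m_Components.size());
        return m_Components[i];
    }

private:
    std::vector<SComponent> m_Components;
};

// The old behaviour, removing by description, expressed as a condition.
TRemoveCondition hasDescription(const std::string& description) {
    return [description](const SComponent& component) {
        return component.s_Description == description;
    };
}
```

With a predicate, a caller can remove by any field, for example every component with `s_PiecewiseScaled` set, while `hasDescription("daily")` recovers removal by description.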
@@ -102,6 +105,9 @@ class MATHS_EXPORT CPeriodicityHypothesisTestsResult : boost::equality_comparabl
std::string print() const;

private:
//! If true then the hypothesis used a piecewise linear trend.
bool m_PiecewiseLinearTrend = false;

//! The periodic components.
TComponent5Vec m_Components;
};
@@ -174,14 +180,17 @@ class MATHS_EXPORT CPeriodicityHypothesisTests {
using TComponent = CPeriodicityHypothesisTestsResult::SComponent;

public:
CPeriodicityHypothesisTests();
CPeriodicityHypothesisTests() = default;
explicit CPeriodicityHypothesisTests(const CPeriodicityHypothesisTestsConfig& config);

//! Check if the test is initialized.
bool initialized() const;

//! Initialize the bucket values.
void initialize(core_t::TTime bucketLength, core_t::TTime window, core_t::TTime period);
void initialize(core_t::TTime startTime,
core_t::TTime bucketLength,
core_t::TTime window,
core_t::TTime period);

//! Add \p value at \p time.
void add(core_t::TTime time, double value, double weight = 1.0);
@@ -193,6 +202,7 @@ class MATHS_EXPORT CPeriodicityHypothesisTests {
private:
using TDoubleVec = std::vector<double>;
using TDoubleVec2Vec = core::CSmallVector<TDoubleVec, 2>;
using TSizeVec = std::vector<std::size_t>;
using TFloatMeanAccumulatorCRng = core::CVectorRange<const TFloatMeanAccumulatorVec>;
using TMinMaxAccumulator = maths::CBasicStatistics::CMinMax<core_t::TTime>;

@@ -204,23 +214,25 @@ class MATHS_EXPORT CPeriodicityHypothesisTests {
//! Check if the null hypothesis is good enough to not need an
//! alternative.
bool nullHypothesisGoodEnough() const;
//! The number of segments in the trend.
double s_TrendSegments;
//! True if a known periodic component is tested.
bool s_HasPeriod;
//! True if a known repeating partition is tested.
bool s_HasPartition;
//! The maximum variance to accept the alternative hypothesis.
double s_Vt;
double s_VarianceThreshold;
//! The minimum amplitude to accept the alternative hypothesis.
double s_At;
double s_AmplitudeThreshold;
//! The minimum autocorrelation to accept the alternative
//! hypothesis.
double s_Rt;
double s_AutocorrelationThreshold;
//! The data range.
double s_Range;
//! The number of buckets with at least one measurement.
double s_B;
double s_NonEmptyBuckets;
//! The average number of measurements per bucket value.
double s_M;
double s_MeasurementsPerBucket;
//! The null hypothesis periodic components.
CPeriodicityHypothesisTestsResult s_H0;
//! The variance estimate of H0.
@@ -231,10 +243,14 @@ class MATHS_EXPORT CPeriodicityHypothesisTests {
double s_DF0;
//! The trend for the null hypothesis.
TDoubleVec2Vec s_T0;
//! The linear scales if any.
TDoubleVec s_Scales;
//! The partition for the null hypothesis.
TTimeTimePr2Vec s_Partition;
//! The start of the repeating partition.
core_t::TTime s_StartOfPartition;
//! The segmentation of the interval if any.
TSizeVec s_Segmentation;
};

//! \brief Manages the testing of a set of nested hypotheses.
@@ -267,13 +283,19 @@ class MATHS_EXPORT CPeriodicityHypothesisTests {
CNestedHypotheses& addNested(TTestFunc test);
//! Test the hypotheses.
CPeriodicityHypothesisTestsResult test(STestStats& stats) const;
//! Set if the hypothesis uses a piecewise linear trend.
void trendSegments(std::size_t segments);
//! Check if the hypothesis uses a piecewise linear trend.
std::size_t trendSegments() const;

private:
using THypothesisVec = std::vector<CNestedHypotheses>;

private:
//! The test.
TTestFunc m_Test;
//! The number of segments in the trend.
std::size_t m_TrendSegments;
Contributor

I'm not sure that the comments here quite match the type of the member variable

Contributor Author

Agreed. This used to be a bool but we actually need the number of segments in the calling code so this changed. I'll update.

Contributor Author

See this commit.

//! If true always test the nested hypotheses.
bool m_AlwaysTestNested;
//! The nested hypotheses to test.
@@ -313,11 +335,13 @@ class MATHS_EXPORT CPeriodicityHypothesisTests {
//! Test for a daily periodic component.
CPeriodicityHypothesisTestsResult testForDaily(const TTimeTimePr2Vec& window,
const TFloatMeanAccumulatorCRng& buckets,
bool scaling,
STestStats& stats) const;

//! Test for a weekly periodic component.
CPeriodicityHypothesisTestsResult testForWeekly(const TTimeTimePr2Vec& window,
const TFloatMeanAccumulatorCRng& buckets,
bool scaling,
STestStats& stats) const;

//! Test for a weekday/end partition.
@@ -335,6 +359,7 @@ class MATHS_EXPORT CPeriodicityHypothesisTests {
//! periodicity.
CPeriodicityHypothesisTestsResult testForPeriod(const TTimeTimePr2Vec& window,
const TFloatMeanAccumulatorCRng& buckets,
bool scaling,
STestStats& stats) const;

//! Check we've seen sufficient data to test accurately.
@@ -343,7 +368,8 @@ class MATHS_EXPORT CPeriodicityHypothesisTests {

//! Check if there are enough non-empty buckets which are repeated
//! at at least one \p period in \p buckets.
bool seenSufficientPeriodicallyPopulatedBucketsToTest(const TFloatMeanAccumulatorCRng& buckets,
template<typename CONTAINER>
bool seenSufficientPeriodicallyPopulatedBucketsToTest(const CONTAINER& buckets,
std::size_t period) const;

//! Compute various ancillary statistics for testing.
@@ -372,6 +398,13 @@ class MATHS_EXPORT CPeriodicityHypothesisTests {
core_t::TTime period,
STestStats& stats) const;

//! Test to see if there is significant evidence for a component
//! with period \p period which is piecewise linearly scaled.
bool testPeriodWithScaling(const TTimeTimePr2Vec& windows,
const TFloatMeanAccumulatorCRng& buckets,
core_t::TTime period,
STestStats& stats) const;

//! Test to see if there is significant evidence for a repeating
//! partition of the data into windows defined by \p partition.
bool testPartition(const TTimeTimePr2Vec& partition,
@@ -380,6 +413,29 @@ class MATHS_EXPORT CPeriodicityHypothesisTests {
double correction,
STestStats& stats) const;

//! Run the explained variance test on an alternative hypothesis.
bool testVariance(const TTimeTimePr2Vec& window,
const TFloatMeanAccumulatorVec& buckets,
core_t::TTime period,
double df1,
double v1,
STestStats& stats,
double& R,
double& meanRepeats,
double& pVariance,
const TSizeVec& segmentation = TSizeVec{}) const;

//! Run the component amplitude test on the alternative hypothesis.
bool testAmplitude(const TTimeTimePr2Vec& window,
const TFloatMeanAccumulatorVec& buckets,
core_t::TTime period,
double b,
double v,
double R,
double meanRepeats,
double pVariance,
STestStats& stats) const;

private:
//! The minimum proportion of populated buckets for which
//! the test is accurate.
@@ -392,14 +448,17 @@ class MATHS_EXPORT CPeriodicityHypothesisTests {
//! Configures the tests to run.
CPeriodicityHypothesisTestsConfig m_Config;

//! The start time of the window.
core_t::TTime m_StartTime = 0;

//! The bucketing interval.
core_t::TTime m_BucketLength;
core_t::TTime m_BucketLength = 0;

//! The window length for which to maintain bucket values.
core_t::TTime m_WindowLength;
core_t::TTime m_WindowLength = 0;

//! The specified period to test.
core_t::TTime m_Period;
core_t::TTime m_Period = 0;

//! The time range of values added to the test.
TMinMaxAccumulator m_TimeRange;
@@ -408,16 +467,13 @@ class MATHS_EXPORT CPeriodicityHypothesisTests {
TFloatMeanAccumulatorVec m_BucketValues;
};

using TFloatMeanAccumulator = CBasicStatistics::SSampleMean<CFloatStorage>::TAccumulator;
using TFloatMeanAccumulatorVec = std::vector<TFloatMeanAccumulator>;

//! Test for periodic components in \p values.
MATHS_EXPORT
CPeriodicityHypothesisTestsResult
testForPeriods(const CPeriodicityHypothesisTestsConfig& config,
core_t::TTime startTime,
core_t::TTime bucketLength,
const TFloatMeanAccumulatorVec& values);
const std::vector<CBasicStatistics::SSampleMean<CFloatStorage>::TAccumulator>& values);
}
}

5 changes: 4 additions & 1 deletion include/maths/CRegression.h
@@ -95,8 +95,11 @@ class MATHS_EXPORT CRegression {
//! is at a premium.
//!
//! \tparam N_ The degree of the polynomial.
// clang-format off
template<std::size_t N_, typename T = CFloatStorage>
class CLeastSquaresOnline : boost::addable<CLeastSquaresOnline<N_, T>> {
class CLeastSquaresOnline : boost::addable<CLeastSquaresOnline<N_, T>,
boost::subtractable<CLeastSquaresOnline<N_, T>>> {
// clang-format on
public:
static const std::size_t N = N_ + 1;
using TArray = boost::array<double, N>;
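
The hunk above makes `CLeastSquaresOnline` subtractable as well as addable. The diff does not say why, but one plausible use, offered here as an assumption only, is speeding up segmentation: if the fit's sufficient statistics are sums, the fit over any interval is the difference of two prefix accumulators, so candidate segments can be evaluated without rescanning the data. The sketch below illustrates that idea with a hypothetical `SLineFit` struct for a straight line; it is not ml-cpp's `CLeastSquaresOnline`, and `operator-` is written by hand rather than generated by `boost::subtractable` to keep the example free of dependencies.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical accumulator of sufficient statistics for fitting y = a + b * x.
struct SLineFit {
    double s_N = 0.0;  // number of points
    double s_X = 0.0;  // sum of x
    double s_Y = 0.0;  // sum of y
    double s_XX = 0.0; // sum of x * x
    double s_XY = 0.0; // sum of x * y

    void add(double x, double y) {
        s_N += 1.0;
        s_X += x;
        s_Y += y;
        s_XX += x * x;
        s_XY += x * y;
    }

    SLineFit& operator-=(const SLineFit& rhs) {
        s_N -= rhs.s_N;
        s_X -= rhs.s_X;
        s_Y -= rhs.s_Y;
        s_XX -= rhs.s_XX;
        s_XY -= rhs.s_XY;
        return *this;
    }

    // Least squares slope; requires at least two distinct x values.
    double slope() const {
        double denominator = s_N * s_XX - s_X * s_X;
        assert(denominator > 0.0);
        return (s_N * s_XY - s_X * s_Y) / denominator;
    }
};

inline SLineFit operator-(SLineFit lhs, const SLineFit& rhs) {
    return lhs -= rhs;
}

// prefix[i] holds the statistics of the first i values, taking x = 0, 1, 2, ...
std::vector<SLineFit> prefixFits(const std::vector<double>& values) {
    std::vector<SLineFit> prefix(values.size() + 1);
    for (std::size_t i = 0; i < values.size(); ++i) {
        prefix[i + 1] = prefix[i];
        prefix[i + 1].add(static_cast<double>(i), values[i]);
    }
    return prefix;
}

// Slope of the least squares line through values[begin, end).
double segmentSlope(const std::vector<SLineFit>& prefix, std::size_t begin, std::size_t end) {
    assert(begin < end && end < prefix.size());
    return (prefix[end] - prefix[begin]).slope();
}
```

After a single pass to build the prefix accumulators, each candidate segment costs a constant number of operations. Subtracting running sums can lose precision when the values are large relative to their spread, so a production version would likely need more care than this sketch takes.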
4 changes: 2 additions & 2 deletions include/maths/CTimeSeriesDecomposition.h
@@ -125,8 +125,8 @@ class MATHS_EXPORT CTimeSeriesDecomposition : public CTimeSeriesDecompositionInt
//! Get the value of the time series at \p time.
//!
//! \param[in] time The time of interest.
//! \param[in] confidence The symmetric confidence interval for the prediction
//! the baseline as a percentage.
//! \param[in] confidence The symmetric confidence interval for the
//! prediction the baseline as a percentage.
//! \param[in] components The components to include in the baseline.
virtual maths_t::TDoubleDoublePr value(core_t::TTime time,
double confidence = 0.0,