
[ML] Improve time series decomposition in the presence of change points #198


Merged. 29 commits; the diff below shows changes from 20 of them.
77f16a2
Work on segmentation of expanding window
tveasey Jul 23, 2018
ced0e1e
Bug fixes
tveasey Jul 23, 2018
5670ffa
Merge branch 'master' into enhancement/improve-timeseries-decompositi…
tveasey Jul 23, 2018
6fdf20f
Finish up segmentation
tveasey Jul 23, 2018
c2bd6fe
Wire in piecewise constant scaling segmentation into the periodicity …
tveasey Jul 30, 2018
84ee3e8
Wire piecewise linear trend in to periodicity hypothesis testing
tveasey Aug 1, 2018
05e5d2d
Bug fixes
tveasey Aug 2, 2018
b0c85c5
Update test threshold
tveasey Aug 2, 2018
1c47455
Speed up segmentation using linear models
tveasey Aug 3, 2018
e96eae3
Penalise trend segmentation in hypothesis selection. Fix remaining un…
tveasey Aug 3, 2018
12a95a9
A couple of bug fixes and knock improvements given new test behaviour
tveasey Aug 4, 2018
29a7573
Merge branch 'master' into enhancement/improve-timeseries-decompositi…
tveasey Aug 6, 2018
294f1ad
Revert experiment
tveasey Aug 6, 2018
350b9db
Merge branch 'master' into enhancement/improve-timeseries-decompositi…
tveasey Aug 17, 2018
b6cb46d
Compiler warnings and incorrect argument type
tveasey Aug 17, 2018
b9b2b86
Suppress verbose test logging
tveasey Aug 17, 2018
5f6ce4c
Numerical hardening
tveasey Aug 17, 2018
83786e6
Some bug fixes
tveasey Aug 24, 2018
14ff203
Merge branch 'master' into enhancement/improve-timeseries-decompositi…
tveasey Sep 4, 2018
5a04b22
Unused include
tveasey Sep 4, 2018
78c7cd3
Merge branch 'master' into enhancement/improve-timeseries-decompositi…
tveasey Sep 14, 2018
cc2fb5e
Merge branch 'master' into enhancement/improve-timeseries-decompositi…
tveasey Oct 2, 2018
8499551
Fix merge
tveasey Oct 2, 2018
3bbd244
More descriptive test stats member names as per review comment
tveasey Oct 2, 2018
a8b78dd
Correct comment
tveasey Oct 2, 2018
f979a64
Remove unused variable
tveasey Oct 2, 2018
f6aa320
Docs
tveasey Oct 2, 2018
e13828b
Merge branch 'master' into enhancement/improve-timeseries-decompositi…
tveasey Oct 2, 2018
1ca85fd
Some tidy ups and fix unit tests after merge
tveasey Oct 3, 2018
116 changes: 86 additions & 30 deletions include/maths/CPeriodicityHypothesisTests.h
@@ -27,19 +27,18 @@ class CSeasonalTime;

//! \brief Represents the result of running the periodicity
//! hypothesis tests.
// clang-format off
class MATHS_EXPORT CPeriodicityHypothesisTestsResult : boost::equality_comparable<CPeriodicityHypothesisTestsResult,
boost::addable<CPeriodicityHypothesisTestsResult> > {
// clang-format on
class MATHS_EXPORT CPeriodicityHypothesisTestsResult
: boost::equality_comparable<CPeriodicityHypothesisTestsResult> {
public:
using TTimeTimePr = std::pair<core_t::TTime, core_t::TTime>;
using TSizeVec = std::vector<std::size_t>;

public:
//! \brief Component data.
struct MATHS_EXPORT SComponent {
SComponent();
SComponent() = default;
SComponent(const std::string& description,
bool diurnal,
bool piecewiseScaled,
core_t::TTime startOfPartition,
core_t::TTime period,
const TTimeTimePr& window,
@@ -56,41 +55,45 @@ class MATHS_EXPORT CPeriodicityHypothesisTestsResult : boost::equality_comparabl
//! An identifier for the component used by the test.
std::string s_Description;
//! True if this is a diurnal component false otherwise.
bool s_Diurnal;
bool s_Diurnal = false;
//! The segmentation of the window into intervals of constant
//! scaling.
bool s_PiecewiseScaled = false;
//! The start of the partition.
core_t::TTime s_StartOfPartition;
core_t::TTime s_StartOfPartition = 0;
//! The period of the component.
core_t::TTime s_Period;
core_t::TTime s_Period = 0;
//! The component window.
TTimeTimePr s_Window;
//! The precedence to apply to this component when
//! deciding which to keep.
double s_Precedence;
//! The precedence to apply to this component when deciding
//! which to keep.
double s_Precedence = 0.0;
};
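The hunk above replaces the user-provided `SComponent()` constructor with `= default` and gives every scalar member an in-class initializer. A minimal sketch of why this is safe, using a hypothetical stand-in struct `SComponentSketch` (not the real `SComponent`) and assuming `core_t::TTime` behaves like `std::int64_t`: a defaulted constructor, and any constructor that does not mention a member, both use the in-class initializers, so no field is left indeterminate.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>

// Assumed stand-in for core_t::TTime.
using TTime = std::int64_t;
using TTimeTimePr = std::pair<TTime, TTime>;

// Hypothetical simplified copy of SComponent illustrating default member initializers.
struct SComponentSketch {
    SComponentSketch() = default;
    SComponentSketch(std::string description, bool diurnal, TTime period)
        : s_Description{std::move(description)}, s_Diurnal{diurnal}, s_Period{period} {}

    std::string s_Description;
    bool s_Diurnal = false;
    bool s_PiecewiseScaled = false;
    TTime s_StartOfPartition = 0;
    TTime s_Period = 0;
    TTimeTimePr s_Window; // std::pair's default constructor value-initializes both members
    double s_Precedence = 0.0;
};

// This constructor sets only three fields; the others keep their in-class defaults.
inline SComponentSketch makeDaily() {
    SComponentSketch daily{"daily", true, 86400};
    assert(daily.s_Precedence == 0.0 && !daily.s_PiecewiseScaled);
    return daily;
}
```

The same pattern is applied to `m_BucketLength`, `m_WindowLength` and `m_Period` further down, which lets `CPeriodicityHypothesisTests()` also be defaulted.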

using TComponent5Vec = core::CSmallVector<SComponent, 5>;
using TRemoveCondition = std::function<bool(const SComponent&)>;

public:
//! Check if this is equal to \p other.
bool operator==(const CPeriodicityHypothesisTestsResult& other) const;

//! Sets to the union of the periodic components present.
//!
//! \warning This only makes sense if this and the
//! other result share the start of the partition time.
const CPeriodicityHypothesisTestsResult&
operator+=(const CPeriodicityHypothesisTestsResult& other);

//! Add a component.
void add(const std::string& description,
bool diurnal,
bool piecewiseScaled,
core_t::TTime startOfWeek,
core_t::TTime period,
const TTimeTimePr& window,
double precedence = 1.0);

//! Remove the component with \p description.
void remove(const std::string& description);
void remove(const TRemoveCondition& condition);

//! Set if this is a piecewise linear trend.
void piecewiseLinearTrend(bool value);

//! Check if this is a piecewise linear trend.
bool piecewiseLinearTrend() const;

//! Check if there are any periodic components.
bool periodic() const;
@@ -102,6 +105,9 @@ class MATHS_EXPORT CPeriodicityHypothesisTestsResult : boost::equality_comparabl
std::string print() const;

private:
//! If true then the hypothesis used a piecewise linear trend.
bool m_PiecewiseLinearTrend = false;

//! The periodic components.
TComponent5Vec m_Components;
};
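The new `remove(const TRemoveCondition& condition)` overload takes a `std::function<bool(const SComponent&)>` predicate. The header does not show its implementation; a plausible sketch, assuming the components sit in a contiguous container (a `std::vector` here instead of `core::CSmallVector`) and using the hypothetical names `SComponentLite`, `removeIf` and `removeByDescription`, is the standard erase-remove idiom, with the description-based overload expressed in terms of the predicate one.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical minimal component: only the fields the predicates below use.
struct SComponentLite {
    std::string s_Description;
    long s_Period;
};

using TComponentVec = std::vector<SComponentLite>;
using TRemoveCondition = std::function<bool(const SComponentLite&)>;

// Remove every component for which condition returns true (erase-remove idiom).
inline void removeIf(TComponentVec& components, const TRemoveCondition& condition) {
    assert(condition); // calling an empty std::function throws std::bad_function_call
    components.erase(std::remove_if(components.begin(), components.end(), condition),
                     components.end());
}

// The description-based overload can be written in terms of the predicate overload.
inline void removeByDescription(TComponentVec& components, const std::string& description) {
    removeIf(components, [&description](const SComponentLite& component) {
        return component.s_Description == description;
    });
}
```

A predicate lets callers drop, for example, all components shorter than a given period in one pass, rather than looking each one up by description.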
@@ -174,14 +180,17 @@ class MATHS_EXPORT CPeriodicityHypothesisTests {
using TComponent = CPeriodicityHypothesisTestsResult::SComponent;

public:
CPeriodicityHypothesisTests();
CPeriodicityHypothesisTests() = default;
explicit CPeriodicityHypothesisTests(const CPeriodicityHypothesisTestsConfig& config);

//! Check if the test is initialized.
bool initialized() const;

//! Initialize the bucket values.
void initialize(core_t::TTime bucketLength, core_t::TTime window, core_t::TTime period);
void initialize(core_t::TTime startTime,
core_t::TTime bucketLength,
core_t::TTime window,
core_t::TTime period);

//! Add \p value at \p time.
void add(core_t::TTime time, double value, double weight = 1.0);
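`initialize` now takes a `startTime`, which is stored in the new `m_StartTime` member. One plausible use, sketched here as an assumption and not taken from the PR's source, is that `add` maps a time to a bucket relative to the start of the window rather than to the epoch, so bucket 0 is always the window's first bucket. `CBucketWindowSketch` is hypothetical: it keeps a weighted mean per bucket and wraps times modulo the window length.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using TTime = std::int64_t; // assumed stand-in for core_t::TTime

// Hypothetical, simplified bucketing: weighted mean of the values in each bucket.
class CBucketWindowSketch {
public:
    void initialize(TTime startTime, TTime bucketLength, TTime window) {
        assert(bucketLength > 0 && window >= bucketLength && window % bucketLength == 0);
        m_StartTime = startTime;
        m_BucketLength = bucketLength;
        m_WindowLength = window;
        auto n = static_cast<std::size_t>(window / bucketLength);
        m_Sums.assign(n, 0.0);
        m_Weights.assign(n, 0.0);
    }

    // Bucket index relative to the start of the window, wrapping at the window length.
    std::size_t bucket(TTime time) const {
        TTime offset = (time - m_StartTime) % m_WindowLength;
        if (offset < 0) {
            offset += m_WindowLength; // times before startTime wrap to the end
        }
        return static_cast<std::size_t>(offset / m_BucketLength);
    }

    void add(TTime time, double value, double weight = 1.0) {
        std::size_t i = bucket(time);
        m_Sums[i] += weight * value;
        m_Weights[i] += weight;
    }

    double mean(std::size_t i) const {
        return m_Weights[i] > 0.0 ? m_Sums[i] / m_Weights[i] : 0.0;
    }

private:
    TTime m_StartTime = 0;
    TTime m_BucketLength = 0;
    TTime m_WindowLength = 0;
    std::vector<double> m_Sums;
    std::vector<double> m_Weights;
};
```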
@@ -193,6 +202,7 @@ class MATHS_EXPORT CPeriodicityHypothesisTests {
private:
using TDoubleVec = std::vector<double>;
using TDoubleVec2Vec = core::CSmallVector<TDoubleVec, 2>;
using TSizeVec = std::vector<std::size_t>;
using TFloatMeanAccumulatorCRng = core::CVectorRange<const TFloatMeanAccumulatorVec>;
using TMinMaxAccumulator = maths::CBasicStatistics::CMinMax<core_t::TTime>;

@@ -204,6 +214,8 @@ class MATHS_EXPORT CPeriodicityHypothesisTests {
//! Check if the null hypothesis is good enough to not need an
//! alternative.
bool nullHypothesisGoodEnough() const;
//! The number of segments in the trend.
double s_TrendSegments;
//! True if a known periodic component is tested.
bool s_HasPeriod;
//! True if a known repeating partition is tested.
@@ -231,10 +243,14 @@ class MATHS_EXPORT CPeriodicityHypothesisTests {
double s_DF0;
//! The trend for the null hypothesis.
TDoubleVec2Vec s_T0;
//! The linear scales if any.
TDoubleVec s_Scales;
//! The partition for the null hypothesis.
TTimeTimePr2Vec s_Partition;
//! The start of the repeating partition.
core_t::TTime s_StartOfPartition;
//! The segmentation of the interval if any.
TSizeVec s_Segmentation;
};
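`STestStats` gains `s_TrendSegments` and `s_Segmentation`, which describe a trend that is linear within each segment but can change at segment boundaries (change points). A minimal sketch of evaluating such a piecewise linear trend, assuming (for illustration only) that the segmentation is a list of increasing bucket indices `{0, s_1, ..., n}` and that each segment gets an independent ordinary least-squares line; `fitLine` and `piecewiseLinearTrend` are hypothetical helpers, not functions from the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using TDoubleVec = std::vector<double>;
using TSizeVec = std::vector<std::size_t>;

// Fit y = a + b * t by ordinary least squares on values[begin, end), with t the
// bucket index, and append the fitted values for that range to trend.
inline void fitLine(const TDoubleVec& values, std::size_t begin, std::size_t end, TDoubleVec& trend) {
    double n = static_cast<double>(end - begin);
    double st = 0.0, sy = 0.0, stt = 0.0, sty = 0.0;
    for (std::size_t i = begin; i < end; ++i) {
        double t = static_cast<double>(i);
        st += t;
        sy += values[i];
        stt += t * t;
        sty += t * values[i];
    }
    double denom = n * stt - st * st;
    double b = denom > 0.0 ? (n * sty - st * sy) / denom : 0.0; // one point: flat line
    double a = (sy - b * st) / n;
    for (std::size_t i = begin; i < end; ++i) {
        trend.push_back(a + b * static_cast<double>(i));
    }
}

// Piecewise linear trend for boundaries {0, s_1, ..., values.size()} (assumed convention).
inline TDoubleVec piecewiseLinearTrend(const TDoubleVec& values, const TSizeVec& segmentation) {
    assert(segmentation.size() >= 2 && segmentation.front() == 0 &&
           segmentation.back() == values.size());
    TDoubleVec trend;
    trend.reserve(values.size());
    for (std::size_t j = 1; j < segmentation.size(); ++j) {
        assert(segmentation[j - 1] < segmentation[j]);
        fitLine(values, segmentation[j - 1], segmentation[j], trend);
    }
    return trend;
}
```

For a ramp followed by a step, e.g. `{0, 1, 2, 3, 10, 10, 10, 10}` with segmentation `{0, 4, 8}`, the two segment fits reproduce the data exactly, whereas a single line through all eight buckets would not.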

//! \brief Manages the testing of a set of nested hypotheses.
@@ -267,13 +283,19 @@ class MATHS_EXPORT CPeriodicityHypothesisTests {
CNestedHypotheses& addNested(TTestFunc test);
//! Test the hypotheses.
CPeriodicityHypothesisTestsResult test(STestStats& stats) const;
//! Set if the hypothesis uses a piecewise linear trend.
void trendSegments(std::size_t segments);
//! Check if the hypothesis uses a piecewise linear trend.
std::size_t trendSegments() const;

private:
using THypothesisVec = std::vector<CNestedHypotheses>;

private:
//! The test.
TTestFunc m_Test;
//! True if the hypothesis used a piecewise linear trend.
std::size_t m_TrendSegments;
Review comment (Contributor): I'm not sure that the comments here quite match the type of the member variable.

Reply (Contributor Author): Agreed. This used to be a bool, but we actually need the number of segments in the calling code, so this changed. I'll update.

Reply (Contributor Author): See this commit.

//! If true always test the nested hypotheses.
bool m_AlwaysTestNested;
//! The nested hypotheses to test.
@@ -313,11 +335,13 @@ class MATHS_EXPORT CPeriodicityHypothesisTests {
//! Test for a daily periodic component.
CPeriodicityHypothesisTestsResult testForDaily(const TTimeTimePr2Vec& window,
const TFloatMeanAccumulatorCRng& buckets,
bool scaling,
STestStats& stats) const;

//! Test for a weekly periodic component.
CPeriodicityHypothesisTestsResult testForWeekly(const TTimeTimePr2Vec& window,
const TFloatMeanAccumulatorCRng& buckets,
bool scaling,
STestStats& stats) const;

//! Test for a weekday/end partition.
@@ -335,6 +359,7 @@ class MATHS_EXPORT CPeriodicityHypothesisTests {
//! periodicity.
CPeriodicityHypothesisTestsResult testForPeriod(const TTimeTimePr2Vec& window,
const TFloatMeanAccumulatorCRng& buckets,
bool scaling,
STestStats& stats) const;

//! Check we've seen sufficient data to test accurately.
@@ -343,7 +368,8 @@ class MATHS_EXPORT CPeriodicityHypothesisTests {

//! Check if there are enough non-empty buckets which are repeated
//! at at least one \p period in \p buckets.
bool seenSufficientPeriodicallyPopulatedBucketsToTest(const TFloatMeanAccumulatorCRng& buckets,
template<typename CONTAINER>
bool seenSufficientPeriodicallyPopulatedBucketsToTest(const CONTAINER& buckets,
std::size_t period) const;
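`seenSufficientPeriodicallyPopulatedBucketsToTest` is now a template over `CONTAINER`, so it can accept containers other than `TFloatMeanAccumulatorCRng`. A hedged sketch of what the documented check could look like: elements are treated as bucket counts, a phase `i` in `[0, period)` counts as "repeated" if at least two populated buckets fall at `i, i + period, ...`, and the hypothetical `minimumRepeatedFraction` threshold stands in for whatever criterion the real code applies.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch: elements of CONTAINER are bucket counts; a bucket is
// populated if its count is positive.
template<typename CONTAINER>
bool seenSufficientPeriodicallyPopulatedBuckets(const CONTAINER& buckets,
                                                std::size_t period,
                                                double minimumRepeatedFraction) {
    assert(period > 0);
    std::vector<std::size_t> populated(period, 0);
    std::size_t i = 0;
    for (const auto& count : buckets) {
        if (count > 0) {
            ++populated[i % period]; // populated buckets seen at this phase
        }
        ++i;
    }
    std::size_t repeated = 0;
    for (std::size_t n : populated) {
        repeated += n >= 2 ? 1 : 0; // the phase is populated in at least two repeats
    }
    return static_cast<double>(repeated) >=
           minimumRepeatedFraction * static_cast<double>(period);
}
```

Because the element type is deduced, the same template works for a `std::vector<double>` and a `std::array<int, N>`.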

//! Compute various ancillary statistics for testing.
@@ -372,6 +398,13 @@ class MATHS_EXPORT CPeriodicityHypothesisTests {
core_t::TTime period,
STestStats& stats) const;

//! Test to see if there is significant evidence for a component
//! with period \p period which is piecewise linearly scaled.
bool testPeriodWithScaling(const TTimeTimePr2Vec& windows,
const TFloatMeanAccumulatorCRng& buckets,
core_t::TTime period,
STestStats& stats) const;
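`testPeriodWithScaling` looks for a component with period `period` whose amplitude is piecewise scaled, i.e. the same shape repeats but is multiplied by a different constant on each segment. The header does not show the estimator; the sketch below is a single-pass simplification, not the PR's method. It estimates the periodic profile as the mean at each phase, then fits one least-squares scale per segment, `scale_j = sum(x_i * p_i) / sum(p_i * p_i)`. `segmentScales` and the boundary convention `{0, s_1, ..., n}` are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using TDoubleVec = std::vector<double>;
using TSizeVec = std::vector<std::size_t>;

// Hypothetical sketch: per-segment scales of a periodic profile.
inline TDoubleVec segmentScales(const TDoubleVec& values,
                                std::size_t period,
                                const TSizeVec& segmentation) {
    assert(period > 0 && segmentation.size() >= 2 && segmentation.back() <= values.size());
    // Profile: mean value at each phase over the whole window.
    TDoubleVec profile(period, 0.0);
    TDoubleVec counts(period, 0.0);
    for (std::size_t i = 0; i < values.size(); ++i) {
        profile[i % period] += values[i];
        counts[i % period] += 1.0;
    }
    for (std::size_t k = 0; k < period; ++k) {
        profile[k] = counts[k] > 0.0 ? profile[k] / counts[k] : 0.0;
    }
    // One least-squares scale per segment.
    TDoubleVec scales;
    for (std::size_t j = 1; j < segmentation.size(); ++j) {
        double xp = 0.0, pp = 0.0;
        for (std::size_t i = segmentation[j - 1]; i < segmentation[j]; ++i) {
            xp += values[i] * profile[i % period];
            pp += profile[i % period] * profile[i % period];
        }
        scales.push_back(pp > 0.0 ? xp / pp : 0.0);
    }
    return scales;
}
```

The scales are only identified relative to the profile, which absorbs the average amplitude. For `{1, 3, 1, 3, 3, 9, 3, 9}` with period 2 and segmentation `{0, 4, 8}`, the result is `{0.5, 1.5}`: their ratio recovers the threefold amplitude change.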

//! Test to see if there is significant evidence for a repeating
//! partition of the data into windows defined by \p partition.
bool testPartition(const TTimeTimePr2Vec& partition,
@@ -380,6 +413,29 @@ class MATHS_EXPORT CPeriodicityHypothesisTests {
double correction,
STestStats& stats) const;

//! Run the explained variance test on an alternative hypothesis.
bool testVariance(const TTimeTimePr2Vec& window,
const TFloatMeanAccumulatorVec& buckets,
core_t::TTime period,
double df1,
double v1,
STestStats& stats,
double& R,
double& meanRepeats,
double& pVariance,
const TSizeVec& segmentation = TSizeVec{}) const;

//! Run the component amplitude test on the alternative hypothesis.
bool testAmplitude(const TTimeTimePr2Vec& window,
const TFloatMeanAccumulatorVec& buckets,
core_t::TTime period,
double b,
double v,
double R,
double meanRepeats,
double pVariance,
STestStats& stats) const;
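`testVariance` compares how much of the variance an alternative hypothesis explains relative to the null (`df1`, `v1` against the null's degrees of freedom and variance). The PR's exact statistic is not visible in this header, so the sketch below substitutes the textbook nested-model F statistic, assuming `v0` and `v1` are residual variances (residual sum of squares divided by degrees of freedom). `varianceTest` and `SVarianceTestSketch` are hypothetical names. The p-value step, which needs an F-distribution CDF, is omitted.

```cpp
#include <cassert>

// Hypothetical result: explained fraction and nested-model F statistic.
struct SVarianceTestSketch {
    double explainedFraction; // 1 - RSS1 / RSS0
    double fStatistic;        // ((RSS0 - RSS1) / (df0 - df1)) / (RSS1 / df1)
};

// Null: residual variance v0 with df0 degrees of freedom.
// Alternative (nested, more parameters): residual variance v1 with df1 < df0.
inline SVarianceTestSketch varianceTest(double v0, double df0, double v1, double df1) {
    assert(df0 > df1 && df1 > 0.0 && v0 > 0.0 && v1 > 0.0);
    double rss0 = v0 * df0; // residual sum of squares under the null
    double rss1 = v1 * df1; // residual sum of squares under the alternative
    return {1.0 - rss1 / rss0, ((rss0 - rss1) / (df0 - df1)) / v1};
}
```

A large F statistic means the extra parameters of the alternative (for example, a periodic component or extra trend segments) reduce the residual variance by more than their degrees of freedom would suggest by chance.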

private:
//! The minimum proportion of populated buckets for which
//! the test is accurate.
@@ -392,14 +448,17 @@ class MATHS_EXPORT CPeriodicityHypothesisTests {
//! Configures the tests to run.
CPeriodicityHypothesisTestsConfig m_Config;

//! The start time of the window.
core_t::TTime m_StartTime = 0;

//! The bucketing interval.
core_t::TTime m_BucketLength;
core_t::TTime m_BucketLength = 0;

//! The window length for which to maintain bucket values.
core_t::TTime m_WindowLength;
core_t::TTime m_WindowLength = 0;

//! The specified period to test.
core_t::TTime m_Period;
core_t::TTime m_Period = 0;

//! The time range of values added to the test.
TMinMaxAccumulator m_TimeRange;
@@ -408,16 +467,13 @@ class MATHS_EXPORT CPeriodicityHypothesisTests {
TFloatMeanAccumulatorVec m_BucketValues;
};

using TFloatMeanAccumulator = CBasicStatistics::SSampleMean<CFloatStorage>::TAccumulator;
using TFloatMeanAccumulatorVec = std::vector<TFloatMeanAccumulator>;

//! Test for periodic components in \p values.
MATHS_EXPORT
CPeriodicityHypothesisTestsResult
testForPeriods(const CPeriodicityHypothesisTestsConfig& config,
core_t::TTime startTime,
core_t::TTime bucketLength,
const TFloatMeanAccumulatorVec& values);
const std::vector<CBasicStatistics::SSampleMean<CFloatStorage>::TAccumulator>& values);
}
}

5 changes: 4 additions & 1 deletion include/maths/CRegression.h
@@ -95,8 +95,11 @@ class MATHS_EXPORT CRegression {
//! is at a premium.
//!
//! \tparam N_ The degree of the polynomial.
// clang-format off
template<std::size_t N_, typename T = CFloatStorage>
class CLeastSquaresOnline : boost::addable<CLeastSquaresOnline<N_, T>> {
class CLeastSquaresOnline : boost::addable<CLeastSquaresOnline<N_, T>,
boost::subtractable<CLeastSquaresOnline<N_, T>>> {
// clang-format on
public:
static const std::size_t N = N_ + 1;
using TArray = boost::array<double, N>;
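`CLeastSquaresOnline` now also derives from `boost::subtractable`, which generates `operator-` from a class's `operator-=`. One plausible reason, which fits the commit "Speed up segmentation using linear models" but is an assumption here, is the prefix-sum trick: if `prefix[k]` holds the regression statistics of the first `k` buckets, the fit on any candidate segment `[a, b)` is `prefix[b] - prefix[a]` in constant time. `CLineStatsSketch` is a hypothetical degree-1 accumulator, not the real class, and it writes `operator-` by hand instead of using Boost.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical degree-1 accumulator of the sufficient statistics for y = a + b * t:
// {sum w, sum w t, sum w t^2, sum w y, sum w t y}.
class CLineStatsSketch {
public:
    void add(double t, double y, double w = 1.0) {
        m_S[0] += w;
        m_S[1] += w * t;
        m_S[2] += w * t * t;
        m_S[3] += w * y;
        m_S[4] += w * t * y;
    }
    CLineStatsSketch& operator+=(const CLineStatsSketch& rhs) {
        for (std::size_t i = 0; i < m_S.size(); ++i) { m_S[i] += rhs.m_S[i]; }
        return *this;
    }
    CLineStatsSketch& operator-=(const CLineStatsSketch& rhs) {
        for (std::size_t i = 0; i < m_S.size(); ++i) { m_S[i] -= rhs.m_S[i]; }
        return *this;
    }
    // boost::subtractable would generate this from operator-=.
    friend CLineStatsSketch operator-(CLineStatsSketch lhs, const CLineStatsSketch& rhs) {
        lhs -= rhs;
        return lhs;
    }
    // Least-squares slope; zero if t has no spread.
    double slope() const {
        double d = m_S[0] * m_S[2] - m_S[1] * m_S[1];
        return d > 0.0 ? (m_S[0] * m_S[4] - m_S[1] * m_S[3]) / d : 0.0;
    }
    double intercept() const {
        return m_S[0] > 0.0 ? (m_S[3] - slope() * m_S[1]) / m_S[0] : 0.0;
    }

private:
    std::array<double, 5> m_S{};
};

// prefix[k] holds the statistics of values[0, k), so any segment [a, b) is prefix[b] - prefix[a].
inline std::vector<CLineStatsSketch> prefixStats(const std::vector<double>& values) {
    std::vector<CLineStatsSketch> prefix(values.size() + 1);
    for (std::size_t i = 0; i < values.size(); ++i) {
        prefix[i + 1] = prefix[i];
        prefix[i + 1].add(static_cast<double>(i), values[i]);
    }
    return prefix;
}
```

Subtracting large prefix sums can lose precision when `t` is large, so a production version would likely centre `t` or accumulate in a more stable form.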
2 changes: 1 addition & 1 deletion include/maths/CTimeSeriesDecomposition.h
@@ -182,7 +182,7 @@ class MATHS_EXPORT CTimeSeriesDecomposition : public CTimeSeriesDecompositionInt
virtual std::size_t staticSize() const;

//! Get the time shift which is being applied.
virtual core_t::TTime timeShift(void) const;
virtual core_t::TTime timeShift() const;

//! Get the seasonal components.
virtual const maths_t::TSeasonalComponentVec& seasonalComponents() const;
9 changes: 4 additions & 5 deletions include/maths/CTimeSeriesDecompositionDetail.h
@@ -235,11 +235,9 @@ class MATHS_EXPORT CTimeSeriesDecompositionDetail {
using TExpandingWindowPtrAry = boost::array<TExpandingWindowPtr, 2>;

private:
//! The bucket lengths to use to test for short period components.
static const TTimeVec SHORT_BUCKET_LENGTHS;

//! The bucket lengths to use to test for long period components.
static const TTimeVec LONG_BUCKET_LENGTHS;
//! The longest bucket length at which we'll test for periodic
//! components.
static const core_t::TTime LONGEST_BUCKET_LENGTH;

private:
//! Handle \p symbol.
@@ -457,6 +455,7 @@ class MATHS_EXPORT CTimeSeriesDecompositionDetail {
using TSeasonalComponentPtrVec = std::vector<CSeasonalComponent*>;
using TCalendarComponentPtrVec = std::vector<CCalendarComponent*>;
using TFloatMeanAccumulator = CBasicStatistics::SSampleMean<CFloatStorage>::TAccumulator;
using TFloatMeanAccumulatorVec = std::vector<TFloatMeanAccumulator>;

//! \brief Manages the setting of the error gain when updating
//! the components with a value.