Add DetectSeasonality as a Helper function in TimeSeries ExtensionDialog (#5231)

* create class PeriodDetectUtils

* Test period detect

* math utils

* restore file

* update license

* 1. Add DetectSeasonality as a helper method in ExtensionsCatalog,

2. Remove the MathUtils and use MedianDblAggregator (make it BestFriend)
3. Add Unit Tests

* Change SeasonalityDetector to be internal class

* 1. Introduce randomnessThreshold as an optional parameter

2. Update comments and polish SeasonalityDetector for readability.

* minor float to double type change

* fix unit tests

* address Harish's comments:

1. Change Randomness threshold to [0, 1] range as confidence interval and map to inverse normal cumulative distribution
2. Update unit tests to use sin(2pi + x)
3. Other formatting issues

* minor format update

* update comments

* minor follow up comment update

* update threshold to p value

Co-authored-by: yuyi@microsoft.com <Yuanxiang.Ying@microsoft.com>
Co-authored-by: Lisa Hua <jinhua@microsoft.com>
3 people authored Jun 22, 2020
1 parent 1c2469f commit bb13d62
Showing 5 changed files with 463 additions and 1 deletion.
@@ -0,0 +1,51 @@
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.TimeSeries;

namespace Samples.Dynamic
{
public static class DetectSeasonality
{
public static void Example()
{
/* Create a new ML context, for ML.NET operations. It can be used for
exception tracking and logging, as well as the source of randomness.*/
var mlContext = new MLContext();

// Create seasonal data as input: y = sin(2 * Pi + x)
var seasonalData = Enumerable.Range(0, 100).Select(x => new TimeSeriesData(Math.Sin(2 * Math.PI + x)));

// Load the input data as a DataView.
var dataView = mlContext.Data.LoadFromEnumerable(seasonalData);

/* Two optional parameters:
* seasonalityWindowSize: Default value is -1. When set to -1, the whole input is used to fit the model;
* when set to a positive integer, only the first seasonalityWindowSize values are considered.
* randomnessThreshold: Randomness threshold that specifies how confidently the input values follow
* a predictable pattern recurring as seasonal data. The value must be in [0, 1]; the default is 0.95.
* The higher the threshold, the stricter the recurring pattern the
* input values must follow to be classified as seasonal data.
*/
int period = mlContext.AnomalyDetection.DetectSeasonality(
dataView,
nameof(TimeSeriesData.Value),
seasonalityWindowSize: 40);

// Print the Seasonality Period result.
Console.WriteLine($"Seasonality Period: #{period}");
}

private class TimeSeriesData
{
public double Value;

public TimeSeriesData(double value)
{
Value = value;
}
}

}
}
3 changes: 2 additions & 1 deletion src/Microsoft.ML.Data/Transforms/NormalizeColumnDbl.cs
@@ -597,6 +597,7 @@ internal static void GetMedianSoFar(in double num, ref double median, ref MaxHea
/// It tracks median values of non-sparse values (vCount).
/// NaNs are ignored when updating min and max.
/// </summary>
[BestFriend]
internal sealed class MedianDblAggregator : IColumnAggregator<double>
{
private MedianAggregatorUtils.MaxHeap<double> _belowMedianHeap;
@@ -1213,7 +1214,7 @@ private void GetResult(ref TFloat input, ref TFloat value)
}

public override NormalizingTransformer.NormalizerModelParametersBase GetNormalizerModelParams()
=> new NormalizingTransformer.BinNormalizerModelParameters<TFloat>(ImmutableArray.Create(_binUpperBounds), _den,_offset);
=> new NormalizingTransformer.BinNormalizerModelParameters<TFloat>(ImmutableArray.Create(_binUpperBounds), _den, _offset);
}

public sealed class ImplVec : BinColumnFunction
46 changes: 46 additions & 0 deletions src/Microsoft.ML.TimeSeries/ExtensionsCatalog.cs
@@ -213,6 +213,52 @@ public static RootCause LocalizeRootCause(this AnomalyDetectionCatalog catalog,
return dst;
}

/// <summary>
/// <para>
/// In time series data, seasonality (or periodicity) is the presence of variations that occur at specific regular intervals,
/// such as weekly, monthly, or quarterly.
/// </para>
/// <para>
/// This method detects this predictable interval (or period) using Fourier analysis techniques.
/// Assuming the input values have the same time interval (e.g., sensor data collected at every second ordered by timestamps),
/// this method takes a list of time-series data, and returns the regular period for the input seasonal data,
/// if a predictable fluctuation or pattern can be found that recurs or repeats over this period throughout the input values.
/// </para>
/// <para>
/// Returns -1 if no such pattern is found, that is, the input values do not follow a seasonal fluctuation.
/// </para>
/// </summary>
/// <param name="catalog">The anomaly detection catalog.</param>
/// <param name="input">Input DataView. The data is an instance of <see cref="Microsoft.ML.IDataView"/>.</param>
/// <param name="inputColumnName">Name of column to process. The column data must be <see cref="System.Double"/>.</param>
/// <param name="seasonalityWindowSize">An upper bound on the number of input values to consider.
/// When set to -1, the whole input is used to fit the model; when set to a positive integer, only the first
/// seasonalityWindowSize values are considered. Default value is -1.</param>
/// <param name="randomnessThreshold"><a href="https://en.wikipedia.org/wiki/Correlogram">Randomness threshold</a>
/// that specifies how confidently the input values follow a predictable pattern recurring as seasonal data.
/// The value must be in the range [0, 1]. The default is 0.95.
/// </param>
/// <returns>The regular interval for the input as seasonal data, otherwise return -1.</returns>
/// <example>
/// <format type="text/markdown">
/// <![CDATA[
/// [!code-csharp[DetectSeasonality](~/../docs/samples/docs/samples/Microsoft.ML.Samples/Dynamic/Transforms/TimeSeries/DetectSeasonality.cs)]
/// ]]>
/// </format>
/// </example>
public static int DetectSeasonality(
this AnomalyDetectionCatalog catalog,
IDataView input,
string inputColumnName,
int seasonalityWindowSize = -1,
double randomnessThreshold = 0.95)
=> new SeasonalityDetector().DetectSeasonality(
CatalogUtils.GetEnvironment(catalog),
input,
inputColumnName,
seasonalityWindowSize,
randomnessThreshold);

private static void CheckRootCauseInput(IHostEnvironment host, RootCauseLocalizationInput src)
{
host.CheckUserArg(src.Slices.Count >= 1, nameof(src.Slices), "Must has more than one item");
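
The commit notes say the randomness threshold is a value in [0, 1] that is mapped through the inverse normal cumulative distribution, and the doc comment links it to the correlogram. The following is a minimal, self-contained sketch of that general idea, not the SeasonalityDetector implementation from this commit. The class SeasonalitySketch, its method names, the two-sided reading of the threshold, and the plain autocorrelation loop (used here instead of Fourier analysis) are all assumptions made for illustration.

```csharp
using System;
using System.Linq;

// Hypothetical helper, not part of ML.NET: a correlogram-based period sketch.
public static class SeasonalitySketch
{
    // Standard normal CDF, using the Abramowitz-Stegun 7.1.26 erf approximation.
    private static double NormalCdf(double x)
    {
        double z = Math.Abs(x) / Math.Sqrt(2.0);
        double t = 1.0 / (1.0 + 0.3275911 * z);
        double poly = t * (0.254829592 + t * (-0.284496736 + t * (1.421413741
            + t * (-1.453152027 + t * 1.061405429))));
        double erf = 1.0 - poly * Math.Exp(-z * z);
        return x >= 0 ? 0.5 * (1.0 + erf) : 0.5 * (1.0 - erf);
    }

    // Inverse standard normal CDF by bisection on [-10, 10].
    public static double InverseNormalCdf(double p)
    {
        double lo = -10.0, hi = 10.0;
        for (int i = 0; i < 100; i++)
        {
            double mid = 0.5 * (lo + hi);
            if (NormalCdf(mid) < p) lo = mid; else hi = mid;
        }
        return 0.5 * (lo + hi);
    }

    // Returns the first lag whose autocorrelation is a local maximum above the
    // randomness bound z / sqrt(N), or -1 if no such lag exists.
    public static int DetectPeriod(double[] values, double randomnessThreshold = 0.95)
    {
        int n = values.Length;
        if (n < 4) return -1;

        double mean = values.Average();
        double[] centered = values.Select(v => v - mean).ToArray();
        double variance = centered.Sum(v => v * v);
        if (variance == 0.0) return -1; // constant input has no seasonality

        // Assumption: treat the threshold as a two-sided confidence level.
        double bound = InverseNormalCdf((1.0 + randomnessThreshold) / 2.0) / Math.Sqrt(n);

        // Autocorrelation for lags 0 .. n/2, normalized so that acf[0] == 1.
        int maxLag = n / 2;
        var acf = new double[maxLag + 1];
        for (int lag = 0; lag <= maxLag; lag++)
        {
            double sum = 0.0;
            for (int t = 0; t + lag < n; t++) sum += centered[t] * centered[t + lag];
            acf[lag] = sum / variance;
        }

        // The first significant peak after lag 0 is taken as the period.
        for (int lag = 2; lag < maxLag; lag++)
        {
            bool isPeak = acf[lag] > acf[lag - 1] && acf[lag] >= acf[lag + 1];
            if (isPeak && acf[lag] > bound) return lag;
        }
        return -1;
    }
}
```

Calling SeasonalitySketch.DetectPeriod on the 100 values of sin(2 * Pi + x) from the sample above returns 6, the integer lag closest to the true period 2 * Pi. Under this sketch's assumptions, raising randomnessThreshold raises the bound z / sqrt(N), so weaker autocorrelation peaks are rejected and the function is more likely to return -1.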