Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add deseasonality in SrCnnEntireAnomalyDetect #5202

Merged
merged 52 commits into from
Jun 29, 2020

Conversation

guinao
Copy link
Contributor

@guinao guinao commented Jun 3, 2020

We are excited to review your PR.

So we can do the best job, please check:

  • There's a descriptive title that will make sense to other developers some time from now.
  • There's associated issues. All PR's should have issue(s) associated - unless a trivial self-evident change such as fixing a typo. You can use the format Fixes #nnnn in your description to cause GitHub to automatically close the issue(s) when your PR is merged.
  • Your change description explains what the change does, why you chose your approach, and anything else that reviewers should know.
  • You have included any necessary tests in the same PR.

Description

In time series data, seasonality is the presence of variations that occur at specific regular intervals less than a year, such as weekly, monthly, or quarterly. In time series anomaly detection, it is essential to remove the effect of seasonality before detecting anomalies.

In this PR, we add the de-seasonality process in SrCnnEntireAnomalyDetect.

  • A new SrCnnEntireAnomalyDetect API is introduced.
        /// <summary>
        /// Create <see cref="SrCnnEntireAnomalyDetector"/>, which detects timeseries anomalies for entire input using SRCNN algorithm.
        /// </summary>
        /// <param name="catalog">The AnomalyDetectionCatalog.</param>
        /// <param name="input">Input DataView.</param>
        /// <param name="outputColumnName">Name of the column resulting from data processing of <paramref name="inputColumnName"/>.
        /// The column data is a vector of <see cref="System.Double"/>. The length of this vector varies depending on <paramref name="options.DetectMode"/>.</param>
        /// <param name="inputColumnName">Name of column to process. The column data must be <see cref="System.Double"/>.</param>
        /// <param name="options">Defines the settings of the load operation.</param>
        /// <example>
        /// <format type="text/markdown">
        /// <![CDATA[
        /// [!code-csharp[DetectEntireAnomalyBySrCnn](~/../docs/samples/docs/samples/Microsoft.ML.Samples/Dynamic/Transforms/TimeSeries/DetectEntireAnomalyBySrCnn.cs)]
        /// ]]>
        /// </format>
        /// </example>
        public static IDataView DetectEntireAnomalyBySrCnn(this AnomalyDetectionCatalog catalog, IDataView input, string outputColumnName, string inputColumnName, SrCnnEntireAnomalyDetectorOptions options)
            => new SrCnnEntireAnomalyDetector(CatalogUtils.GetEnvironment(catalog), input, outputColumnName, inputColumnName, options);

In the options parameter, if Period is set to a positive integer, de-seasonality will be enabled before detection.

Three de-seasonality methods are introduced.

    public enum SrCnnDeseasonalityMode
    {
        /// <summary>
        /// In this mode, the stl decompose algorithm is used to de-seasonality.
        /// </summary>
        Stl = 0,

        /// <summary>
        /// In this mode, the mean value of points in the same position in a period is substracted to de-seasonality.
        /// </summary>
        Mean = 1,

        /// <summary>
        /// In this mode, the median value of points in the same position in a period is substracted to de-seasonality.
        /// </summary>
        Median = 2
    }

Changes in this PR includes.

  1. Add new constructor for SrCnnEntireAnomalyDetector class that accepts an options class as parameter.
  2. Implement three de-seasonality functions, namely the STL algorithm, the mean method and the median method. The STL algorithm can be referred to at http://www.nniiem.ru/file/news/2016/stl-statistical-model.pdf. The mean/median method substracts the mean/median value of points belong to the same bucket as de-seasonality value.
  3. Add unit tests in TimeSeriesDirectApi file.

@guinao guinao requested a review from a team as a code owner June 3, 2020 10:14
@guinao
Copy link
Contributor Author

guinao commented Jun 3, 2020

Not used?

I don't think this is used (or should be used)?

Removed this file. #Resolved

@guinao
Copy link
Contributor Author

guinao commented Jun 3, 2020

Not used?

Removed this file #Resolved

Microsoft.ML.sln Outdated Show resolved Hide resolved
@guinao
Copy link
Contributor Author

guinao commented Jun 4, 2020

@harishsk is added to the review. #Closed

@guinao
Copy link
Contributor Author

guinao commented Jun 4, 2020

@conhua is added to the review. #Closed

@guinao
Copy link
Contributor Author

guinao commented Jun 4, 2020

@jinhua is added to the review. #Closed


Contracts.CheckParam(weights.Count == _length, nameof(weights));
if (weights.Count != _length)
throw Contracts.Except("the weight vector is not equal length to the data points");
Copy link
Contributor

@Lynx1820 Lynx1820 Jun 18, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: "The length of the weight vector is not equal to the length of the data points." #Resolved

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated, thanks.


In reply to: 442531436 [](ancestors = 442531436)

@guinao
Copy link
Contributor Author

guinao commented Jun 22, 2020

I updated a description for this PR in the first comment.


In reply to: 646231552 [](ancestors = 646231552)

else // if (deseasonalityMode == SrCnnDeseasonalityMode.Median)
{
_deseasonalityFunction = new MedianDeseasonality();
}
Copy link

@yaeldekel yaeldekel Jun 22, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be changed to a switch statement. #Resolved

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the default case, instead of writing deseasonalityMode == SrCnnDeseasonalityMode.Median you can add

Contracts.Assert(deseasonalityMode == SrCnnDeseasonalityMode.Median);

(that way, if in the future a value is added to the SrCnnDeseasonalityMode enum, it will be easier to detect that this code needs to be updated).


In reply to: 443404193 [](ancestors = 443404193)

for (int i = 0; i < values.Length; ++i)
{
results[i][3] = values[i] - residual[i];
}
Copy link

@yaeldekel yaeldekel Jun 22, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you call GetExpectedValuePeriod instead? #Resolved

[Theory, CombinatorialData]
public void TestSrCnnAnomalyDetectorWithSeasonalData(
[CombinatorialValues(SrCnnDeseasonalityMode.Stl, SrCnnDeseasonalityMode.Mean, SrCnnDeseasonalityMode.Median)] SrCnnDeseasonalityMode mode
)
Copy link

@yaeldekel yaeldekel Jun 22, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

) [](start = 8, length = 1)

nit: move this to the line above. #Resolved

[CombinatorialValues(SrCnnDeseasonalityMode.Stl, SrCnnDeseasonalityMode.Mean, SrCnnDeseasonalityMode.Median)] SrCnnDeseasonalityMode mode
)
{
var ml = new MLContext(1);
Copy link

@yaeldekel yaeldekel Jun 22, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MLContext [](start = 25, length = 9)

The base class has an MLContext field. #Resolved

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you point out the MLContext field? As I find in this class, other tests are creating MLContext by themselves.


In reply to: 443415149 [](ancestors = 443415149)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I was wrong. The MLContext is not in BaseTestClass, it is in BaseTestBaseline, which is derived from BaseTestClass.


In reply to: 443532513 [](ancestors = 443532513,443415149)

[CombinatorialValues(SrCnnDeseasonalityMode.Stl, SrCnnDeseasonalityMode.Mean, SrCnnDeseasonalityMode.Median)] SrCnnDeseasonalityMode mode
)
{
var ml = new MLContext(1);
Copy link

@yaeldekel yaeldekel Jun 22, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MLContext [](start = 25, length = 9)

Use the MLContext of the base class. #Resolved

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above comment, the base class don't have this field.


In reply to: 443415470 [](ancestors = 443415470)

@yaeldekel
Copy link

yaeldekel commented Jun 22, 2020

using System.Collections.Generic;

Add header. #Resolved


Refers to: src/Microsoft.ML.TimeSeries/STL/FastLoess.cs:1 in 816479a. [](commit_id = 816479a, deletion_comment = False)

@yaeldekel
Copy link

using System.Collections.Generic;

For the rest of the new files as well.


In reply to: 647384981 [](ancestors = 647384981)


Refers to: src/Microsoft.ML.TimeSeries/STL/FastLoess.cs:1 in 816479a. [](commit_id = 816479a, deletion_comment = False)

}

/// <summary>
/// The estimated y values. This is the very cool smoothing method.
Copy link

@yaeldekel yaeldekel Jun 23, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the very cool smoothing method. [](start = 36, length = 39)

What does this comment mean? #Resolved

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed, original, it's refer to loess(local weighted regression)


In reply to: 443960923 [](ancestors = 443960923)

Copy link
Contributor

@harishsk harishsk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@guinao guinao closed this Jun 26, 2020
@guinao guinao reopened this Jun 26, 2020
Copy link
Contributor

@harishsk harishsk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@harishsk harishsk merged commit 33f5f32 into dotnet:master Jun 29, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Mar 18, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants