add root cause localization transformer #4925
Conversation
Hi @suxi-ms, thank you for your PR! Can you please briefly describe the changes you've made according to our contribution guide?
@suxi-ms, please do the following so we can start code review:
```csharp
private static IRowMapper Create(IHostEnvironment env, ModelLoadContext ctx, DataViewSchema inputSchema)
    => Create(env, ctx).MakeRowMapper(inputSchema);

private protected override void SaveModel(ModelSaveContext ctx)
```
Re `SaveModel`: Don't you need to save (and load) `_beta`? #Resolved
Thank you for the suggestion, I have added changes to save `_beta`. #Resolved
```csharp
}

[Fact]
public void RootCauseLocalizationWithDT()
```
Re `RootCauseLocalizationWithDT`: Please also test serialization/deserialization. #Resolved
Could you explain more? I didn't exactly understand the suggestion. Which part needs serialization/deserialization? #Resolved
Part of the `DTRootCauseLocalizationTransformer` class is the set of methods that save the model to a file and load it back from a file. This test should exercise these methods by using the `ml.Model.Save` and `ml.Model.Load` APIs, and checking that the values returned by the deserialized model are the same as the ones returned by the original model.
Do I need to save the model? I find in the documentation that the save method is used for training-related data. #Resolved
Sorry, I was unclear. You should save the model, not the data. Using this API:
https://github.com/dotnet/machinelearning/blob/master/src/Microsoft.ML.Data/Model/ModelOperationsCatalog.cs#L112
You then need to load the model using this API:
https://github.com/dotnet/machinelearning/blob/master/src/Microsoft.ML.Data/Model/ModelOperationsCatalog.cs#L211
and then run the loaded model on the data, and verify that it gives the same outputs as the model before serialization.
Here is a test that contains an example of how to save and load the model:
https://github.com/dotnet/machinelearning/blob/master/test/Microsoft.ML.Tests/Transformers/NormalizerTests.cs#L961
(the saving and loading is done in lines 969 and 970).
I have tried to use this logic to save and load the model:

```csharp
var modelPath = "temp.zip";
// Save model to a file
ml.Model.Save(model, data.Schema, modelPath);

// Load model from a file
ITransformer serializedModel;
using (var file = File.OpenRead(modelPath))
{
    serializedModel = ml.Model.Load(file, out var serializedSchema);
    TestCommon.CheckSameSchemas(data.Schema, serializedSchema);
}
```

However, the `Save()` method hits an exception because `Save.IsColumnSavable(type)` returns false. The root cause is that we have a customized DataViewType, which is neither `VectorDataViewType` nor `PrimitiveDataViewType`, so the check fails and no columns are added for saving. Do you have any guidance on this? #Resolved
I have updated according to your suggestion in the email discussion. #Resolved
```csharp
var src = default(RootCauseLocalizationInput);
var getSrc = input.GetGetter<RootCauseLocalizationInput>(input.Schema[ColMapNewToOld[iinfo]]);

disposer =
```
Re `disposer`: I think this is not necessary. This is used by components that use resources that are `IDisposable`; for example, `ImageResizer`, which uses a `Bitmap` for its computations. #Resolved
I have changed it; please help review whether the change is right. #Resolved
You can also assign `null` instead.
I have updated it to `null`. #Resolved
```csharp
/// <param name="columns">The name of the columns (first item of the tuple), and the name of the resulting output column (second item of the tuple).</param>
/// <param name="beta">The weight for generating score in output result.</param>
[BestFriend]
internal DTRootCauseLocalizationEstimator(IHostEnvironment env, double beta = Defaults.Beta, params (string outputColumnName, string inputColumnName)[] columns)
```
Re `[]`: Since this transformer is non-trainable (meaning the estimator doesn't need to look at the data in order to create a transformer), there is no advantage to supporting multiple input/output columns here; if there is ever a scenario where there are multiple input columns, the user can create an estimator chain instead. #Resolved
I find that ImageLoading and TextNormalizing are also non-trainable, yet they define this columns parameter. Could you explain when to use it and when not? #Resolved
I have updated the columns to outputColumn and inputColumn. Could you check whether they are right? #Resolved
```csharp
    _beta = beta;
}

// Factory method for SignatureDataTransform.
```
Re `SignatureDataTransform`: This (and the matching attribute) is only needed if you want to use this component from the command line. If this is the intention, please add a test for running this using maml. #Resolved
I have no idea about maml; could you provide more information? #Resolved
maml is not needed, and I have removed this method. #Resolved
```csharp
if (!(col.ItemType is RootCauseLocalizationInputDataViewType) || col.Kind != SchemaShape.Column.VectorKind.Scalar)
    throw Host.ExceptSchemaMismatch(nameof(inputSchema), "input", colInfo.inputColumnName, new RootCauseLocalizationInputDataViewType().ToString(), col.GetTypeString());

result[colInfo.outputColumnName] = new SchemaShape.Column(colInfo.outputColumnName, col.Kind, col.ItemType, col.IsKey, col.Annotations);
```
Re `col.Kind, col.ItemType, col.IsKey, col.Annotations`: These should be the values for the output column. The `Kind` should be `SchemaShape.Column.VectorKind.Scalar` (this should match `col.Kind`, since you are checking above that it is a scalar), but what should the `ItemType` be? Also, it seems like `col.IsKey` is always false, and are there any annotations that need to be passed from the input to the output? #Resolved
I made some updates; could you review whether they are right? #Resolved
Why was this check removed: `col.Kind != SchemaShape.Column.VectorKind.Scalar`? Isn't it correct?
I have updated it. #Resolved
```csharp
@@ -146,6 +146,23 @@ public static class TimeSeriesCatalog
    int windowSize=64, int backAddWindowSize=5, int lookaheadWindowSize=5, int averageingWindowSize=3, int judgementWindowSize=21, double threshold=0.3)
    => new SrCnnAnomalyEstimator(CatalogUtils.GetEnvironment(catalog), outputColumnName, windowSize, backAddWindowSize, lookaheadWindowSize, averageingWindowSize, judgementWindowSize, threshold, inputColumnName);

/// <summary>
/// Create <see cref="DTRootCauseLocalizationEstimator"/>, which localizes root causess using decision tree algorithm.
```
Re `causess`: typo #Resolved
Thank you for pointing it out; I have fixed it. #Resolved
```csharp
    return subDim;
}

protected List<RootCauseItem> LocalizeRootCauseByDimension(PointTree anomalyTree, PointTree pointTree, Dictionary<string, Object> anomalyDimension, List<string> aggDims)
```
suggestion: Please consider something along the lines of the following for the GetSubDim function.

```csharp
return new Dictionary<string, object>(keyList.Select(dim => new KeyValuePair<string, object>(dim, dimension[dim])));
```
#Resolved
Updated. #Resolved
```csharp
}

private double Log2(double val)
{
```
Please consider using the following attribute for the function:

```csharp
[MethodImplAttribute(MethodImplOptions.AggressiveInlining)]
```
#Resolved
Updated. #Resolved
```csharp
if (dimension.Key.AnomalyDis.Count > 1)
{
    if (best == null || (!Double.IsNaN(valueRatioMap[best]) && (best.AnomalyDis.Count != 1 && (isLeavesLevel ? valueRatioMap[best].CompareTo(dimension.Value) <= 0 : valueRatioMap[best].CompareTo(dimension.Value) >= 0))))
    {
```
suggestion: For readability, can you please avoid the long line and break up the conditions on separate lines? Same comment for line 487. #Resolved
I made some updates. #Resolved
```csharp
// check beta
if (beta < 0 || beta > 1) {
    host.CheckUserArg(beta >= 0 && beta <= 1, nameof(beta), "Must be in [0,1]");
```
You don't need this `if` check; `CheckUserArg` performs the check. #Resolved
Will update. #Resolved
@@ -0,0 +1,49 @@
At Mircosoft, we develop a decision tree based root cause localization method which helps to find out the root causes for an anomaly incident at a specific timestamp incrementally. |
Typo, "Microsoft." Also, it's a bit nonstandard to use present tense for "we develop." I would expect "we have developed" if the work is completed or "we maintain" if the work is ongoing.
At Mircosoft, we develop a decision tree based root cause localization method which helps to find out the root causes for an anomaly incident at a specific timestamp incrementally.

## Multi-Dimensional Root Cause Localization
It's a common case that one measure is collected with many dimensions (*e.g.*, Province, ISP) whose values are categorical (*e.g.*, Beijing or Shanghai for dimension Province). When a measure's value deviates from its expected value, this measure encounters anomalies. In such case, operators would like to localize the root cause dimension combinations rapidly and accurately. Multi-dimensional root cause localization is critical to troubleshoot and mitigate such case.
Let's use "users" instead of "operators."
### Decision Tree

[Decision tree](https://en.wikipedia.org/wiki/Decision_tree) algorithm chooses the highest information gain to split or construct a decision tree. We use it to choose the dimension which contributes the most to the anomaly. Following are some concepts used in decision tree.
It's non-standard to omit articles here. Try something like "The Decision Tree algorithm chooses..." and "Below are some concepts used in decision trees"
Where $Ent(D^v)$ is the entropy of set points in D for which dimension $a$ is equal to $v$, $|D|$ is the total number of points in dataset $D$. $|D^V|$ is the total number of points in dataset $D$ for which dimension $a$ is equal to $v$.

For all aggregated dimensions, we calculate the information for each dimension. The greater the reduction in this uncertainty, the more information is gained about D from dimension $a$.
D should be in dollar signs?
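For context on the information-gain criterion the excerpt above describes, here is a hedged sketch of the computation (illustrative only, not the PR's C# implementation; the function names `entropy` and `information_gain` are mine):

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels: Ent(D)."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(points, dim, label_of):
    """Gain(D, a) = Ent(D) - sum_v (|D^v| / |D|) * Ent(D^v), where the split
    is on categorical dimension `dim` and `label_of` maps each point (a dict
    of dimension name -> value) to its class label (e.g. anomalous or not)."""
    labels = [label_of(p) for p in points]
    n = len(points)
    gain = entropy(labels)
    for v in {p[dim] for p in points}:  # each distinct value of the dimension
        subset = [label_of(p) for p in points if p[dim] == v]
        gain -= (len(subset) / n) * entropy(subset)
    return gain
```

A dimension that perfectly separates anomalous from normal points yields a gain equal to the full entropy of the label set, which is why the transformer picks the dimension with the highest gain as the most likely root cause direction.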
```csharp
{
    public static class LocalizeRootCause
    {
        private static string AGG_SYMBOL = "##SUM##";
```
What is AGG_SYMBOL for here? I notice that on line 19, you have both AGG_SYMBOL and AggregateType.Sum, and that some of the points have AGG_SYMBOL passed in instead of strings like "DC1."
Can you add a few comments explaining what AGG_SYMBOL is and why it is used?
```diff
@@ -143,9 +147,53 @@ public static class TimeSeriesCatalog
 /// </format>
 /// </example>
 public static SrCnnAnomalyEstimator DetectAnomalyBySrCnn(this TransformsCatalog catalog, string outputColumnName, string inputColumnName,
-    int windowSize=64, int backAddWindowSize=5, int lookaheadWindowSize=5, int averageingWindowSize=3, int judgementWindowSize=21, double threshold=0.3)
+    int windowSize = 64, int backAddWindowSize = 5, int lookaheadWindowSize = 5, int averageingWindowSize = 3, int judgementWindowSize = 21, double threshold = 0.3)
```
I believe it's spelled "averaging" (without an "e"). If this is an easy change to make, it would be a good way to keep our code base looking high quality.
```csharp
/// </summary>
/// <param name="catalog">The anomaly detection catalog.</param>
/// <param name="src">Root cause's input. The data is an instance of <see cref="Microsoft.ML.TimeSeries.RootCauseLocalizationInput"/>.</param>
/// <param name="beta">Beta is a weight parameter for user to choose. It is used when score is calculated for each root cause item. The range of beta should be in [0,1]. For a larger beta, root cause point which has a large difference between value and expected value will get a high score. On the contrary, for a small beta, root cause items which has a high relative change will get a high score.</param>
```
You say "on the contrary," but the two scenarios you describe don't seem to be opposites. One is about "relative change" and one is about difference between expected value and actual value. Can you make this explanation clearer?
Additionally, you mention "score," but it's not clear what score is, exactly.
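To illustrate the behavior the doc comment is trying to describe: a weight in [0, 1] can blend two ranking criteria, absolute deviation and relative change. This sketch is entirely hypothetical (the PR does not show the actual scoring formula, and `blended_score` is a name of my own invention), but it shows why the two regimes favor different items:

```python
def blended_score(value, expected, beta):
    """Hypothetical illustration only (NOT the ML.NET formula): a weight
    beta in [0, 1] trades off absolute deviation against relative change
    when ranking candidate root cause items."""
    abs_dev = abs(value - expected)          # large for big absolute gaps
    rel_change = abs_dev / abs(expected)     # large for big relative gaps
    return beta * abs_dev + (1 - beta) * rel_change
```

With beta near 1, an item deviating 1000 → 1100 (large absolute gap, small relative change) outscores one deviating 1 → 5; with beta near 0, the ranking flips. A clearer doc comment would spell out exactly this trade-off and define what "score" means.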
The goal of this pull request is to provide a decision tree based algorithm to localize the root cause of an incident on a multi-dimensional time series at a specified timestamp.