-
Notifications
You must be signed in to change notification settings - Fork 138
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add anomaly localization implementation #103
Conversation
.../main/java/org/opensearch/ml/engine/algorithms/anomalylocalization/AnomalyLocalizerImpl.java
Show resolved
Hide resolved
* Information about aggregate, time, etc to localize. | ||
*/ | ||
@Data | ||
public class Input { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about renaming this class as AnomalyLocalizationInput
and implementing org.opensearch.ml.common.parameter.Input
?
We should put input/output in common
package.
Same comment for org.opensearch.ml.engine.algorithms.anomalylocalization.Output
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Implementing the input/output/executor interface, and all integration for ml-common framework, will be in a separate pr following all steps for integration.
- The input and output are specific to this solution, thus in the package specifically named with the solution. I am not sure it should be separate from its own module/package and be moved to a common package as it, unlike generic Input/Output interfaces, is not common to other solutions.
|
||
protected static int SKETCH_THRESHOLD = 10_000; | ||
|
||
private Counter counter = new Hashmap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about renaming class Hashmap
as HashMapCounter
or ExactCounter
? Hashmap
looks confusing as java has HashMap
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i renamed it to HashMapCounter
Map<List<String>, Double> hashmap = ((Hashmap) counter).getKeyValues(); | ||
boolean hasNegative = hashmap.values().stream().anyMatch(v -> v < 0); | ||
Counter newCounter; | ||
if (hasNegative) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add some comments to explain when the count will be negative ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i added a comment that says the aggregate value for a key, say average(value), can be negative.
*/ | ||
@Override | ||
@SneakyThrows | ||
public void getLocalizationResults(Input input, ActionListener<Output> listener) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: add final
keyword whenever it's applicable.
e.g.,(final Input input, final ActionListener<Output> listener)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i would suggest no for a few reasons.
- it doesn't improve code readability, just imagine lots of final keywords in almost every line of code. good implementation is correct and the way to verify is via testing, (think of other general programming languages such as javascript, python). specific language constructs help, more general better approach is static code analysis and code style checks and other automated tools (such as code guru).
- There are abundant examples or convention in opensearch.
https://github.com/opensearch-project/OpenSearch/blob/996d33adb22ac6a523962166f3677f8dc6ac9c1f/server/src/main/java/org/opensearch/action/search/SearchTransportService.java#L109
https://github.com/opensearch-project/OpenSearch/blob/996d33adb22ac6a523962166f3677f8dc6ac9c1f/server/src/main/java/org/opensearch/search/internal/ReaderContext.java#L60
} | ||
|
||
private boolean isResultComplete(Output.Result result) { | ||
return result.getBuckets().stream().allMatch(e -> e.getCompleted() == null || e.getCompleted().get() == true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
curiosity question: why we consider the result is complete when getCompleted()
filed is null?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added a comment that when a bucket does not need localization, such as the base bucket or the bucket has no change over the base bucket.
|
||
protected static int SKETCH_THRESHOLD = 10_000; | ||
|
||
private Counter counter = new Hashmap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
import lombok.extern.log4j.Log4j2; | ||
|
||
/** | ||
* Count sketch implementation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about adding more description or some link of Sketch, so others can learn how sketch works?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i added links to the references in the java doc.
/** | ||
* Constructor. | ||
*/ | ||
public CountSketch() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CountMinSketch
and CountSketch
have some duplicate code. Is it possible to refactor to remove the duplicate, like adding one filed to indicate whether the CountSketch
is min
or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the two algorithms share some similarity but not quite same. there is not too much to refactor from/to, which would however make the algorithms much less clear to understand.
.../main/java/org/opensearch/ml/engine/algorithms/anomalylocalization/AnomalyLocalizerImpl.java
Show resolved
Hide resolved
long end = Math.max(input.getEndTime(), anomalyStart + input.getMinTimeInterval()); | ||
int numBuckets = Math.min((int) ((end - anomalyStart) / input.getMinTimeInterval()), MAX_TIME_BUCKETS - 1); | ||
long bucketInterval = (end - anomalyStart) / numBuckets; | ||
long start = Math.min(input.getStartTime(), anomalyStart - bucketInterval); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we move this line before line 335 and calculate numBuckets
as
Math.min((int) ((end - start) / input.getMinTimeInterval()), MAX_TIME_BUCKETS - 1);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it should be anomalyStart because localization is aimed at anomalous time, which is after the given anomalyStart, thus anomaly localization. the start is used for the base bucket.
@Data | ||
public class Input { | ||
|
||
private final String indexName; // name of the data index |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we plan to support index pattern like abc*
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FromAnomalyLocalizerImpl
, index name used in search request directly, so index pattern is ok. Maybe change the comment to show index pattern also supported?
/** | ||
* Implementation of localization. | ||
* <p> | ||
* This localization finds the largest contributors to changes in aggregated data over time. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about add more javadoc for these key steps in different methods?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, i added javadocs to the submethods. let me know if some important ones are missing or unclear.
How about keeping each commit in PR stage? So PR reviewer can see what changed between commits easily rather than reread the whole PR. |
see the second commit for update |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks for the change!
Signed-off-by: lai <57818076+wnbts@users.noreply.github.com>
Signed-off-by: lai <57818076+wnbts@users.noreply.github.com>
Signed-off-by: lai laijiang@amazon.com
Description
This change contains the implementation of the core anomaly localization solution. It does not have integration with ml-commons, which will be in separate pr from after this.
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.