Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add anomaly localization implementation #103

Merged
merged 2 commits into from
Dec 30, 2021
Merged

add anomaly localization implementation #103

merged 2 commits into from
Dec 30, 2021

Conversation

wnbts
Copy link
Contributor

@wnbts wnbts commented Dec 6, 2021

Signed-off-by: lai laijiang@amazon.com

Description

This change contains the implementation of the core anomaly localization solution. It does not have integration with ml-commons, which will be in separate pr from after this.

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@wnbts wnbts requested a review from a team December 6, 2021 21:19
* Information about aggregate, time, etc to localize.
*/
@Data
public class Input {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about renaming this class as AnomalyLocalizationInput and implementing org.opensearch.ml.common.parameter.Input?

We should put input/output in common package.

Same comment for org.opensearch.ml.engine.algorithms.anomalylocalization.Output

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Implementing the input/output/executor interface, and all integration for ml-common framework, will be in a separate pr following all steps for integration.
  2. The input and output are specific to this solution, thus in the package specifically named with the solution. I am not sure it should be separate from its own module/package and be moved to a common package as it, unlike generic Input/Output interfaces, is not common to other solutions.


protected static int SKETCH_THRESHOLD = 10_000;

private Counter counter = new Hashmap();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about renaming class Hashmap as HashMapCounter or ExactCounter ? Hashmap looks confusing as java has HashMap

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i renamed it to HashMapCounter

Map<List<String>, Double> hashmap = ((Hashmap) counter).getKeyValues();
boolean hasNegative = hashmap.values().stream().anyMatch(v -> v < 0);
Counter newCounter;
if (hasNegative) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add some comments to explain when the count will be negative ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i added a comment that says the aggregate value for a key, say average(value), can be negative.

*/
@Override
@SneakyThrows
public void getLocalizationResults(Input input, ActionListener<Output> listener) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: add final keyword whenever it's applicable.
e.g.,(final Input input, final ActionListener<Output> listener)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i would suggest no for a few reasons.

  1. it doesn't improve code readability, just imagine lots of final keywords in almost every line of code. good implementation is correct and the way to verify is via testing, (think of other general programming languages such as javascript, python). specific language constructs help, more general better approach is static code analysis and code style checks and other automated tools (such as code guru).
  2. There are abundant examples or convention in opensearch.
    https://github.com/opensearch-project/OpenSearch/blob/996d33adb22ac6a523962166f3677f8dc6ac9c1f/server/src/main/java/org/opensearch/action/search/SearchTransportService.java#L109
    https://github.com/opensearch-project/OpenSearch/blob/996d33adb22ac6a523962166f3677f8dc6ac9c1f/server/src/main/java/org/opensearch/search/internal/ReaderContext.java#L60

}

private boolean isResultComplete(Output.Result result) {
return result.getBuckets().stream().allMatch(e -> e.getCompleted() == null || e.getCompleted().get() == true);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

curiosity question: why we consider the result is complete when getCompleted() filed is null?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added a comment that when a bucket does not need localization, such as the base bucket or the bucket has no change over the base bucket.


protected static int SKETCH_THRESHOLD = 10_000;

private Counter counter = new Hashmap();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

import lombok.extern.log4j.Log4j2;

/**
* Count sketch implementation.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about adding more description or some link of Sketch, so others can learn how sketch works?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i added links to the references in the java doc.

/**
* Constructor.
*/
public CountSketch() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CountMinSketch and CountSketch have some duplicate code. Is it possible to refactor to remove the duplicate, like adding one filed to indicate whether the CountSketch is min or not.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the two algorithms share some similarity but not quite same. there is not too much to refactor from/to, which would however make the algorithms much less clear to understand.

long end = Math.max(input.getEndTime(), anomalyStart + input.getMinTimeInterval());
int numBuckets = Math.min((int) ((end - anomalyStart) / input.getMinTimeInterval()), MAX_TIME_BUCKETS - 1);
long bucketInterval = (end - anomalyStart) / numBuckets;
long start = Math.min(input.getStartTime(), anomalyStart - bucketInterval);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we move this line before line 335 and calculate numBuckets as

Math.min((int) ((end - start) / input.getMinTimeInterval()), MAX_TIME_BUCKETS - 1);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it should be anomalyStart because localization is aimed at anomalous time, which is after the given anomalyStart, thus anomaly localization. the start is used for the base bucket.

@Data
public class Input {

private final String indexName; // name of the data index
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we plan to support index pattern like abc* ?

Copy link
Collaborator

@ylwu-amzn ylwu-amzn Dec 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FromAnomalyLocalizerImpl, index name used in search request directly, so index pattern is ok. Maybe change the comment to show index pattern also supported?

/**
* Implementation of localization.
* <p>
* This localization finds the largest contributors to changes in aggregated data over time.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about add more javadoc for these key steps in different methods?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, i added javadocs to the submethods. let me know if some important ones are missing or unclear.

@ylwu-amzn
Copy link
Collaborator

How about keeping each commit in PR stage? So PR reviewer can see what changed between commits easily rather than reread the whole PR.

@wnbts
Copy link
Contributor Author

wnbts commented Dec 9, 2021

How about keeping each commit in PR stage? So PR reviewer can see what changed between commits easily rather than reread the whole PR.

see the second commit for update

ylwu-amzn
ylwu-amzn previously approved these changes Dec 15, 2021
Copy link
Collaborator

@ylwu-amzn ylwu-amzn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for the change!

jackiehanyang
jackiehanyang previously approved these changes Dec 30, 2021
Signed-off-by: lai <57818076+wnbts@users.noreply.github.com>
Signed-off-by: lai <57818076+wnbts@users.noreply.github.com>
@wnbts wnbts dismissed stale reviews from jackiehanyang and ylwu-amzn via e99f6aa December 30, 2021 22:21
@wnbts wnbts merged commit 0b36612 into opensearch-project:main Dec 30, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants