Skip to content

[ML] Label anomalies with multi_bucket_impact #34233

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Oct 4, 2018

Conversation

edsavage
Copy link
Contributor

@edsavage edsavage commented Oct 2, 2018

Add the multi_bucket_impact field to record results.

Add the multi_bucket_impact field to record results.
@elasticmachine
Copy link
Collaborator

Pinging @elastic/ml-core

Copy link
Contributor

@droberts195 droberts195 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggested a renaming change but I think IntelliJ's refactoring functionality (right click -> refactor -> rename) should make light work of it. It has the option to automatically rename arguments and accessors.

@@ -48,6 +48,7 @@
* Result fields (all detector types)
*/
public static final ParseField PROBABILITY = new ParseField("probability");
public static final ParseField IMPACT = new ParseField("multi_bucket_impact");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be more future proof if the field was MULTI_BUCKET_IMPACT rather than just IMPACT. (You never know if we might need to add some other type of impact in the future.)

Same in the server-side file.

@@ -117,6 +119,7 @@
private final String jobId;
private int detectorIndex;
private double probability;
private double impact;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to above, I think this should be called multiBucketImpact, and the accessors should also be renamed to match.

Also, I think it should be an object of type Double so that it can be null. If people get results from old versions the field won't exist and having it as a double will set it to 0.0 in these cases.

Same in the server-side file.

@@ -399,6 +411,7 @@ public boolean equals(Object other) {
&& this.detectorIndex == that.detectorIndex
&& this.bucketSpan == that.bucketSpan
&& this.probability == that.probability
&& this.impact == that.impact
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will need changing to Objects.equals() for type Double.

Same in the server-side file.

@@ -164,6 +167,7 @@ public AnomalyRecord(StreamInput in) throws IOException {
jobId = in.readString();
detectorIndex = in.readInt();
probability = in.readDouble();
impact = in.readDouble();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To make this backwards compatible it needs to be:

    if (in.version().onOrAfter(Version.V_6_5_0)) {
        impact = in.readOptionalDouble();
    }

@@ -198,6 +202,7 @@ public void writeTo(StreamOutput out) throws IOException {
out.writeString(jobId);
out.writeInt(detectorIndex);
out.writeDouble(probability);
out.writeDouble(impact);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To make this backwards compatible it needs to be:

    if (out.version().onOrAfter(Version.V_6_5_0)) {
        out.writeOptionalDouble(impact);
    }

@@ -247,6 +252,7 @@ XContentBuilder innerToXContent(XContentBuilder builder, Params params) throws I
builder.field(Job.ID.getPreferredName(), jobId);
builder.field(Result.RESULT_TYPE.getPreferredName(), RESULT_TYPE_VALUE);
builder.field(PROBABILITY.getPreferredName(), probability);
builder.field(IMPACT.getPreferredName(), impact);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When the type is changed to Double this should be wrapped in a null check so we write nothing if it's null.

@@ -38,6 +38,7 @@ public static AnomalyRecord createTestInstance(String jobId) {
anomalyRecord.setActual(Collections.singletonList(randomDouble()));
anomalyRecord.setTypical(Collections.singletonList(randomDouble()));
anomalyRecord.setProbability(randomDouble());
anomalyRecord.setImpact(randomDouble());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To confirm null works once the type is changed to Double, surround this line with if (randomBoolean()) {.

Same for the server-side file.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do - thanks for the quick review @droberts195

Copy link
Contributor

@droberts195 droberts195 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM as long as the CI goes green.

If there's a failure it's most likely related to the BWC aspects.

@edsavage edsavage merged commit 577261e into elastic:master Oct 4, 2018
edsavage added a commit that referenced this pull request Oct 4, 2018
* [ML] Label anomalies with  multi_bucket_impact

Add the multi_bucket_impact field to record results.
jasontedor added a commit to jasontedor/elasticsearch that referenced this pull request Oct 4, 2018
* master: (25 commits)
  HLRC: ML Adding get datafeed stats API (elastic#34271)
  Small fixes to the HLRC watcher documentation. (elastic#34306)
  Tasks: Document that status is not semvered (elastic#34270)
  HLRC: ML Add preview datafeed api (elastic#34284)
  [CI] Fix bogus ScheduleWithFixedDelayTests.testRunnableRunsAtMostOnceAfterCancellation
  Fix error in documentation for activete watch
  SCRIPTING: Terms set query expression (elastic#33856)
  Logging: Drop remaining Settings log ctor (elastic#34149)
  [ML] Remove unused last_data_time member from Job (elastic#34262)
  Docs: Allow skipping response assertions (elastic#34240)
  HLRC: Add activate watch action (elastic#33988)
  [Security] Multi Index Expression alias wildcard exclusion (elastic#34144)
  [ML] Label anomalies with  multi_bucket_impact (elastic#34233)
  Document smtp.ssl.trust configuration option (elastic#34275)
  Support PKCS#11 tokens as keystores and truststores  (elastic#34063)
  Fix sporadic failure in NestedObjectMapperTests
  [Authz] Allow update settings action for system user (elastic#34030)
  Replace version with reader cache key in IndicesRequestCache (elastic#34189)
  [TESTS] Set SO_LINGER and SO_REUSEADDR on the mock socket (elastic#34211)
  Security: upgrade unboundid ldapsdk to 4.0.8 (elastic#34247)
  ...
jasontedor added a commit to jasontedor/elasticsearch that referenced this pull request Oct 4, 2018
* rename-ccr-stats: (25 commits)
  HLRC: ML Adding get datafeed stats API (elastic#34271)
  Small fixes to the HLRC watcher documentation. (elastic#34306)
  Tasks: Document that status is not semvered (elastic#34270)
  HLRC: ML Add preview datafeed api (elastic#34284)
  [CI] Fix bogus ScheduleWithFixedDelayTests.testRunnableRunsAtMostOnceAfterCancellation
  Fix error in documentation for activete watch
  SCRIPTING: Terms set query expression (elastic#33856)
  Logging: Drop remaining Settings log ctor (elastic#34149)
  [ML] Remove unused last_data_time member from Job (elastic#34262)
  Docs: Allow skipping response assertions (elastic#34240)
  HLRC: Add activate watch action (elastic#33988)
  [Security] Multi Index Expression alias wildcard exclusion (elastic#34144)
  [ML] Label anomalies with  multi_bucket_impact (elastic#34233)
  Document smtp.ssl.trust configuration option (elastic#34275)
  Support PKCS#11 tokens as keystores and truststores  (elastic#34063)
  Fix sporadic failure in NestedObjectMapperTests
  [Authz] Allow update settings action for system user (elastic#34030)
  Replace version with reader cache key in IndicesRequestCache (elastic#34189)
  [TESTS] Set SO_LINGER and SO_REUSEADDR on the mock socket (elastic#34211)
  Security: upgrade unboundid ldapsdk to 4.0.8 (elastic#34247)
  ...
@edsavage edsavage deleted the anomaly_labelling branch October 5, 2018 09:00
kcm pushed a commit that referenced this pull request Oct 30, 2018
* [ML] Label anomalies with  multi_bucket_impact

Add the multi_bucket_impact field to record results.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants