Skip to content

[ML] Add include categorical filter type to detector rules #27

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Apr 9, 2018
Merged

[ML] Add include categorical filter type to detector rules #27

merged 3 commits into from
Apr 9, 2018

Conversation

davidkyle
Copy link
Member

@davidkyle davidkyle commented Mar 27, 2018

Detector rules have a E_Categorical condition that applies to text fields, when used in conjunction with the action E_FilterResults the effect is to filter the results i.e. don't generate results for these field values

This PR introduces the inverse functionality i.e. generate results only for those field values in the filter list.

I renamed the enum E_Categorical to E_Match but left the JSON parsing code as is for backwards compatibility. I'm not entirely happy with the naming any suggestions are welcome or perhaps this is best left for the anticipated rewrite of rules

@sophiec20
Copy link

Is this not still a E_Categorical condition, except it's type would be include or exclude.

Alternatively, if done at that level, how about E_CategoricalIncludeList and E_CategoricalExcludeList?

@davidkyle davidkyle changed the title [ML] Add whitelist condition type to detector rules [ML] Add include categorical filter type to detector rules Mar 27, 2018
@davidkyle
Copy link
Member Author

Alternatively, if done at that level, how about E_CategoricalIncludeList and E_CategoricalExcludeList?

👍 It's an improvement

@sophiec20 sophiec20 added :ml and removed :ml labels Mar 28, 2018
Copy link
Contributor

@tveasey tveasey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks pretty good, but I'm +1 on renaming the conditions as per Sophie's suggestion and making the rule specification match, since I don't think we need to preserve backwards compatibility at the moment.

@@ -35,6 +35,7 @@ const std::string TARGET_FIELD_NAME("target_field_name");
const std::string TARGET_FIELD_VALUE("target_field_value");
const std::string TYPE("type");
const std::string CATEGORICAL("categorical");
const std::string NOT_MATCH("not_match");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should rename both these strings to be consistent, i.e. categorical_include and categorical_exclude or something similar. There is going to be a break to backwards compatibility of rule syntax anyway and as it is I think this naming is confusing.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've realised backwards compatibility isn't an issue here we can replace the categorical string with anything we want as the job config is created in x-pack java on the same node the job runs on. C++ and Java are upgraded in step so on a v7 node there will be v7 java writing config v7 autodetect can read. The BWC problems are in x-pack.

Copy link
Contributor

@tveasey tveasey Mar 28, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wondered whether mixed version clusters in rolling upgrades would cause a problem, but I guess in these cases do we ensure only compatible versions talk to one another. In any case, I think we only have one user of rules at this point except one (who needs to update their rules configuration anyway) so any BWC is surely moot.

condition.fieldName(attributeFieldName);
condition.valueFilter(valueFilter);
CDetectionRule rule;
rule.addCondition(condition);

bool isCategoricalType = CRuleCondition::E_Match == conditionType;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As per Sophie's suggestion, I think renaming this, and subsequent examples, to something like isCategoricalInclude would make this clearer.

@davidkyle
Copy link
Member Author

davidkyle commented Mar 28, 2018

The rules have a condition (gt, lte, etc) and an action (filter results) I think using include/exclude confuses the condition with the action. If E_CategoricalInclude is used and the condition is true when the pattern matches then when used with filter_results it would have the effect of excluding the results. Otherwise E_CategoricalInclude could be defined to return false if the pattern matches and that doesn't make much sense either.

The best I can come up with is categorical_match and categorical_complement which at least are technically correct if a little prosaic

@tveasey
Copy link
Contributor

tveasey commented Mar 28, 2018

Another option would be to introduce a new action filter_complement, or include_results and exclude_results, rather than a new condition. This is actually slightly more powerful than introducing a categorical only condition, since we could apply it to metric conditions as well.

@davidkyle
Copy link
Member Author

filter_complement would be a larger code change due to the way the code is structured see https://github.com/elastic/ml-cpp/blob/master/lib/model/CAnomalyDetectorModel.cc#L564 where the action is passed as the parameter.

We have <, <=, >, >= for numerical conditions so we can effectively negate those anyway

@tveasey
Copy link
Contributor

tveasey commented Mar 28, 2018

Yes, it is definitely less useful. There are still some cases where it will make rules involving metric conditions slightly more concise because of the implied logical connectives we use in conditions, but if it is a much larger change I'm happy to keep as is. I sort of wondered as well whether this is really more logically an action, keep vs remove, rather than a condition when you originally suggested this change, but don't have strong feelings.

@davidkyle
Copy link
Member Author

I pushed another commit with renaming E_Categorical -> E_CategoricalMatch and using E_CategoricalComplement

Copy link
Contributor

@tveasey tveasey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@davidkyle
Copy link
Member Author

I pushed another commit enabling backwards compatibility for parsing the rules with the old style type: categorical this also has the advantage of not breaking the x-pack builds after this is merged and before the x-pack change has passed intake. This can be reverted later

Copy link
Contributor

@dimitris-athanasiou dimitris-athanasiou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@davidkyle davidkyle merged commit 576d86c into elastic:master Apr 9, 2018
@davidkyle davidkyle deleted the whitelist branch April 9, 2018 16:51
droberts195 pushed a commit that referenced this pull request Apr 23, 2018
* Add whitelist rule condition type

* Enable BWC for parsing categorical rules
droberts195 pushed a commit that referenced this pull request Apr 23, 2018
* Add whitelist rule condition type

* Enable BWC for parsing categorical rules
@davidkyle
Copy link
Member Author

Marking as a non-issue for release notes purposes as this code was reworked significantly in #119 and was never released in this form.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants