Skip to content

[ML] adds support for non-numeric mapped types #40220

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

benwtrent
Copy link
Member

@benwtrent benwtrent commented Mar 19, 2019

This PR does a handful of things to address the issues when creating the destination index mapping and parsing the resulting aggregations before indexing them.

The destination mapping type is taken into account now when extracting the data before putting it into the destination index. Right now all that is considered if the destination is a Numeric type or not, but this should not restrict more complex type requirements in the future (if they should arise).

Future work:

  • Providing multi-type fields mapping overrides. This should be added in tandem with our support of such in mapping deduction. Not added in this PR.
  • Validations between types + Aggregations. As it stands right now, we don't really support any aggregations + types that would cause this and the indexing would simply fail on type/mapping mismatch. We should add a mechanism of alerting the user of indexing/search failure (attempt at this is: [ML] add auditor to data frame plugin #40012)
  • Add mechanisms for the end user to provide a "return type" for functions/aggregations. Certain aggregations (looking at you bucket_script) do not provide interfaces to accurately determine the resultant type. For this scenario the end user should provide some sort of mapping type for the field result.

Alternative designs:

  • Always use getValueAsString
    • Though exceedingly simple, this is a sub-optimal user experience as even numeric values get put in the _source as strings. This could lead to confusion, especially when considering our _preview endpoint.
  • Automatically determine when to use getValueAsString through Java type parsing
    • This restricts the user's input on the destination mapping too much. There may be use-cases where the desired type for a integer is actually a keyword.

closes #39974

@elasticmachine
Copy link
Collaborator

Pinging @elastic/ml-core

@benwtrent benwtrent force-pushed the feature/data-frame-support-non-numeric-types branch from a961460 to 00c70b8 Compare March 19, 2019 19:56
@benwtrent
Copy link
Member Author

run elasticsearch-ci/packaging-sample

Copy link
Member

@davidkyle davidkyle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

e);
}), latch));
try {
latch.await(LOAD_TRANSFORM_TIMEOUT_IN_SECONDS, TimeUnit.SECONDS);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TIL: I thought fieldMappings would have to be volatile for the assignment to be visible to the awaiting thread but latch.await is a synchronisation point. https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/package-summary.html#MemoryVisibility.

I presume we are blocking here because super.maybeTriggerAsyncJob(now); must execute on the calling thread.

@benwtrent
Copy link
Member Author

run elasticsearch-ci/bwc
run elasticsearch-ci/default-distro

Copy link

@hendrikmuhs hendrikmuhs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@hendrikmuhs
Copy link

run elasticsearch-ci/1
run elasticsearch-ci/packaging-sample

@benwtrent
Copy link
Member Author

run elasticsearch-ci/packaging-sample

@sophiec20
Copy link
Contributor

does this title need amending?

@benwtrent benwtrent changed the title [ML] adds support for non-numeric mapped types and mapping overrides [ML] adds support for non-numeric mapped types Mar 22, 2019
@benwtrent benwtrent merged commit a3e10a7 into elastic:master Mar 22, 2019
@benwtrent benwtrent deleted the feature/data-frame-support-non-numeric-types branch March 22, 2019 20:19
benwtrent added a commit to benwtrent/elasticsearch that referenced this pull request Mar 22, 2019
* [ML] adds support for non-numeric mapped types and mapping overrides

* correcting hlrc compilation issues after merge

* removing mapping_override option

* clearing up unnecessary changes
jasontedor added a commit to jasontedor/elasticsearch that referenced this pull request Mar 22, 2019
* elastic/master:
  [DOCS] Adds notable highlights tags (elastic#40330)
  [ML] making test more determinate (elastic#40374)
  [ML] adds support for non-numeric mapped types (elastic#40220)
  Move outbound message handling to OutboundHandler (elastic#40336)
  Add implicit this for class binding in Painless (elastic#40285)
  Muting test testExtractIndexCheckpointsInconsistentGlobalCheckpoints (elastic#40371)
  DOC: polish client docs
  Fix building bwc versions (elastic#40361)
jasontedor added a commit to jasontedor/elasticsearch that referenced this pull request Mar 22, 2019
…-stop-time

* elastic/master:
  [DOCS] Adds notable highlights tags (elastic#40330)
  [ML] making test more determinate (elastic#40374)
  [ML] adds support for non-numeric mapped types (elastic#40220)
  Move outbound message handling to OutboundHandler (elastic#40336)
  Add implicit this for class binding in Painless (elastic#40285)
benwtrent added a commit that referenced this pull request Mar 23, 2019
* [ML] adds support for non-numeric mapped types and mapping overrides

* correcting hlrc compilation issues after merge

* removing mapping_override option

* clearing up unnecessary changes
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[ML] Data frame transform silently fails if using min(timestamp)
6 participants