[DOCS] Add field extraction use cases to scripting docs #71596
Conversation
Pinging @elastic/es-docs (Team:Docs)
run elasticsearch-ci/docs
Pinging @elastic/es-core-infra (Team:Core/Infra)
[discrete]
[[field-extraction-split]]
==== Split values in a field by a separator (Dissect)
[[scripting-field-extraction]]
==== Field extraction
The goal of this use case is simple; you have fields in your data with a bunch of
I tend to avoid the word "use case". I think folks don't think of themselves as having a use case. They've got a thing they want to do, but they don't call it a "use case".
True -- I think we can just say, "The goal of field extraction is simple..."
There are two options at your disposal:
* <<grok-basics,Grok>> uses a pattern like a regular expression that supports
It is a regular expression, just a kind of unexpected dialect. It may be more correct to say "is a regular expression dialect that supports aliased expression reuse." If you say it that way, I guess there's no need to explain that it sits "on top".
I'm +1 on that revision, but I still think it's valuable to call out the 1-to-1 mapping of regex in grok. Something like:
Grok is a regular expression dialect that supports aliased expressions that you can reuse. Because Grok sits on top of regular expressions, any regular expressions are valid in grok as well.
aliased expressions that you can reuse. Grok sits on top of regular expressions, so
any regular expressions are valid in grok as well.
* <<dissect-processor,Dissect>> extracts structured fields out of a single text
field within a document, but doesn't use regular expressions. Instead, dissect uses
"extracts structured fields out of text using a pattern that defines delimiters"?
How about:
Dissect extracts structured fields out of text, using delimiters to define the matching pattern. Unlike grok, dissect doesn't use regular expressions.
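To make the Grok description concrete, here is a minimal Python sketch of the idea, not Elasticsearch's grok implementation: each `%{ALIAS:field}` token expands into a named regular-expression group, and everything outside the tokens passes through as plain regex, which is why any regular expression is also a valid grok pattern. The alias table (`NUMBER`, `WORD`, `IP`) and the helpers `grok_to_regex` and `grok_extract` are hypothetical and heavily simplified.

```python
import re

# Hypothetical, simplified alias table; real grok ships many more patterns.
ALIASES = {
    "NUMBER": r"\d+(?:\.\d+)?",
    "WORD": r"\w+",
    "IP": r"\d{1,3}(?:\.\d{1,3}){3}",
}

# Matches %{ALIAS} or %{ALIAS:field}.
_TOKEN = re.compile(r"%\{(\w+)(?::(\w+))?\}")


def grok_to_regex(pattern):
    """Expand grok-style aliases; text outside %{...} is left as raw regex."""
    def expand(m):
        alias, field = m.group(1), m.group(2)
        body = ALIASES[alias]
        # A named alias becomes a named capture group; an unnamed one is non-capturing.
        return f"(?P<{field}>{body})" if field else f"(?:{body})"
    return _TOKEN.sub(expand, pattern)


def grok_extract(pattern, text):
    """Return a dict of named captures, or None when the pattern doesn't match."""
    m = re.search(grok_to_regex(pattern), text)
    return m.groupdict() if m else None


# "\[" and "\]" are ordinary regex escapes mixed in with grok aliases.
print(grok_extract(r"%{IP:client} \[%{WORD:verb}\]", "10.0.0.1 [GET]"))
```

The escaped brackets in the usage line show the "regex stays valid" point: they are never touched by the alias expansion.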
}
----
// TEST[continued]
<1> This condition ensures that the script doesn't crash even if the pattern of
s/doesn't crash/doesn't emit anything/?
++ I'll make that change.
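The `<1>` callout describes a guard around `emit`. The following Python sketch mimics that control flow outside Elasticsearch: `extract` is a hypothetical stand-in for a dissect or grok extraction that returns `None` when the pattern doesn't match, the regex inside it is an illustrative assumption, and the returned list plays the role of the values `emit` would receive. A document whose text doesn't fit the pattern simply produces no value.

```python
import re

# Hypothetical stand-in pattern; not the pattern used in the docs.
_USED = re.compile(r"used (?P<usize>\S+),")


def extract(message):
    """Return a map of extracted fields, or None when the pattern doesn't match."""
    m = _USED.search(message)
    return m.groupdict() if m else None


def run_script(message):
    """Mimic a runtime-field script: collect whatever emit() would receive."""
    emitted = []
    gc = extract(message)
    if gc is not None:  # guard: a non-matching document emits nothing
        emitted.append(gc["usize"])
    return emitted


print(run_script("[gc,heap,exit] class space used 266K, capacity 384K"))
print(run_script("GC(0) Pause Young"))
```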
----
[2021-04-27T16:16:34.699+0000][82460][gc,heap,exit] class space used 266K, capacity 384K, committed 384K, reserved 1048576K
----
// NOTCONSOLE
You only need to declare something "NOTCONSOLE" if the paranoid "is this json?" detector fails the build when you don't add the tag. Try removing it; if the build succeeds, you don't need it. I don't think you need it here.
You're right! I removed both mentions of `// NOTCONSOLE` and local checks all passed.
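Dissect works without regular expressions: it walks the text and cuts it at the literal delimiters between `%{key}` placeholders. The Python sketch below applies that idea to the sample GC log line above. It is an illustrative approximation, not the Elasticsearch dissect processor: the `dissect` helper is hypothetical, the pattern is an assumption written for this one line, and dissect modifiers such as `%{?key}`, `%{+key}`, and `->` padding are not supported. An empty `%{}` skips a value.

```python
import re

# Splits a pattern into literal delimiters and %{key} placeholder names.
_KEY = re.compile(r"%\{([^}]*)\}")


def dissect(pattern, text):
    """Minimal dissect-style extraction (hypothetical helper).

    Cuts `text` at the literal delimiters that separate %{key} placeholders
    and returns a dict of the captured strings, or None if a delimiter is
    missing. An empty %{} skips its value. Adjacent placeholders with no
    delimiter between them are not supported.
    """
    parts = _KEY.split(pattern)
    literals, keys = parts[0::2], parts[1::2]  # len(literals) == len(keys) + 1
    if not text.startswith(literals[0]):
        return None
    pos = len(literals[0])
    result = {}
    for key, delim in zip(keys, literals[1:]):
        if delim:
            end = text.find(delim, pos)
            if end == -1:
                return None
        else:
            end = len(text)  # a trailing placeholder takes the rest of the line
        if key:
            result[key] = text[pos:end]
        pos = end + len(delim)
    return result


line = ("[2021-04-27T16:16:34.699+0000][82460][gc,heap,exit] class space used 266K, "
        "capacity 384K, committed 384K, reserved 1048576K")
pattern = ("[%{@timestamp}][%{code}][%{desc}] %{ident} used %{usize}, "
           "capacity %{csize}, committed %{comsize}, reserved %{rsize}")

gc = dissect(pattern, line)
print(gc["usize"], gc["csize"], gc["comsize"])  # 266K 384K 384K
```

Because each field ends at the next delimiter, `ident` captures the two words `class space` without any special handling.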
[source,txt]
----
emit("used" + ' ' + gc.usize + ', ' + "capacity" + ' ' + gc.csize + ', ' + "committed" + ' ' + gc.comsize)
----
This is a good thing for debugging but not super useful in production. In production you'll want to just `emit(gc.usize)` or something, right? Worth pointing out.
Why is this not useful in production? Is it too slow or just not something that people would typically want to do?
Yeah, it's somewhat slower than returning just the number you need, but I think the bigger thing is that folks will typically want to run range queries on the numbers, do math with them, or group them with aggs. Basically, if you get numbers, you typically want to extract them as `long` or `double`. But what you've got is useful to look at, especially because we don't yet have a way to emit all of the extracted values at once. When we have that, I think it'd be easier to have folks do that and fetch them all in `fields`, even when they are just looking at things.
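The point about extracting numbers as `long` or `double` is easy to see with the sizes from the sample GC line. A minimal Python sketch, assuming every size carries a trailing `K` as in that line (the `to_long_kb` helper is hypothetical): compared as strings, `1048576K` sorts before `384K`, while the converted integers order by magnitude and can feed range checks or arithmetic.

```python
def to_long_kb(size):
    """Convert a size such as '266K' to an integer number of kilobytes.

    Assumes the 'K' suffix seen in the sample GC line; any other unit is
    rejected instead of guessed.
    """
    if not size.endswith("K"):
        raise ValueError(f"unexpected unit in {size!r}")
    return int(size[:-1])


sizes = ["266K", "384K", "1048576K"]

# String comparison is lexicographic: '1' < '2' < '3', so the largest value sorts first.
print(sorted(sizes))  # ['1048576K', '266K', '384K']

# Integer comparison orders by magnitude and supports range filters.
as_longs = sorted(to_long_kb(s) for s in sizes)
print(as_longs)  # [266, 384, 1048576]
print([n for n in as_longs if n > 300])  # [384, 1048576]
```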
the value from `gc.usize` and a comma. This pattern repeats for the other data that you
want to retrieve:
[source,txt]
This is `[source,painless]`. I don't know that it makes a difference, but it is Painless code.
Good eye -- I'll change that to `[source,painless]`.
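The concatenation pattern (a label, a space, the extracted value, then a comma) can be checked outside Elasticsearch. A minimal Python sketch, assuming the field values come from a map like the one a dissect extraction returns for the sample GC line; the `summarize` helper is hypothetical and only mirrors the string that the `emit(...)` call builds.

```python
def summarize(gc):
    """Build the same string as the emit(...) concatenation (hypothetical helper)."""
    # Each segment is: label + ' ' + value, joined to the next with ', '.
    return ("used" + " " + gc["usize"] + ", " +
            "capacity" + " " + gc["csize"] + ", " +
            "committed" + " " + gc["comsize"])


# Values as extracted from the sample GC log line.
print(summarize({"usize": "266K", "csize": "384K", "comsize": "384K"}))
# used 266K, capacity 384K, committed 384K
```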
LGTM
* [DOCS] Add field extraction use cases to scripting docs * Adding file * Remove extra space * Add dissect pattern to split and retrieve data * Fix list spacing * Incorporating review feedback
… (#72648) * [DOCS] Add field extraction use cases to scripting docs (#71596) * [DOCS] Add field extraction use cases to scripting docs * Adding file * Remove extra space * Add dissect pattern to split and retrieve data * Fix list spacing * Incorporating review feedback * Adding type to console results
… (#72647) * [DOCS] Add field extraction use cases to scripting docs (#71596) * [DOCS] Add field extraction use cases to scripting docs * Adding file * Remove extra space * Add dissect pattern to split and retrieve data * Fix list spacing * Incorporating review feedback * Adding type to console results
#72646) * [DOCS] Add field extraction use cases to scripting docs (#71596) * [DOCS] Add field extraction use cases to scripting docs * Adding file * Remove extra space * Add dissect pattern to split and retrieve data * Fix list spacing * Incorporating review feedback * Adding type to console results
This PR adds a new page for common scripting use cases as part of a larger effort in #71576. This PR adds field extraction use cases for writing scripts that:
Preview link: https://elasticsearch_71596.docs-preview.app.elstc.co/guide/en/elasticsearch/reference/master/common-script-uses.html
Relates to #71576