[DOCS] Add field extraction use cases to scripting docs #71596

Merged

@lockewritesdocs (Contributor) commented Apr 12, 2021:

This PR adds a new page for common scripting use cases, as part of a larger effort in #71576. It covers field extraction, with use cases for writing scripts that:

  • Extract an IP address from a log message
  • Parse a string to extract part of a field
  • Split the values in a field by a specific separator
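As a rough illustration of the first case, the sketch below pulls an IPv4 address out of a log line. It is plain Java rather than Painless. The `IpExtractor` class, its deliberately loose IPv4 regex, and the sample Apache-style log line are assumptions made for illustration and are not code from this PR.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Standalone Java approximation of "extract an IP address from a log message".
// The IPv4 pattern and the sample log line are illustrative assumptions.
class IpExtractor {
    // Four dot-separated groups of 1-3 digits. Deliberately loose: it does not
    // reject octets above 255, which a stricter pattern would.
    private static final Pattern IPV4 =
        Pattern.compile("\\b(\\d{1,3}(?:\\.\\d{1,3}){3})\\b");

    // Returns the first IPv4-looking token, or an empty Optional if there is none.
    static Optional<String> extractIp(String message) {
        Matcher m = IPV4.matcher(message);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String log = "40.135.0.0 - - [30/Apr/2020:14:30:17 -0500] "
            + "\"GET /images/hm_bg.jpg HTTP/1.0\" 200 24736";
        System.out.println(extractIp(log).orElse("<no match>")); // prints 40.135.0.0
    }
}
```

Returning an `Optional` means a message with no address yields an empty result and does not throw.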

Preview link: https://elasticsearch_71596.docs-preview.app.elstc.co/guide/en/elasticsearch/reference/master/common-script-uses.html

Relates to #71576

@lockewritesdocs added the >docs (General docs changes), v8.0.0, and v7.13.0 labels Apr 12, 2021
@lockewritesdocs self-assigned this Apr 12, 2021
@elasticmachine added the Team:Docs (Meta label for docs team) label Apr 12, 2021
@elasticmachine (Collaborator) commented:

Pinging @elastic/es-docs (Team:Docs)

@lockewritesdocs (Contributor Author) commented:

run elasticsearch-ci/docs

@lockewritesdocs added the :Core/Infra/Scripting (Scripting abstractions, Painless, and Mustache) label Apr 23, 2021
@elasticmachine added the Team:Core/Infra (Meta label for core/infra team) label Apr 23, 2021
@elasticmachine (Collaborator) commented:

Pinging @elastic/es-core-infra (Team:Core/Infra)

@lockewritesdocs requested a review from nik9000 April 30, 2021 16:51

[discrete]
[[field-extraction-split]]
==== Split values in a field by a separator (Dissect)

@lockewritesdocs (Contributor Author) commented Apr 30, 2021:

@nik9000, all of this explanatory information is fine here for now, but I plan on pulling the dissect docs out of ingest processor and into the scripting docs in #72580, and then putting this more detailed information there.


[[scripting-field-extraction]]
==== Field extraction
The goal of this use case is simple; you have fields in your data with a bunch of

Member

I tend to avoid the word "use case". I think folks don't think of themselves as having a use case. They've got a thing they want to do, but they don't call it a "use case".

Contributor Author

True -- I think we can just say, "The goal of field extraction is simple..."


There are two options at your disposal:

* <<grok-basics,Grok>> uses a pattern like a regular expression that supports

Member

It is a regular expression, just a kind of unexpected dialect. May be more correct to say "is a regular expression dialect that supports aliased expression reuse." - no need to explain that it sits "on top", I guess, if you say it that way.

Contributor Author

I'm +1 on that revision, but I still think it's valuable to call out the 1-to-1 mapping of regex in grok. Something like:

Grok is a regular expression dialect that supports aliased expressions that you can reuse. Because Grok sits on top of regular expressions, any regular expressions are valid in grok as well.
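The idea of a regular expression dialect with aliased, reusable expressions can be sketched in Java as follows. This is a minimal sketch with stated assumptions: `MiniGrok` and its three-entry alias table are hypothetical, and real grok ships a much larger pattern library. Each `%{ALIAS:field}` expands into a named capture group. Text outside `%{...}` passes through as ordinary regex, so plain regular expressions remain valid.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical mini-"grok": expands %{ALIAS} and %{ALIAS:field} into plain Java regex.
// The alias table is a tiny illustrative subset, not the real grok pattern library.
class MiniGrok {
    private static final Map<String, String> ALIASES = Map.of(
        "IP", "\\d{1,3}(?:\\.\\d{1,3}){3}",
        "WORD", "\\w+",
        "NUMBER", "\\d+");

    // Field names are letters and digits only, because Java named groups reject underscores.
    private static final Pattern REF =
        Pattern.compile("%\\{([A-Z]+)(?::([A-Za-z][A-Za-z0-9]*))?\\}");

    static Pattern compile(String grok) {
        Matcher m = REF.matcher(grok);
        StringBuilder regex = new StringBuilder();
        while (m.find()) {
            String body = ALIASES.get(m.group(1));
            if (body == null) {
                throw new IllegalArgumentException("unknown alias: " + m.group(1));
            }
            String group = m.group(2) == null
                ? "(?:" + body + ")"                      // reuse the alias without capturing
                : "(?<" + m.group(2) + ">" + body + ")";  // reuse it and capture a named field
            m.appendReplacement(regex, Matcher.quoteReplacement(group));
        }
        // Text outside %{...} is copied unchanged, so ordinary regex syntax keeps working.
        m.appendTail(regex);
        return Pattern.compile(regex.toString());
    }

    public static void main(String[] args) {
        Pattern p = compile("^%{IP:clientip} .*\"%{WORD:verb} \\S+ HTTP/[\\d.]+\" %{NUMBER:status}");
        Matcher m = p.matcher("40.135.0.0 - - [30/Apr/2020:14:30:17 -0500] "
            + "\"GET /images/hm_bg.jpg HTTP/1.0\" 200 24736");
        if (m.find()) {
            // prints "40.135.0.0 GET 200"
            System.out.println(m.group("clientip") + " " + m.group("verb") + " " + m.group("status"));
        }
    }
}
```

The sketch needs Java 9 or later for `Map.of` and the `StringBuilder` overload of `appendReplacement`.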

aliased expressions that you can reuse. Grok sits on top of regular expressions, so
any regular expressions are valid in grok as well.
* <<dissect-processor,Dissect>> extracts structured fields out of a single text
field within a document, but doesn't use regular expressions. Instead, dissect uses

Member

"extracts structured fields out of text using a pattern that defines delimiters"?

@lockewritesdocs (Contributor Author) commented May 3, 2021:

How about:

Dissect extracts structured fields out of text, using delimiters to define the matching pattern. Unlike grok, dissect doesn't use regular expressions.
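Delimiter-based extraction can be sketched in plain Java, assuming only plain `%{key}` references. Real dissect also supports key modifiers, and this sketch does not model them. The hypothetical `MiniDissect.dissect` method walks the input with `indexOf` and `startsWith` and never builds a regular expression. It returns an empty map when a delimiter is missing.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical mini-"dissect": extracts fields using only literal delimiters, no regex.
// Supports plain %{key} references; real dissect modifiers are not modeled.
class MiniDissect {
    static Map<String, String> dissect(String pattern, String input) {
        // Parse the pattern into keys and the literal delimiter that precedes each key.
        List<String> keys = new ArrayList<>();
        List<String> delims = new ArrayList<>();
        int pos = 0;
        while (true) {
            int open = pattern.indexOf("%{", pos);
            if (open < 0) {
                break;
            }
            int close = pattern.indexOf('}', open);
            if (close < 0) {
                throw new IllegalArgumentException("unclosed %{ in pattern");
            }
            if (!keys.isEmpty() && open == pos) {
                throw new IllegalArgumentException("adjacent keys need a delimiter between them");
            }
            delims.add(pattern.substring(pos, open));
            keys.add(pattern.substring(open + 2, close));
            pos = close + 1;
        }
        String trailing = pattern.substring(pos);

        // Walk the input: match each delimiter literally, then take text up to the next one.
        Map<String, String> result = new LinkedHashMap<>();
        int at = 0;
        for (int i = 0; i < keys.size(); i++) {
            String lead = delims.get(i);
            if (!input.startsWith(lead, at)) {
                return Map.of(); // delimiter not where expected: no match
            }
            at += lead.length();
            String next = i + 1 < keys.size() ? delims.get(i + 1) : trailing;
            int end = next.isEmpty() ? input.length() : input.indexOf(next, at);
            if (end < 0) {
                return Map.of();
            }
            result.put(keys.get(i), input.substring(at, end));
            at = end;
        }
        return input.startsWith(trailing, at) ? result : Map.of();
    }

    public static void main(String[] args) {
        String log = "40.135.0.0 - - [30/Apr/2020:14:30:17 -0500] "
            + "\"GET /images/hm_bg.jpg HTTP/1.0\" 200 24736";
        // prints {clientip=40.135.0.0, ident=-, auth=-, timestamp=30/Apr/2020:14:30:17 -0500}
        System.out.println(dissect("%{clientip} %{ident} %{auth} [%{timestamp}]", log));
    }
}
```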

}
----
// TEST[continued]
<1> This condition ensures that the script doesn't crash even if the pattern of

Member

s/doesn't crash/doesn't emit anything/?

Contributor Author

++ I'll make that change.
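The "doesn't emit anything" behavior can be sketched in plain Java. The sketch assumes one thing, which the `<1>` callout in the diff implies: the extractor returns `null` when the pattern doesn't match. `GuardedEmit`, its `extract` stand-in, and the list that plays the role of `emit` are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch of "emit nothing when the pattern doesn't match", in plain Java.
// extract() is a hypothetical stand-in assumed to return null on no match.
class GuardedEmit {
    private static final Pattern USED = Pattern.compile("used (\\S+),");

    static Map<String, String> extract(String message) {
        Matcher m = USED.matcher(message);
        return m.find() ? Map.of("usize", m.group(1)) : null;
    }

    // The returned list plays the role of the values a script would emit.
    static List<String> run(List<String> messages) {
        List<String> emitted = new ArrayList<>();
        for (String message : messages) {
            Map<String, String> gc = extract(message);
            // Without this check, gc.get(...) would throw a NullPointerException
            // on any message that doesn't fit the pattern.
            if (gc != null) {
                emitted.add(gc.get("usize"));
            }
        }
        return emitted;
    }

    public static void main(String[] args) {
        // prints [266K]: the second message matches nothing, so nothing is emitted for it
        System.out.println(run(List.of(
            "class space used 266K, capacity 384K",
            "an unrelated log line")));
    }
}
```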

----
[2021-04-27T16:16:34.699+0000][82460][gc,heap,exit] class space used 266K, capacity 384K, committed 384K, reserved 1048576K
----
// NOTCONSOLE

Member

You only need to declare something "NOTCONSOLE" if the paranoid "is this JSON?" detector fails the build when you don't add the tag. Try removing it; if the build succeeds, you don't need it. I don't think you need it here.

Contributor Author

You're right! I removed both mentions of //NOTCONSOLE and local checks all passed.
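The sample GC log line in the diff can be split on fixed separators. The sketch below does this in standalone Java. The `usize`, `csize`, and `comsize` names mirror the `gc.*` fields used in the docs' `emit` example. `timestamp`, `code`, `desc`, `ident`, and `rsize` are illustrative names, and `GcLineSplitter` is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: splits a GC log line of the sample's shape on fixed separators.
// Returns an empty map if the line doesn't have that shape.
class GcLineSplitter {
    static Map<String, String> split(String line) {
        Map<String, String> out = new LinkedHashMap<>();
        // Three bracketed prefixes: [timestamp][code][desc]
        int pos = 0;
        for (String key : new String[] {"timestamp", "code", "desc"}) {
            int close = line.indexOf(']', pos);
            if (!line.startsWith("[", pos) || close < 0) {
                return Map.of();
            }
            out.put(key, line.substring(pos + 1, close));
            pos = close + 1;
        }
        // Remainder looks like "class space used 266K, capacity 384K, ..."
        String rest = line.substring(pos).trim();
        int used = rest.indexOf(" used ");
        if (used < 0) {
            return Map.of();
        }
        out.put("ident", rest.substring(0, used));
        // split() takes a regex, but ", " contains no metacharacters, so it acts literally.
        String[] parts = rest.substring(used + 1).split(", ");
        String[] names = {"usize", "csize", "comsize", "rsize"};
        if (parts.length != names.length) {
            return Map.of();
        }
        for (int i = 0; i < names.length; i++) {
            // Each part is "<label> <value>", for example "capacity 384K"
            out.put(names[i], parts[i].substring(parts[i].lastIndexOf(' ') + 1));
        }
        return out;
    }

    public static void main(String[] args) {
        String line = "[2021-04-27T16:16:34.699+0000][82460][gc,heap,exit] "
            + "class space used 266K, capacity 384K, committed 384K, reserved 1048576K";
        System.out.println(split(line).get("usize")); // prints 266K
    }
}
```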

[source,txt]
----
emit("used" + ' ' + gc.usize + ', ' + "capacity" + ' ' + gc.csize + ', ' + "committed" + ' ' + gc.comsize)
----

Member

This is a good thing for debugging but not super useful in production. In production you'll want to just emit(gc.usize) or something, right? Worth pointing out.

Contributor Author

Why is this not useful in production? Is it too slow or just not something that people would typically want to do?

Member

Yeah, it's somewhat slower than returning just the number you need, but I think the bigger thing is that folks will typically want to run range queries on the numbers, do math with them, or group them with aggs. Basically, if you extract numbers, you typically want to extract them as long or double. But what you've got is useful to look at, especially because we don't yet have a way to emit all of the extracted values at once. When we have that, I think it'd be easier to have folks do that and fetch them all in the fields, even when they are just looking at things.
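Extracting numbers as `long` values can be sketched in Java. The helper below turns a JVM-style size such as `266K` into a byte count, so a script could emit a number that supports range queries and aggregations. `SizeParser` is hypothetical. The binary multipliers (1K = 1024 bytes) and the K/M/G suffix set are assumptions about the log format.

```java
// Hypothetical helper: converts a size string such as "266K" into a long byte count.
// Assumes binary multipliers (1K = 1024 bytes) and only the K, M, and G suffixes.
class SizeParser {
    static long toBytes(String size) {
        if (size == null || size.isEmpty()) {
            throw new IllegalArgumentException("empty size");
        }
        char unit = Character.toUpperCase(size.charAt(size.length() - 1));
        long multiplier;
        switch (unit) {
            case 'K': multiplier = 1024L; break;
            case 'M': multiplier = 1024L * 1024; break;
            case 'G': multiplier = 1024L * 1024 * 1024; break;
            default:  return Long.parseLong(size); // no suffix: treat the value as bytes
        }
        long value = Long.parseLong(size.substring(0, size.length() - 1));
        // multiplyExact throws instead of silently overflowing.
        return Math.multiplyExact(value, multiplier);
    }

    public static void main(String[] args) {
        System.out.println(toBytes("266K"));     // prints 272384
        System.out.println(toBytes("1048576K")); // prints 1073741824
    }
}
```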

the value from `gc.usize` and a comma. This pattern repeats for the other data that you
want to retrieve:

[source,txt]

Member

This is [source,painless]. I don't know that it makes a difference, but it is painless code.

Contributor Author

Good eye -- I'll change that to [source,painless].

@nik9000 (Member) left a comment:

LGTM

@lockewritesdocs merged commit 44a1973 into elastic:master May 3, 2021
@lockewritesdocs deleted the docs__scripts-field-extraction branch May 3, 2021 20:24
lockewritesdocs pushed a commit to lockewritesdocs/elasticsearch that referenced this pull request May 3, 2021
* [DOCS] Add field extraction use cases to scripting docs

* Adding file

* Remove extra space

* Add dissect pattern to split and retrieve data

* Fix list spacing

* Incorporating review feedback
lockewritesdocs pushed a commit that referenced this pull request May 3, 2021
… (#72648)

* [DOCS] Add field extraction use cases to scripting docs (#71596)

* [DOCS] Add field extraction use cases to scripting docs

* Adding file

* Remove extra space

* Add dissect pattern to split and retrieve data

* Fix list spacing

* Incorporating review feedback

* Adding type to console results
lockewritesdocs pushed a commit that referenced this pull request May 3, 2021
… (#72647)

* [DOCS] Add field extraction use cases to scripting docs (#71596)

* [DOCS] Add field extraction use cases to scripting docs

* Adding file

* Remove extra space

* Add dissect pattern to split and retrieve data

* Fix list spacing

* Incorporating review feedback

* Adding type to console results
lockewritesdocs pushed a commit that referenced this pull request May 3, 2021
#72646)

* [DOCS] Add field extraction use cases to scripting docs (#71596)

* [DOCS] Add field extraction use cases to scripting docs

* Adding file

* Remove extra space

* Add dissect pattern to split and retrieve data

* Fix list spacing

* Incorporating review feedback

* Adding type to console results
Labels: :Core/Infra/Scripting, >docs, Team:Core/Infra, Team:Docs, v7.13.0, v8.0.0-alpha1

4 participants