[DOCS] Add field extraction use cases to scripting docs #71596

Merged

lockewritesdocs merged 6 commits into elastic:master from lockewritesdocs:docs__scripts-field-extraction

May 3, 2021

Contributor

lockewritesdocs commented Apr 12, 2021

This PR adds a new page for common scripting use cases as part of the larger effort in #71576. It covers field extraction use cases for writing scripts that:

* Extract an IP address from a log message
* Parse a string to extract part of a field
* Split the values in a field by a specific separator

Preview link: https://elasticsearch_71596.docs-preview.app.elstc.co/guide/en/elasticsearch/reference/master/common-script-uses.html

Relates to #71576


          [DOCS] Add field extraction use cases to scripting docs

be130d5

lockewritesdocs added >docs v8.0.0 v7.13.0 labels

lockewritesdocs self-assigned this

elasticmachine added the Team:Docs label

Collaborator

elasticmachine commented Apr 12, 2021

Pinging @elastic/es-docs (Team:Docs)

lockewritesdocs mentioned this pull request

[DOCS] Add common scripting use cases with focus on runtime fields #71576

Open

6 tasks


          Adding file

620d751

Contributor Author

lockewritesdocs commented Apr 13, 2021

run elasticsearch-ci/docs

lockewritesdocs added the :Core/Infra/Scripting label

elasticmachine added the Team:Core/Infra label

Collaborator

elasticmachine commented Apr 23, 2021

Pinging @elastic/es-core-infra (Team:Core/Infra)

Adam Locke added 3 commits

April 29, 2021 09:01


          Remove extra space

c0f43cd


          Add dissect pattern to split and retrieve data

9227c2a


          Fix list spacing

b009ebc

lockewritesdocs requested a review from nik9000

April 30, 2021 16:51

lockewritesdocs commented

docs/reference/scripting/common-script-uses.asciidoc

+              [discrete]
+              [[field-extraction-split]]
+              ==== Split values in a field by a separator (Dissect)

Contributor Author

lockewritesdocs Apr 30, 2021

@nik9000, all of this explanatory information is fine here for now, but I plan on pulling the dissect docs out of ingest processor and into the scripting docs in #72580, and then putting this more detailed information there.

nik9000 reviewed

docs/reference/scripting/common-script-uses.asciidoc Outdated

+              [[scripting-field-extraction]]
+              ==== Field extraction
+              The goal of this use case is simple; you have fields in your data with a bunch of

Member

nik9000 May 3, 2021

I tend to avoid the word "use case". I think folks don't think of themselves as having a use case. They've got a thing they want to do, but they don't call it a "use case".

Contributor Author

lockewritesdocs May 3, 2021

True -- I think we can just say, "The goal of field extraction is simple..."

docs/reference/scripting/common-script-uses.asciidoc Outdated


		There are two options at your disposal:

		* <<grok-basics,Grok>> uses a pattern like a regular expression that supports

Member

nik9000 May 3, 2021

It is a regular expression, just a kind of unexpected dialect. May be more correct to say "is a regular expression dialect that supports aliased expression reuse." - no need to explain that it sits "on top", I guess, if you say it that way.

Contributor Author

lockewritesdocs May 3, 2021

I'm +1 on that revision, but I still think it's valuable to call out the 1-to-1 mapping of regex in grok. Something like:

Grok is a regular expression dialect that supports aliased expressions that you can reuse. Because Grok sits on top of regular expressions, any regular expressions are valid in grok as well.
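
As a rough illustration of the "aliased expressions on top of regular expressions" idea, here is a minimal Python sketch. It is not how grok is implemented; the `ALIASES` table and the `expand` and `extract` helpers are hypothetical names made up for this example, and the `IP` expression is deliberately simplistic.

```python
import re

# Hypothetical alias table; real grok ships a much larger pattern library.
ALIASES = {
    "IP": r"\d{1,3}(?:\.\d{1,3}){3}",
    "WORD": r"\w+",
}


def expand(pattern: str) -> str:
    """Replace each %{ALIAS:name} with a named regex group; leave the rest untouched."""
    def repl(match):
        alias, name = match.group(1), match.group(2)
        return f"(?P<{name}>{ALIASES[alias]})"
    return re.sub(r"%\{(\w+):(\w+)\}", repl, pattern)


def extract(pattern: str, text: str):
    """Return the captured fields, or None when the pattern does not match."""
    match = re.search(expand(pattern), text)
    return match.groupdict() if match else None


line = "GET /index.html from 10.0.0.7"
aliased = extract(r"from %{IP:client}", line)    # uses an alias
plain = extract(r"^(?P<verb>[A-Z]+) ", line)     # a plain regex passes through unchanged
missing = extract(r"%{IP:client}", "no address in this message")
```

Because `expand` only rewrites the `%{...}` tokens, any ordinary regular expression is still accepted as written, which is the 1-to-1 mapping described above. Returning `None` on a failed match mirrors a script that emits nothing when the pattern doesn't apply.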

docs/reference/scripting/common-script-uses.asciidoc Outdated

+              aliased expressions that you can reuse. Grok sits on top of regular expressions, so
+              any regular expressions are valid in grok as well.
+              * <<dissect-processor,Dissect>> extracts structured fields out of a single text
+              field within a document, but doesn't use regular expressions. Instead, dissect uses

Member

nik9000 May 3, 2021

"extracts structured fields out of text using a pattern that defines delimiters"?

Contributor Author

lockewritesdocs May 3, 2021

How about:

Dissect extracts structured fields out of text, using delimiters to define the matching pattern. Unlike grok, dissect doesn't use regular expressions.
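
To make the delimiter-based matching concrete, here is a small Python sketch of the idea. It is an assumption-laden toy, not the dissect implementation: `parse` and `dissect` are hypothetical helpers, the key names are invented for this example, and adjacent keys and key modifiers are not handled.

```python
def parse(pattern):
    """Split a pattern into ('lit', text) and ('key', name) tokens."""
    tokens, i = [], 0
    while i < len(pattern):
        start = pattern.find("%{", i)
        if start == -1:
            tokens.append(("lit", pattern[i:]))
            break
        if start > i:
            tokens.append(("lit", pattern[i:start]))
        end = pattern.index("}", start)
        tokens.append(("key", pattern[start + 2:end]))
        i = end + 1
    return tokens


def dissect(pattern, text):
    """Extract fields using literal delimiters only; return None on a mismatch."""
    tokens = parse(pattern)
    fields, pos = {}, 0
    for n, (kind, value) in enumerate(tokens):
        if kind == "lit":
            if not text.startswith(value, pos):
                return None
            pos += len(value)
        else:
            # A key captures everything up to the next literal delimiter
            # (assumes keys are always separated by a literal).
            if n + 1 < len(tokens):
                stop = text.find(tokens[n + 1][1], pos)
                if stop == -1:
                    return None
            else:
                stop = len(text)  # the last key takes the rest of the line
            fields[value] = text[pos:stop]
            pos = stop
    return fields


PATTERN = ("[%{ts}][%{pid}][%{tags}]%{ident} used %{usize}, "
           "capacity %{csize}, committed %{comsize}, reserved %{rsize}")
LINE = ("[2021-04-27T16:16:34.699+0000][82460][gc,heap,exit]   class space    "
        "used 266K, capacity 384K, committed 384K, reserved 1048576K")
gc = dissect(PATTERN, LINE)
```

No regular expression is involved: each key simply takes the text between the delimiters on either side of it, which is why the pattern has to mirror the layout of the line.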

docs/reference/scripting/common-script-uses.asciidoc Outdated

+              }
+              ----
+              // TEST[continued]
+              <1> This condition ensures that the script doesn't crash even if the pattern of

Member

nik9000 May 3, 2021

s/doesn't crash/doesn't emit anything/?

Contributor Author

lockewritesdocs May 3, 2021

++ I'll make that change.

docs/reference/scripting/common-script-uses.asciidoc Outdated

+              ----
+              [2021-04-27T16:16:34.699+0000][82460][gc,heap,exit]   class space    used 266K, capacity 384K, committed 384K, reserved 1048576K
+              ----
+              // NOTCONSOLE

Member

nik9000 May 3, 2021

You only need to declare something `NOTCONSOLE` if the paranoid "is this json?" detector fails the build when the tag is missing. Try removing this; if the build succeeds then you don't need it. I don't think you need it here.

Contributor Author

lockewritesdocs May 3, 2021

You're right! I removed both mentions of `// NOTCONSOLE` and local checks all passed.

docs/reference/scripting/common-script-uses.asciidoc

+              [source,txt]
+              ----
+              emit("used" + ' ' + gc.usize + ', ' + "capacity" + ' ' + gc.csize + ', ' + "committed" + ' ' + gc.comsize)
+              ----

Member

nik9000 May 3, 2021

This is a good thing for debugging but not super useful in production. In production you'll want to just emit(gc.usize) or something, right? Worth pointing out.

Contributor Author

lockewritesdocs May 3, 2021

Why is this not useful in production? Is it too slow or just not something that people would typically want to do?

Member

nik9000 May 3, 2021

Yeah, it's somewhat slower than returning just the number you need, but I think the bigger thing is that folks will typically want to run range queries on the numbers, do math with them, or group them with aggs. Basically, if you get numbers, I think you typically want to extract them as `long` or `double`. But what you've got is useful to look at, especially because we don't yet have a way to emit all of the extracted values at once. When we have that, I think it'd be easier to have folks do that and fetch them all in the fields, even when they are just looking at things.
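
The difference between emitting text and emitting a number can be shown with a short Python sketch. This is only an analogy for the typing argument, not Painless and not a runtime field; `to_kilobytes` is a hypothetical helper, and it assumes every size ends in `K` as in the sample log line.

```python
def to_kilobytes(size: str) -> int:
    """Convert a size such as '266K' to an integer number of kilobytes."""
    if not size.endswith("K"):
        raise ValueError(f"unexpected size format: {size!r}")
    return int(size[:-1])


sizes = ["266K", "384K", "1048576K"]

# As text, ordering is lexicographic, so the largest value sorts first.
as_text = sorted(sizes)

# As numbers, ordering, range filters, and arithmetic behave as expected.
as_numbers = sorted(to_kilobytes(s) for s in sizes)
over_300 = [n for n in as_numbers if n > 300]
total = sum(as_numbers)
```

Here `as_text` starts with `1048576K` while `as_numbers` ends with `1048576`, which is the kind of surprise that extracting the value as a numeric type avoids.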

docs/reference/scripting/common-script-uses.asciidoc Outdated

+              the value from `gc.usize` and a comma. This pattern repeats for the other data that you
+              want to retrieve:
+              [source,txt]

Member

nik9000 May 3, 2021

This is [source,painless]. I don't know that it makes a difference, but it is painless code.

Contributor Author

lockewritesdocs May 3, 2021

Good eye -- I'll change that to [source,painless].


          Incorporating review feedback

f44bf34

nik9000 approved these changes

Member

nik9000 left a comment

LGTM

lockewritesdocs merged commit 44a1973 into elastic:master

lockewritesdocs deleted the docs__scripts-field-extraction branch

May 3, 2021 20:24

lockewritesdocs mentioned this pull request

[DOCS] [7.x] Add field extraction use cases to scripting docs (#71596) #72646

Merged

lockewritesdocs pushed a commit to lockewritesdocs/elasticsearch that referenced this pull request


          [DOCS] Add field extraction use cases to scripting docs (elastic#71596)

99af378

* [DOCS] Add field extraction use cases to scripting docs

* Adding file

* Remove extra space

* Add dissect pattern to split and retrieve data

* Fix list spacing

* Incorporating review feedback

lockewritesdocs mentioned this pull request

[DOCS] [7.13] Add field extraction use cases to scripting docs (#71596) #72647

Merged

lockewritesdocs pushed a commit to lockewritesdocs/elasticsearch that referenced this pull request


          [DOCS] Add field extraction use cases to scripting docs (elastic#71596)

77578cf

* [DOCS] Add field extraction use cases to scripting docs

* Adding file

* Remove extra space

* Add dissect pattern to split and retrieve data

* Fix list spacing

* Incorporating review feedback

lockewritesdocs mentioned this pull request

[DOCS] [7.12] Add field extraction use cases to scripting docs (#71596) #72648

Merged

lockewritesdocs pushed a commit that referenced this pull request


          [DOCS] [7.12] Add field extraction use cases to scripting docs (#71596) (#72648)

c8c2b42

* [DOCS] Add field extraction use cases to scripting docs (#71596)

* [DOCS] Add field extraction use cases to scripting docs

* Adding file

* Remove extra space

* Add dissect pattern to split and retrieve data

* Fix list spacing

* Incorporating review feedback

* Adding type to console results

lockewritesdocs pushed a commit that referenced this pull request


          [DOCS] [7.13] Add field extraction use cases to scripting docs (#71596) (#72647)

5b1e31f

* [DOCS] Add field extraction use cases to scripting docs (#71596)

* [DOCS] Add field extraction use cases to scripting docs

* Adding file

* Remove extra space

* Add dissect pattern to split and retrieve data

* Fix list spacing

* Incorporating review feedback

* Adding type to console results

lockewritesdocs pushed a commit that referenced this pull request


          [DOCS] [7.x] Add field extraction use cases to scripting docs (#71596) (#72646)

1f5c128

* [DOCS] Add field extraction use cases to scripting docs (#71596)

* [DOCS] Add field extraction use cases to scripting docs

* Adding file

* Remove extra space

* Add dissect pattern to split and retrieve data

* Fix list spacing

* Incorporating review feedback

* Adding type to console results

jakelandis added v8.0.0-alpha1 and removed v8.0.0 labels

Labels

:Core/Infra/Scripting >docs Team:Core/Infra Team:Docs v7.13.0 v8.0.0-alpha1