[RFC] Data Prepper Expression Syntax #1005

Closed
sbayer55 opened this issue Feb 8, 2022 · 10 comments

@sbayer55
Member

sbayer55 commented Feb 8, 2022

Is your feature request related to a problem? Please describe.
As part of a larger feature (#522) to support complex conditional statements in Data Prepper, there is a need to define a syntax for conditional statements. A conditional statement is a String that is evaluated at runtime and may reference fields within a record.

Describe the solution you'd like
Terms used throughout this document are defined in the Definitions section.

Supported Operators

Listed in order of evaluation priority (top to bottom, left to right).

| Operator | Description | Data Prepper Version |
|---|---|---|
| {} | Set Initializer | 1.4.0 |
| () | Priority Expression | 1.3.0 |
| not | Not Operator | 1.3.0 |
| in, not in | Set Operators | 1.4.0 |
| <, <=, >, >= | Relational Operators | 1.3.0 |
| =~, !~ | Regex Equality Operators | TBD |
| ==, != | Equality Operators | 1.3.0 |
| and, or | Conditional Expression | 1.3.0 |
| , | Set Value Delimiter | 1.4.0 |

Reserved for possible future functionality

Reserved symbol set: ^, *, /, %, +, -, xor, =, +=, -=, *=, /=, %=, ++, --, ${<text>}

Set Initializer

Defines a set of terms and/or expressions.

Examples

{1, 2, 3}
{"a", "b", "c"}
{/people/0/name, /status_code}

Priority Expression

Identifies an expression that will be evaluated at the highest priority level. A priority expression must contain an expression or value; empty parentheses are not supported.

Examples

/is_cool == (/name == "Steven")

Set Operators

Tests if a value is in/not in a set. Note, the right-hand side operand must be a set.

Syntax

<Expression> in <Set>
<Expression> not in <Set>

Examples

/status_code in {200, 202}
/status_code not in {400, 404, 500}
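
As a rough illustration of the semantics described above (not Data Prepper's implementation), the two set operators can be modeled in Python. The names `is_in` and `is_not_in` and the sample event are hypothetical, and raising a `TypeError` for a non-set right-hand operand is an assumption about how the rule would be enforced.

```python
def is_in(value, candidates):
    """Model of `<Expression> in <Set>`; the right-hand operand must be a set."""
    if not isinstance(candidates, (set, frozenset)):
        raise TypeError("right-hand operand of 'in' must be a set")
    return value in candidates


def is_not_in(value, candidates):
    """Model of `<Expression> not in <Set>`."""
    return not is_in(value, candidates)


# Hypothetical event; /status_code would resolve to 404.
event = {"status_code": 404}
print(is_in(event["status_code"], {200, 202}))           # False
print(is_not_in(event["status_code"], {400, 404, 500}))  # False
```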

Relational Operators

Tests the relationship between two numeric values. Note, each operand must be a number or a Json Pointer that resolves to a number.

Syntax

<Number | Json Pointer> < <Number | Json Pointer>
<Number | Json Pointer> <= <Number | Json Pointer>
<Number | Json Pointer> > <Number | Json Pointer>
<Number | Json Pointer> >= <Number | Json Pointer>

Examples

/status_code >= 200 and /status_code < 300
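
A minimal Python sketch of the operand rule above, assuming that a non-numeric operand is rejected outright rather than coerced (the RFC does not say which). The `compare` helper and the `RELATIONAL` table are hypothetical names used only for illustration.

```python
import operator

# Maps each relational operator token to the matching Python comparison.
RELATIONAL = {
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
}


def compare(op, left, right):
    """Apply a relational operator after checking both operands are numbers."""
    for operand in (left, right):
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(operand, bool) or not isinstance(operand, (int, float)):
            raise TypeError(f"operand {operand!r} is not a number")
    return RELATIONAL[op](left, right)


# Equivalent of: /status_code >= 200 and /status_code < 300
status_code = 204  # hypothetical resolved value
print(compare(">=", status_code, 200) and compare("<", status_code, 300))  # True
```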

Regex Equality Operators

Used to test if a String value matches/does not match a Regular Expression. Note, the left-hand side operand must be a String or a Json Pointer that resolves to a String. The right-hand side operand must be a String that contains a regular expression or a Json Pointer that resolves to a String that contains a regular expression.

Syntax

<String | Json Pointer> =~ <Regex String | Json Pointer>
<String | Json Pointer> !~ <Regex String | Json Pointer>

Examples

/string_property =~ "^[A-Za-z\s]*$"
"Hello!" !~ /event/regex_matcher
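
The behavior above might look like the following in Python. This is only a sketch: it assumes the whole string must match the pattern (the RFC does not state whether a partial match counts), it uses Python's `re` dialect rather than whatever engine Data Prepper would use, and the function names are hypothetical.

```python
import re


def regex_matches(value, pattern):
    """Model of `=~`, assuming the entire string must match the pattern."""
    if not isinstance(value, str) or not isinstance(pattern, str):
        raise TypeError("both operands of =~ must be strings")
    return re.fullmatch(pattern, value) is not None


def regex_not_matches(value, pattern):
    """Model of `!~`, the negation of `=~`."""
    return not regex_matches(value, pattern)


# Equivalent of: /string_property =~ "^[A-Za-z\s]*$"
print(regex_matches("Hello world", r"^[A-Za-z\s]*$"))  # True
print(regex_not_matches("Hello!", r"^[A-Za-z\s]*$"))   # True
```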

Equality Operators

Used to test whether two values are/are not equivalent.

Syntax

<Any> == <Any>
<Any> != <Any>

Examples

/is_cool == true
3.14 != /status_code
{1, 2} == /event/set_property

Conditional Expression

Used to chain together multiple expressions and/or values.

Syntax

<Any> and <Any>
<Any> or <Any>
not <Any>

Examples

/status_code == 200 and /message == "Hello world"
/status_code == 200 or /status_code == 202
not /status_code in {200, 202}

Definitions

Literal

A fundamental value that has no children.

  • Float (Supports values from 3.40282347 x 10^38 to 1.40129846 x 10^-45)
  • Integer (Supports values from -2147483648 to 2147483647)
  • Boolean (Supports true or false)
  • Json Pointer (See Json Pointer section for details)
  • String (Supports Valid Java String characters)
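
To make the Integer bounds above concrete, here is a small Python sketch of a range-checked literal parser. The function name and the choice to raise `ValueError` on out-of-range input are assumptions; the RFC does not say how an oversized literal should be handled.

```python
import re

# Bounds quoted in the Integer bullet above (signed 32-bit range).
INT_MIN = -2147483648
INT_MAX = 2147483647

# Optional minus sign followed by ASCII digits only.
_INTEGER = re.compile(r"-?[0-9]+")


def parse_integer_literal(text):
    """Parse an Integer literal, rejecting anything outside the 32-bit range."""
    if _INTEGER.fullmatch(text) is None:
        raise ValueError(f"not an integer literal: {text!r}")
    value = int(text)
    if not INT_MIN <= value <= INT_MAX:
        raise ValueError(f"integer literal out of range: {text}")
    return value


print(parse_integer_literal("2147483647"))  # 2147483647
```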

Expression String

The String that will be parsed for evaluation. Expression String is the highest level of a Data Prepper Expression. Only supports one
Expression String resulting in a return value. Note, an Expression String is not the same as an Expression.

Statement

The highest level component of the Expression String.

Expression

A generic component that contains a Primary or an Operator. Expressions may contain expressions. An expression's immediate children can contain 0-1 Operators.

Primary

  • Set
  • Priority Expression
  • Literal

Operator

Hard-coded token that identifies the operation used in an Expression.

Json Pointer

A Literal used to reference a value within the Event provided as context for the Expression String. Json Pointers are identified by a leading / followed by alphanumeric characters or underscores, delimited by /. Json Pointers can use an extended character set if wrapped in double quotes (") using the escape character \. Note, any ~ or / that is part of the path rather than a delimiter must be escaped:

  • ~0 representing ~
  • ~1 representing /
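
The ~0 and ~1 escapes are the ones defined by the JSON Pointer standard (RFC 6901). As a sketch of how a pointer could be resolved against an Event, assuming the Event is a plain nested dict/list structure (the helper names are hypothetical and this is not Data Prepper's resolver):

```python
def unescape_token(token):
    """Undo the ~1 and ~0 escapes in one path segment.

    ~1 is replaced first, then ~0, so that "~01" becomes "~1" and not "/".
    """
    return token.replace("~1", "/").replace("~0", "~")


def resolve_pointer(event, pointer):
    """Walk a nested dict/list structure following a /-delimited pointer."""
    if not pointer.startswith("/"):
        raise ValueError("a Json Pointer must start with '/'")
    current = event
    for raw_token in pointer[1:].split("/"):
        key = unescape_token(raw_token)
        if isinstance(current, list):
            current = current[int(key)]  # list segments are indexes
        else:
            current = current[key]
    return current


# Hypothetical event used only for illustration.
event = {"people": [{"name": "Steven"}], "a/b": 1, "m~n": 2}
print(resolve_pointer(event, "/people/0/name"))  # Steven
print(resolve_pointer(event, "/a~1b"))           # 1
```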

Shorthand Syntax (Regex, \w = [A-Za-z0-9_])

/\w+(/\w+)*

Shorthand Example

/Hello/World/0
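
A quick way to check whether a pointer fits the shorthand form is to apply the regex directly. The sketch below assumes \w admits digits as well as letters and underscores, which the /Hello/World/0 example implies; `is_shorthand_pointer` is a hypothetical name.

```python
import re

# re.ASCII restricts \w to [A-Za-z0-9_].
SHORTHAND_POINTER = re.compile(r"/\w+(/\w+)*", re.ASCII)


def is_shorthand_pointer(text):
    """Return True if text is a Json Pointer in shorthand (unquoted) form."""
    return SHORTHAND_POINTER.fullmatch(text) is not None


print(is_shorthand_pointer("/Hello/World/0"))         # True
print(is_shorthand_pointer("/span.attributes.name"))  # False
```

Pointers that fail this check, such as ones containing `.` or spaces, would need the escaped (double-quoted) form.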

Escaped Syntax

"/<Valid String Characters | Escaped Character>(/<Valid String Characters | Escaped Character>)*"

Escaped Example

# Path
# { "Hello - 'world/" : [{ "\"JsonPointer\"": true }] }
"/Hello - 'world\//0/\"JsonPointer\""

White Space

Operators

White space is optional surrounding Relational Operators, Regex Equality Operators, Equality Operators and commas.
White space is required surrounding Set Initializers, Priority Expressions, Set Operators, and Conditional Expressions.

Reference Table

| Operator | Description | White Space Required | ✅ Valid Examples | ❌ Invalid Examples |
|---|---|---|---|---|
| {} | Set Initializer | Yes | /status in {200} | /status in{200} |
| () | Priority Expression | Yes | /a==(/b==200) | /status in({200}) |
| | | | /a in ({200}) | |
| in, not in | Set Operators | Yes | /a in {200} | /a in{200, 202} |
| | | | /a not in {400} | /a not in{400} |
| <, <=, >, >= | Relational Operators | No | /status < 300 | |
| | | | /status>=300 | |
| =~, !~ | Regex Equality Operators | No | /msg =~ "^\w*$" | |
| | | | /msg=~"^\w*$" | |
| ==, != | Equality Operators | No | /status == 200 | |
| | | | /status_code==200 | |
| and, or, not | Conditional Operators | Yes | /a<300 and /b>200 | /b<300and/b>200 |
| , | Set Value Delimiter | No | /a in {200, 202} | /a in {200,} |
| | | | /a in {200,202} | |
| | | | /a in {200 , 202} | |

sbayer55 added the documentation and untriaged labels Feb 8, 2022
sbayer55 added this to the v1.3 milestone Feb 8, 2022
sbayer55 self-assigned this Feb 8, 2022
dlvenable added the proposal label (Proposed major changes to Data Prepper) and removed the documentation label Feb 8, 2022
@dlvenable
Member

Thanks for putting this together! It looks great.

One change I'd like to make is to use the term "set" instead of "list." For example, the section "List Operators" should be "Set Operators." I think that we are viewing them as sets, so it would be better to be explicit in that.

Along those lines, I also wonder if it would make sense to change the syntax for set initialization. Perhaps a set could be initialized with curly braces like {"a", "b", "c"}. This is a more common convention for sets whereas brackets are lists.

I'm not sure if Data Prepper will ever need to evaluate against true lists or not. Perhaps /myList == ["a", "b"] will matter to somebody. Using curly braces will help keep these concepts distinct.

There may be some value in supporting variables in the future too. I'd probably say Data Prepper would use the standard ${} syntax. I don't think this would be a problem with {} sets, but do want to point it out.

@sbayer55
Member Author

Great feedback!

I'd like to add that variable names should be supported within strings, for example "Hello ${name}". To support embedded variables, $ will need to be escaped within Strings.

This was referenced Feb 10, 2022
@cmanning09
Contributor

cmanning09 commented Feb 18, 2022

5 + 5
true == true

How would a customer use these examples?

I am envisioning customers only writing expressions that always include a jsonPointer.

@cmanning09
Contributor

cmanning09 commented Feb 18, 2022

Do we need assignment operators? How will they be used?

Other high level questions - what are our requirements? What is the scope of this syntax?

@dlvenable
Member

dlvenable commented Feb 18, 2022

While this is a good plan, we should also consider the prioritization of which operations will be most valuable. Then we can iterate on the implementation of them.

I'd propose prioritizing them by:

  1. Boolean expressions
  2. Arithmetic operations
  3. Assignment (do we need this?)

Here is my initial priority:

Data Prepper 1.3.0:

  • Equality Operators ==, !=
  • Relational Operators <, <=, >, >=
  • Conditional Expression and, or, not
  • Priority Expression ()

Data Prepper 1.4.0:

  • Set Initialiser {}
  • Set Operators in, not in

Data Prepper 2.0.0:

  • Regex Equality Operators =~, !~

Needing future consideration:

  • String Concatenation Operator +
  • Arithmetic Operators (^, *, /, %, +, -)
  • Assignment Operators =, +=, -=, *=, /=, %=

@dinujoh
Member

dinujoh commented Feb 18, 2022

  • Do we need Increment (++) and Decrement (--) Arithmetic Operators?
  • What are the min and max values for an Integer literal?
  • Do we support set concatenation?

@sbayer55
Member Author

Thank you everyone for the feedback. I updated the RFC proposal based on everyone's feedback.

@dlvenable
Member

Thanks @sbayer55 ,

You added set initializer to 1.3.0, but set operations to 1.4.0. Is there value in having the initializers without the operations? If not, I'd say we add set initialization in 1.4.0 as well.

Also, Data Prepper may want to support a limited set of functions in the future, for example length(/list). I'm not sure whether that is what we'd do. But it may be valuable to disallow whitespace before parentheses. So /a and(/b or /c) could be confused with a function. Should Data Prepper tighten restrictions regarding whitespace?

@sbayer55 sbayer55 mentioned this issue Feb 18, 2022
4 tasks
@sbayer55 sbayer55 reopened this Feb 21, 2022
@sbayer55
Member Author

Issues regarding white space and potential future function support raised.

@kjorg50

kjorg50 commented Mar 30, 2023

Perhaps it is too late to comment on this RFC, but I would like a little clarity on expressions for matching against string values with . characters (as this seems to be quite common for OTEL data). In the JSON Pointer definition it says

Json Pointers can use an extended character set if wrapped in double quotes (") using the escape character \. Note, Json Pointer require ~ and / that should be used as part of the path and not a delimiter to be escaped.

And I also see the "Escaped Example" above; however, when trying to write my own expression I am not able to get it to work for conditional routing with something like this (trying to escape my . characters):

  route:
    - super-tenant: '/"span\.attributes\.tenant" == "super"'
    - other-tenant: '/"span\.attributes\.tenant" != "super"'

Do you have any recommendations for this scenario? Thanks!

I also commented on #2259
