EQL: Revisit case insensitivity #61883

Closed
@costin

Description

Elasticsearch is case sensitive by default. EQL, on the other hand, strives to be case insensitive, since matching strings across different OSs is not straightforward (some are case insensitive, some aren't).
This is why string equality and inequality are case insensitive by default.
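A minimal Python sketch of the two defaults. It assumes hypothetical helpers `eql_equals` and `eql_not_equals` (not part of EQL or Elasticsearch). It uses `str.casefold()` rather than `lower()` because casefold covers more Unicode case mappings. Whether EQL itself folds case this way is an assumption.

```python
def eql_equals(left: str, right: str, case_sensitive: bool = False) -> bool:
    """Hypothetical EQL-style string equality: case insensitive unless told otherwise."""
    if case_sensitive:
        # Elasticsearch default: exact comparison.
        return left == right
    # EQL default: fold case on both sides before comparing.
    return left.casefold() == right.casefold()


def eql_not_equals(left: str, right: str, case_sensitive: bool = False) -> bool:
    """Inequality is the negation of equality under the same sensitivity."""
    return not eql_equals(left, right, case_sensitive)


# The same Windows path can show up with different casing across events.
print(eql_equals("C:\\Windows\\System32\\cmd.exe", "c:\\windows\\system32\\CMD.EXE"))  # True
print(eql_equals("John", "john", case_sensitive=True))  # False
```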

The current approach requires scripting for functions that are case sensitive, and it is not fully supported for equality/inequality. We could extend this to the rest of the operators (such as >, >=, etc.), but since such comparisons are rare for strings, for the time being the scope is limited to == and !=.

Using operators

Extending the == operator to be case aware is convenient but also quite impactful, because in most languages == means exact equality: John == john is false.
That is, everything is case sensitive by default and insensitivity needs to be added on top.

Either default (sensitive or insensitive) has pros and cons, so having a flag that can change the behavior is the ideal way. Currently the default is controlled through the case_sensitive parameter, which can be kept, though it would have to be renamed since only equality is affected:
case_sensitive --> case_sensitive_equality.
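A sketch of how a renamed flag could be scoped to equality only. `EqlSettings` and `compare` are hypothetical names. `case_sensitive_equality` is the proposed rename, not an existing setting. Ordering operators are left case sensitive here because the scope above is limited to == and !=.

```python
import operator
from dataclasses import dataclass


@dataclass(frozen=True)
class EqlSettings:
    # Hypothetical setting named after the proposed rename; defaults to insensitive equality.
    case_sensitive_equality: bool = False


# Ordering operators ignore the flag and always compare exactly.
_ORDERING = {">": operator.gt, ">=": operator.ge, "<": operator.lt, "<=": operator.le}


def compare(op: str, left: str, right: str, settings: EqlSettings) -> bool:
    if op in ("==", "!="):
        if settings.case_sensitive_equality:
            equal = left == right
        else:
            equal = left.casefold() == right.casefold()
        # != is the negation of == under the same sensitivity.
        return equal if op == "==" else not equal
    if op in _ORDERING:
        return _ORDERING[op](left, right)
    raise ValueError(f"unsupported operator: {op}")
```

With `EqlSettings()` the check `compare("==", "John", "john", ...)` succeeds. With `EqlSettings(case_sensitive_equality=True)` it fails. In both cases every string comparison in a query shares that one sensitivity, which is the limitation discussed next.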

The issue with this type of parameter is that all string comparisons share the same sensitivity. Potentially we could introduce a dedicated operator such as ~= or ~== to indicate a case-insensitive comparison.

The pro of this approach is that the scopes are clearly defined; the downside is that it might be too subtle for folks to pick up.
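A toy evaluator for the operator approach, assuming a hypothetical grammar `field op "literal"` where `~=` is the proposed (not existing) always-insensitive operator. In this sketch `==` and `!=` follow a global flag. The flag defaults to sensitive here so that the effect of `~=` is visible.

```python
import re

# Hypothetical grammar: <field> <op> "<literal>", with op one of ==, !=, ~=.
_EXPR = re.compile(r'^\s*([\w.]+)\s*(==|!=|~=)\s*"([^"]*)"\s*$')


def matches(expr: str, event: dict, case_sensitive: bool = True) -> bool:
    m = _EXPR.match(expr)
    if m is None:
        raise ValueError(f"cannot parse: {expr!r}")
    field, op, literal = m.groups()
    value = event[field]
    if op == "~=":
        # The dedicated operator is always case insensitive, whatever the global flag says.
        return value.casefold() == literal.casefold()
    # == and != take their sensitivity from the global flag.
    if case_sensitive:
        equal = value == literal
    else:
        equal = value.casefold() == literal.casefold()
    return equal if op == "==" else not equal
```

For an event `{"process.name": "CMD.EXE"}`, `process.name == "cmd.exe"` fails under the sensitive default while `process.name ~= "cmd.exe"` matches. The two spellings differ by one character, which illustrates the "too subtle" concern.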

Wrapping functions

Another option would be to use some kind of wrapping function, say insensitive(foo == bar) or sensitive(foo == bar). This is a more verbose way of supporting both == and ~=, and it offers both sensitivities regardless of the global setting.
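A sketch of the wrapping-function idea as a tiny AST. `Eq`, `Wrap`, `sensitive`, `insensitive` and `evaluate` are hypothetical names chosen for illustration. Each wrapper overrides the inherited sensitivity for its whole subtree, so the innermost wrapper wins.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Eq:
    """foo == bar (or foo != bar when negated); sensitivity comes from the enclosing scope."""
    left: str
    right: str
    negated: bool = False


@dataclass(frozen=True)
class Wrap:
    """Hypothetical sensitive(...) / insensitive(...) wrapper."""
    case_sensitive: bool
    child: "Node"


Node = Union[Eq, Wrap]


def sensitive(child: Node) -> Wrap:
    return Wrap(True, child)


def insensitive(child: Node) -> Wrap:
    return Wrap(False, child)


def evaluate(node: Node, case_sensitive: bool) -> bool:
    """Evaluate node, starting from the global sensitivity setting."""
    if isinstance(node, Wrap):
        # The wrapper replaces the inherited sensitivity for everything beneath it.
        return evaluate(node.child, node.case_sensitive)
    if case_sensitive:
        equal = node.left == node.right
    else:
        equal = node.left.casefold() == node.right.casefold()
    return not equal if node.negated else equal
```

For example, `evaluate(insensitive(Eq("John", "john")), case_sensitive=True)` is true even though the global setting is sensitive.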

Impact on functions

As described in #61162, case insensitivity will be an option on a limited number of queries. Currently this translates to:

  • string equality (term and terms)
  • startsWith (prefix query)
  • pattern matching, match, wildcard (wildcard query)
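A sketch of the translation above as Elasticsearch query DSL, built as plain Python dicts. It assumes the `case_insensitive` parameter that the term, prefix and wildcard queries accept in recent Elasticsearch versions. Multi-value equality (terms) is left out because this sketch does not assume that query accepts the same flag. The helper names are hypothetical.

```python
import json


def term(field: str, value: str, case_insensitive: bool) -> dict:
    # String equality -> term query.
    return {"term": {field: {"value": value, "case_insensitive": case_insensitive}}}


def starts_with(field: str, prefix: str, case_insensitive: bool) -> dict:
    # startsWith -> prefix query.
    return {"prefix": {field: {"value": prefix, "case_insensitive": case_insensitive}}}


def wildcard(field: str, pattern: str, case_insensitive: bool) -> dict:
    # Pattern matching / wildcard -> wildcard query; * and ? keep their wildcard meaning.
    return {"wildcard": {field: {"value": pattern, "case_insensitive": case_insensitive}}}


query = {"bool": {"filter": [
    term("process.name", "cmd.exe", case_insensitive=True),
    starts_with("process.executable", "C:\\Windows\\", case_insensitive=True),
    wildcard("file.name", "*.ps?", case_insensitive=True),
]}}
print(json.dumps(query, indent=2))
```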

It's worth revisiting the semantics of insensitivity across all the functions, in particular:

  • between, indexOf
  • endsWith - might be rewritten to a wildcard query
  • stringContains - could also be rewritten to a wildcard query
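A sketch of the endsWith and stringContains rewrites as wildcard patterns. It assumes Lucene-style wildcard syntax, where `*` and `?` are wildcards and a backslash escapes them. Literal input must therefore be escaped before wrapping it in `*`. The function names are hypothetical.

```python
def escape_wildcard(literal: str) -> str:
    """Escape the characters a wildcard pattern treats specially: * ? and the escape char \\."""
    out = []
    for ch in literal:
        if ch in "\\*?":
            out.append("\\")
        out.append(ch)
    return "".join(out)


def ends_with_pattern(suffix: str) -> str:
    # endsWith(field, suffix) -> wildcard "*<suffix>"
    return "*" + escape_wildcard(suffix)


def contains_pattern(substring: str) -> str:
    # stringContains(field, substring) -> wildcard "*<substring>*"
    return "*" + escape_wildcard(substring) + "*"
```

Both rewrites produce leading-wildcard patterns, which are generally expensive to run against keyword fields. This is a cost worth weighing against a script-based check.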

As a last note, a global setting will affect both the operators and the functions. This means that if we need scoping, i.e. functions and operators with both types of sensitivity, we need to introduce either dedicated switches or wrapping functions.

My proposal is to look at how case-insensitive functions are used in existing queries and, where needed, try to retrofit them onto the existing queries. While we might not cover every possible option, the vast majority of cases/rules are likely to be covered.
