Description
Elasticsearch is case sensitive by default. EQL, on the other hand, strives to be case insensitive, since matching strings across different operating systems is not straightforward (some are case insensitive, some aren't).
This is why string equality and inequality are case insensitive by default.
The current approach requires scripting for functions that are case sensitive and is not fully supported for equality/non-equality. We could expand this to the rest of the operators (like >, >=, etc.) but considering this is a rare occurrence for strings, for the time being the scope is on == and !=.
Using operators
Extending the == operator to be case aware is convenient but also quite impactful. That's because in most languages == is an exact equality: John == john is false.
That is, everything is case sensitive by default and insensitivity needs to be added on top.
Either default (sensitive or insensitive) has pros and cons, and a flag that can change the behavior is the ideal way to handle this. Currently the default is controlled through the case_sensitive parameter, which can be kept, though it would have to be renamed since it is only equality that we're after: case_sensitive --> case_sensitive_equality.
The issue with this type of parameter is that all string comparisons have the same sensitivity. Potentially we can introduce a dedicated operator such as ~= or ~== to indicate a case insensitive comparison.
The pro of this approach is that the scopes are clearly defined; the downside is that it might be too subtle for folks to pick up.
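To make the scoping difference concrete, here is a minimal Python sketch of the two operators. The function names are hypothetical and the use of casefold() is an assumption made for illustration; it is not how EQL or Elasticsearch compares strings. The point is only that == follows the global flag while the proposed ~= ignores it.

```python
def equals(left: str, right: str, case_sensitive_equality: bool = False) -> bool:
    """Hypothetical `==`: sensitivity follows the global flag."""
    if case_sensitive_equality:
        return left == right
    # casefold() is Python's caseless-matching helper (assumption for this sketch).
    return left.casefold() == right.casefold()


def insensitive_equals(left: str, right: str) -> bool:
    """Hypothetical `~=`: always case insensitive, whatever the flag says."""
    return left.casefold() == right.casefold()
```

With the flag switched to sensitive, equals("John", "john", case_sensitive_equality=True) no longer matches, while insensitive_equals("John", "john") still does.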
Wrapping functions
Another option would be to use some kind of function, say insensitive(foo == bar) or sensitive(foo == bar), which is a more verbose way of supporting == and ~= and offering both sensitivities regardless of the global setting.
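The wrapping idea can be sketched in Python as follows. This is an illustration under stated assumptions, not a proposed implementation: an expression is modelled as a callable that receives the effective sensitivity, and eq, sensitive and insensitive are hypothetical helpers standing in for ==, sensitive(...) and insensitive(...).

```python
from typing import Callable

# An expression receives the effective sensitivity and returns a match result.
Expr = Callable[[bool], bool]


def eq(left: str, right: str) -> Expr:
    """Hypothetical `left == right`; the enclosing scope decides the sensitivity."""
    def evaluate(case_sensitive: bool) -> bool:
        if case_sensitive:
            return left == right
        return left.casefold() == right.casefold()
    return evaluate


def sensitive(expr: Expr) -> Expr:
    """Force case-sensitive evaluation, ignoring the outer (global) setting."""
    return lambda _outer: expr(True)


def insensitive(expr: Expr) -> Expr:
    """Force case-insensitive evaluation, ignoring the outer (global) setting."""
    return lambda _outer: expr(False)
```

Here the argument passed at evaluation time plays the role of the global setting: eq("John", "john")(False) matches, whereas sensitive(eq("John", "john"))(False) does not, because the wrapper overrides the outer value.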
Impact on functions
As described in #61162, case insensitivity will be an option on a limited number of queries. Currently this translates to:
- string equality (term and terms)
- startsWith (prefix query)
- pattern matching: match, wildcard (wildcard query)
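As a rough illustration of what the translation could produce, the Python sketch below builds query bodies as plain dictionaries. It assumes the option from #61162 is exposed as a per-query case_insensitive parameter on term, prefix and wildcard queries; the helper names and the process.name field are made up for the example, and terms is left out because this sketch makes no assumption about it.

```python
def term_query(field: str, value: str, case_insensitive: bool = True) -> dict:
    """String equality as a term query (assumes a `case_insensitive` parameter)."""
    return {"term": {field: {"value": value, "case_insensitive": case_insensitive}}}


def prefix_query(field: str, prefix: str, case_insensitive: bool = True) -> dict:
    """startsWith as a prefix query."""
    return {"prefix": {field: {"value": prefix, "case_insensitive": case_insensitive}}}


def wildcard_query(field: str, pattern: str, case_insensitive: bool = True) -> dict:
    """Pattern matching as a wildcard query."""
    return {"wildcard": {field: {"value": pattern, "case_insensitive": case_insensitive}}}


# Hypothetical field name, used only for illustration.
body = {"query": term_query("process.name", "CMD.exe")}
```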
It's worth revising the semantics of insensitivity over all the functions, in particular:
- between, indexOf
- endsWith - might be rewritten to a wildcard query
- stringContains - wildcard again
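The wildcard rewrite for endsWith and stringContains could look roughly like the Python sketch below. The helper names are hypothetical, and the escaping assumes the usual wildcard syntax where * and ? are wildcards and a backslash escapes the next character; the literal argument has to be escaped so that it is matched verbatim.

```python
def escape_wildcard(literal: str) -> str:
    """Escape characters that are special in a wildcard pattern (assumed: \\, *, ?)."""
    out = []
    for ch in literal:
        if ch in ("\\", "*", "?"):
            out.append("\\")
        out.append(ch)
    return "".join(out)


def ends_with_pattern(suffix: str) -> str:
    """endsWith(field, suffix) -> wildcard pattern '*<suffix>'."""
    return "*" + escape_wildcard(suffix)


def string_contains_pattern(substring: str) -> str:
    """stringContains(field, substring) -> wildcard pattern '*<substring>*'."""
    return "*" + escape_wildcard(substring) + "*"
```

The resulting pattern would then be sent as a wildcard query with case insensitivity enabled, so the function inherits whatever insensitive matching that query provides.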
As a last note, a global setting will affect both the operators and the functions. This means that if we need scoping, that is, both types of sensitivity available for functions as well as operators, we need to introduce dedicated switches or wrapping functions.
My proposal is to look at the case insensitive usage of functions in existing queries and, where needed, try to retrofit them onto the existing queries. While this might not cover all possible options, it would likely cover the vast majority of cases and rules.