Rename methods in o.e.x.c.security.support.Automatons #114594

cbuescher · 2024-10-11T11:12:03Z

Lucene 10 stopped relying in on automaton minimization and moved the underlying Hopcroft algorithm to test code (for reasoning see apache/lucene#528). With the upgrade to Lucene 10 we currently also only determinize automata. The security Automatons utility class currently contains several methods that sound like they would minimize the automaton, but this has changed so this PR also changes the method names accordingly.

elasticsearchmachine · 2024-10-11T11:12:29Z

Pinging @elastic/es-security (Team:Security)

cbuescher · 2024-10-11T11:20:58Z

While this change in itself does only rename methods, I want to use it to get a second look at the underlying change, that is we currently can non longer use the automaton minimization algorithm provided in Lucene.
I'm not sure what this means in terms of space efficiency for the use of automata for pattern and role-matching that this utility class seems to do. The original lucene issues argues that minimization is no longer necessary that much because of:

minimization vs. determinization is less important because finishState() already performs a lot of practical sorting/coalescing on-the-fly
improvements via the UTF32-to-UTF8 automata convertor makes the worst-case-space-per-state significantly lower than with UTF-16 representation

With that in mind it's possible that with our own use of automata here we don't need to worry about the removal of minimization, but I'd like the folks at Team:Security to be aware of this and maybe take another look at how complex / big average automata construction gets here.

If it turns out we rely heavily on minimization, we will need to re-introduce it in ES by e.g. lifting the Hopcroft algorithm from Lucenes test code and maintain it ourselves.

Just fyi: @javanna @ChrisHegarty

jakelandis

LGTM

Unfortunately we don't have the benchmarks in place to determine what kind of impact this could make. The security code base makes heavy use of these structures and they generally fall into one of two categories.

The first category are automatons that get built at start up time and used at runtime to check permissions based on RBAC is the primary workflow. We have some macro level dashboards to help us asses, if at scale this introduces additional latency to the authentication workflows. We can monitor those dashboards to help ensure we didn't add significant latency.

The second workflow are ones that are computed at runtime and used at runtime for application privileges. We already know of cases where this is a performance issue, generally on the building of these structures. I think this change might actually help in those cases since, AFAIK this will result in less work during the building stage, which might improve the overall performance. We have a task to address this specific issue in the near future and that work might shed some additional light.

So, I think there are a lot of trade off's here that we are not prepared to fully understand how they could impact Elasticsearch w.r.t. performance. However, given Lucene's comments, with some macro level dsahboards and a fall back plan (owning the minimize) as well as some work upcoming work to better optimize, I feel comfortable making this change now.

cbuescher · 2024-10-16T19:57:45Z

Thanks @jakelandis. Just to make sure one clarifying note:

We can monitor those dashboards to help ensure we didn't add significant latency.

As you mentioned later in your comment, the nature of this change (omitting minimization) will most likely not add additional latency but rather remove complexity when automata are built. The part we're not 100% sure is if the current minimization reduces memory footprint of the resulting data structure a lot. From the reasoning in the Lucene issue I would say no, but as you said, without a clear benchmarking case at hand this is hard to tell. Only stating this here because I think memory is what we should keep a closer look at once Lucene 10 is merged to main.
I will rebase this PR to "main" for the method naming changes once the above PR is merged, which should happen shortly.

jakelandis · 2024-10-17T20:07:43Z

stating this here because I think memory is what we should keep a closer look

Thanks, the memory usage is a bit easier to check.

The first category are automatons that get built at start up time

Before this PR:

(Automatons is our class that references and caches the Lucene Automatons)

After this PR:

So, ~300KB increase in size to represent all of the built in privileges.

The second workflow are ones that are computed at runtime and used at runtime for application privileges

Before this PR :

After this PR:

This is a bit harder to estimate since the size needed is wildly variable. However, a single simple check [1] gives some hints to memory usage. Comparing the the impact of this workflow before vs. after, there does not seem to be an appreciable impact.

Overall, there is an increase in memory, but I don't think it will be problematic.

[1] application priv check used (not a very good representative of size in practice since the backing role does not have any application privs, but hints towards memory usage)

{{es}}/_security/user/_has_privileges
{
    "cluster": [
        "monitor",
        "manage"
    ],
    "application": [
    {
      "application": "inventory_manager",
      "privileges" : [ "read", "data:write/inventory" ],
      "resources" : [ "product/1852563" ]
    }
  ]
}

jakelandis · 2024-10-17T20:09:32Z

Note - when testing this locally, I needed update the dependencies SHA's (./gradlew --write-verification-metadata sha256) before I could build it locally (https://gradle-enterprise.elastic.co/s/jk6akcmpvmvwk).

cbuescher added >non-issue >upgrade :Security/Security Security issues without another label v9.0.0 labels Oct 11, 2024

cbuescher requested a review from a team as a code owner October 11, 2024 11:12

elasticsearchmachine added the Team:Security Meta label for security team label Oct 11, 2024

cbuescher added 2 commits October 11, 2024 13:24

Checkstyle

8180265

Merge branch 'lucene_snapshot' into rename-security-automatons-methods

aee7c9d

jakelandis approved these changes Oct 16, 2024

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Rename methods in o.e.x.c.security.support.Automatons #114594

Rename methods in o.e.x.c.security.support.Automatons #114594

cbuescher commented Oct 11, 2024

elasticsearchmachine commented Oct 11, 2024

cbuescher commented Oct 11, 2024 •

edited

Loading

jakelandis left a comment

cbuescher commented Oct 16, 2024 •

edited

Loading

jakelandis commented Oct 17, 2024

jakelandis commented Oct 17, 2024

Rename methods in o.e.x.c.security.support.Automatons #114594

Are you sure you want to change the base?

Rename methods in o.e.x.c.security.support.Automatons #114594

Conversation

cbuescher commented Oct 11, 2024

elasticsearchmachine commented Oct 11, 2024

cbuescher commented Oct 11, 2024 • edited Loading

jakelandis left a comment

Choose a reason for hiding this comment

cbuescher commented Oct 16, 2024 • edited Loading

jakelandis commented Oct 17, 2024

The first category are automatons that get built at start up time

Before this PR:

After this PR:

The second workflow are ones that are computed at runtime and used at runtime for application privileges

Before this PR :

After this PR:

jakelandis commented Oct 17, 2024

cbuescher commented Oct 11, 2024 •

edited

Loading

cbuescher commented Oct 16, 2024 •

edited

Loading