Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rename methods in o.e.x.c.security.support.Automatons #114594

Open
wants to merge 3 commits into
base: lucene_snapshot
Choose a base branch
from

Conversation

cbuescher
Copy link
Member

Lucene 10 stopped relying in on automaton minimization and moved the underlying Hopcroft algorithm to test code (for reasoning see apache/lucene#528). With the upgrade to Lucene 10 we currently also only determinize automata. The security Automatons utility class currently contains several methods that sound like they would minimize the automaton, but this has changed so this PR also changes the method names accordingly.

Lucene 10 stopped relying in on automaton minimization and moved the underlying
Hopcroft algorithm to test code (for reasoning see
apache/lucene#528). With the upgrade to Lucene 10 we
currently also only determinize automata. The security Automatons utility class
currently contains several methods that sound like they would minimize the
automaton, but this has changed so this PR also changes the method names
accordingly.
@cbuescher cbuescher added >non-issue >upgrade :Security/Security Security issues without another label v9.0.0 labels Oct 11, 2024
@cbuescher cbuescher requested a review from a team as a code owner October 11, 2024 11:12
@elasticsearchmachine elasticsearchmachine added the Team:Security Meta label for security team label Oct 11, 2024
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-security (Team:Security)

@cbuescher
Copy link
Member Author

cbuescher commented Oct 11, 2024

While this change in itself does only rename methods, I want to use it to get a second look at the underlying change, that is we currently can non longer use the automaton minimization algorithm provided in Lucene.
I'm not sure what this means in terms of space efficiency for the use of automata for pattern and role-matching that this utility class seems to do. The original lucene issues argues that minimization is no longer necessary that much because of:

  • minimization vs. determinization is less important because finishState() already performs a lot of practical sorting/coalescing on-the-fly
  • improvements via the UTF32-to-UTF8 automata convertor makes the worst-case-space-per-state significantly lower than with UTF-16 representation

With that in mind it's possible that with our own use of automata here we don't need to worry about the removal of minimization, but I'd like the folks at Team:Security to be aware of this and maybe take another look at how complex / big average automata construction gets here.

If it turns out we rely heavily on minimization, we will need to re-introduce it in ES by e.g. lifting the Hopcroft algorithm from Lucenes test code and maintain it ourselves.

Just fyi: @javanna @ChrisHegarty

Copy link
Contributor

@jakelandis jakelandis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Unfortunately we don't have the benchmarks in place to determine what kind of impact this could make. The security code base makes heavy use of these structures and they generally fall into one of two categories.

The first category are automatons that get built at start up time and used at runtime to check permissions based on RBAC is the primary workflow. We have some macro level dashboards to help us asses, if at scale this introduces additional latency to the authentication workflows. We can monitor those dashboards to help ensure we didn't add significant latency.

The second workflow are ones that are computed at runtime and used at runtime for application privileges. We already know of cases where this is a performance issue, generally on the building of these structures. I think this change might actually help in those cases since, AFAIK this will result in less work during the building stage, which might improve the overall performance. We have a task to address this specific issue in the near future and that work might shed some additional light.

So, I think there are a lot of trade off's here that we are not prepared to fully understand how they could impact Elasticsearch w.r.t. performance. However, given Lucene's comments, with some macro level dsahboards and a fall back plan (owning the minimize) as well as some work upcoming work to better optimize, I feel comfortable making this change now.

@cbuescher
Copy link
Member Author

cbuescher commented Oct 16, 2024

Thanks @jakelandis. Just to make sure one clarifying note:

We can monitor those dashboards to help ensure we didn't add significant latency.

As you mentioned later in your comment, the nature of this change (omitting minimization) will most likely not add additional latency but rather remove complexity when automata are built. The part we're not 100% sure is if the current minimization reduces memory footprint of the resulting data structure a lot. From the reasoning in the Lucene issue I would say no, but as you said, without a clear benchmarking case at hand this is hard to tell. Only stating this here because I think memory is what we should keep a closer look at once Lucene 10 is merged to main.
I will rebase this PR to "main" for the method naming changes once the above PR is merged, which should happen shortly.

@jakelandis
Copy link
Contributor

stating this here because I think memory is what we should keep a closer look

Thanks, the memory usage is a bit easier to check.

The first category are automatons that get built at start up time

Before this PR:

(Automatons is our class that references and caches the Lucene Automatons)
image

image

After this PR:

image

So, ~300KB increase in size to represent all of the built in privileges.

The second workflow are ones that are computed at runtime and used at runtime for application privileges

Before this PR :

image image

After this PR:

image

This is a bit harder to estimate since the size needed is wildly variable. However, a single simple check [1] gives some hints to memory usage. Comparing the the impact of this workflow before vs. after, there does not seem to be an appreciable impact.

Overall, there is an increase in memory, but I don't think it will be problematic.

[1] application priv check used (not a very good representative of size in practice since the backing role does not have any application privs, but hints towards memory usage)

{{es}}/_security/user/_has_privileges
{
    "cluster": [
        "monitor",
        "manage"
    ],
    "application": [
    {
      "application": "inventory_manager",
      "privileges" : [ "read", "data:write/inventory" ],
      "resources" : [ "product/1852563" ]
    }
  ]
}

@jakelandis
Copy link
Contributor

Note - when testing this locally, I needed update the dependencies SHA's (./gradlew --write-verification-metadata sha256) before I could build it locally (https://gradle-enterprise.elastic.co/s/jk6akcmpvmvwk).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>non-issue :Security/Security Security issues without another label Team:Security Meta label for security team >upgrade v9.0.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants