-
Notifications
You must be signed in to change notification settings - Fork 24.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rename methods in o.e.x.c.security.support.Automatons #114594
base: lucene_snapshot
Are you sure you want to change the base?
Rename methods in o.e.x.c.security.support.Automatons #114594
Conversation
Lucene 10 stopped relying in on automaton minimization and moved the underlying Hopcroft algorithm to test code (for reasoning see apache/lucene#528). With the upgrade to Lucene 10 we currently also only determinize automata. The security Automatons utility class currently contains several methods that sound like they would minimize the automaton, but this has changed so this PR also changes the method names accordingly.
Pinging @elastic/es-security (Team:Security) |
While this change in itself does only rename methods, I want to use it to get a second look at the underlying change, that is we currently can non longer use the automaton minimization algorithm provided in Lucene.
With that in mind it's possible that with our own use of automata here we don't need to worry about the removal of minimization, but I'd like the folks at Team:Security to be aware of this and maybe take another look at how complex / big average automata construction gets here. If it turns out we rely heavily on minimization, we will need to re-introduce it in ES by e.g. lifting the Hopcroft algorithm from Lucenes test code and maintain it ourselves. Just fyi: @javanna @ChrisHegarty |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Unfortunately we don't have the benchmarks in place to determine what kind of impact this could make. The security code base makes heavy use of these structures and they generally fall into one of two categories.
The first category are automatons that get built at start up time and used at runtime to check permissions based on RBAC is the primary workflow. We have some macro level dashboards to help us asses, if at scale this introduces additional latency to the authentication workflows. We can monitor those dashboards to help ensure we didn't add significant latency.
The second workflow are ones that are computed at runtime and used at runtime for application privileges. We already know of cases where this is a performance issue, generally on the building of these structures. I think this change might actually help in those cases since, AFAIK this will result in less work during the building stage, which might improve the overall performance. We have a task to address this specific issue in the near future and that work might shed some additional light.
So, I think there are a lot of trade off's here that we are not prepared to fully understand how they could impact Elasticsearch w.r.t. performance. However, given Lucene's comments, with some macro level dsahboards and a fall back plan (owning the minimize) as well as some work upcoming work to better optimize, I feel comfortable making this change now.
Thanks @jakelandis. Just to make sure one clarifying note:
As you mentioned later in your comment, the nature of this change (omitting minimization) will most likely not add additional latency but rather remove complexity when automata are built. The part we're not 100% sure is if the current minimization reduces memory footprint of the resulting data structure a lot. From the reasoning in the Lucene issue I would say no, but as you said, without a clear benchmarking case at hand this is hard to tell. Only stating this here because I think memory is what we should keep a closer look at once Lucene 10 is merged to main. |
Note - when testing this locally, I needed update the dependencies SHA's ( |
Lucene 10 stopped relying in on automaton minimization and moved the underlying Hopcroft algorithm to test code (for reasoning see apache/lucene#528). With the upgrade to Lucene 10 we currently also only determinize automata. The security Automatons utility class currently contains several methods that sound like they would minimize the automaton, but this has changed so this PR also changes the method names accordingly.