-
Notifications
You must be signed in to change notification settings - Fork 249
Use isolated request contexts for task execution #1817
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @dimas-b - I'm also attempting to solve a related problem (#1765), but based on my understanding of Quarkus, I'm not understanding how this solves the problem we are seeing.
From what I got, you are adding the RealmContext object to a new PolarisRequestContext RequestScoped object. Does this inner RealmContext object not still expire at the end of the original request? Or is this working because you've added @ActivateRequestContext to the handleTask
method in QuarkusTaskExecutorImpl
- and refreshing the polarisRequestContext
there? If so, is there any side-effects of doing so?
Additionally, does this fix all existing Tasks or do we need to make additional changes to ensure that all existing tasks start reading from polarisRequestContext
?
Also, similar comment as @adutra on #1765 - is there any test that shows what's failing today and that this change fixes that?
|
Things are not broken today @gh-yzou fixed hard errors. The problem this PR addresses is the confusion between explicit method parameters and implicit context parameters. |
* #setRealmContext(RealmContext)}. | ||
*/ | ||
public RealmContext realmContext() { | ||
return realmCtx.get(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we add a runtime check here to prevent returning null?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1,
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Quarkus already makes a runtime exception if that is the case, but I'll add a check for a more friendly message.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
checkState added
callContext = polarisContext; | ||
CallContext.setCurrentContext(callContext); | ||
|
||
metaStoreManager = managerFactory.getOrCreateMetaStoreManager(realmContext); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the motivation for this change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the real change is moving CallContext.setCurrentContext(callContext)
above, but GH formats it strangely 😅
CallContext.setCurrentContext
must happen earlier to make sure "thread local" stuff is aligned with CDI context.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
well, after rebasing these changes are not meaningful, indeed 🤷
@@ -51,7 +54,10 @@ public DefaultConfigurationStore( | |||
|
|||
@Override | |||
public <T> @Nullable T getConfiguration(@Nonnull RealmContext realmContext, String configName) { | |||
String realm = realmContext.getRealmIdentifier(); | |||
String realm = realmContextInstance.get().getRealmIdentifier(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you explain the rationale here? I thought we would be removing realmContextInstance
completely. It's imho more elegant to receive the realm context as a method parameter than injecting an Instance<RealmContext>
– especially since this bean is application-scoped, so receiving a realm context as a method parameter aligns more with the common idiom that we have everywhere in Polaris where an application-scoped bean exposes a method that takes a realm context and returns something for that realm (e.g. all the getOrCreateXYZ
methods).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, that will not be needed anymore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My thinking goes in the other direction. I'd like to make DeafaultConfigurationStore
a proper CDI bean getting realm ID from the request context. Ideally Instance<RealmContext>
simply becomes RealmContext
.
Eventually, I'd like to remove the RealmContext
parameter.
I believe refactoring the code and internal interfaces to be free from RealmContext
parameters in methods would be beneficial. Most of the code is written to operate within a realm. Managing various objects (e.g. meta store instances) for multiple realms feels like an infrastructure concern. WDYT?
@@ -175,15 +175,14 @@ public void before(TestInfo testInfo) { | |||
diagServices, | |||
configurationStore, | |||
Clock.systemDefaultZone()); | |||
CallContext.setCurrentContext(polarisContext); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same: is this change required?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just moving CallContext.setCurrentContext
above the calls that rely on it -- to ensure CDI context has the same data and thread locals.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
well, after rebasing these changes are not meaningful, indeed 🤷
...s/service/src/main/java/org/apache/polaris/service/quarkus/config/PolarisRequestContext.java
Outdated
Show resolved
Hide resolved
@@ -49,9 +52,11 @@ public QuarkusTaskExecutorImpl( | |||
MetaStoreManagerFactory metaStoreManagerFactory, | |||
TaskFileIOSupplier fileIOSupplier, | |||
Tracer tracer, | |||
PolarisEventListener polarisEventListener) { | |||
PolarisEventListener polarisEventListener, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't fully understand how the propagation works, but will we be able to add some test to make sure the context is propagated correctly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is covered by existing tests that produce tasks.
It works like this:
- Quarkus create a new request context when
handleTask
is called - The code below this like puts the
RealmContext
for the tasks intopolarisRequestContext
- The Quarkus proxy for
polarisRequestContext
redirects this "set" call to the instance in the current request context - When something needs a
RealmContext
during the execution of a task, Quarkus invokesQuarkusProducers.realmContext(PolarisRequestContext context)
and gives it the samePolarisRequestContext
instance that was used in step 3.
For HTTP requests step 1 is the start of the request at the REST API layer, in which case the RealmContext
is derived from HTTP headers by a filter.
This whole workflow is very similar to how it worked before, but we use a custom "holder" object instead of ContainerRequestContext
, which allows us to support non-HTTP requests (i.e. async tasks).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
well, after rebasing test coverage is gone 🤷
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is covered by existing tests that produce tasks.
Unfortunately I don't think we have such case running in our CI today, otherwise, it would have caught the problem when we do ConfigurationStore injection. The only case i know which triggers background task is purge, which was caught by an aws regression test, which can only run manually today.
I am trying to add a spark integration test here https://github.com/apache/polaris/pull/1825/files, but runs into following error when do drop with purge
2025-06-05 18:36:21,503 INFO [org.apa.pol.ser.exc.IcebergExceptionMapper] [,POLARIS] [,,,] (executor-thread-1) Handling runtimeException Principal 'root' with activated PrincipalRoles '[]' and activated grants via '[service_admin, catalog_admin]' is not authorized for op DROP_TABLE_WITH_PURGE
i am wondering if you know how the catalogAPI privilege is setup with quarkus test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like it's using default privileges in this case. Related slack discussion: https://apache-polaris.slack.com/archives/C084XDM50CB/p1748381388095509
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Re: test coverage : when my new code had checkState
assertions, it caused CI failures in the tests I had to modify. Unfortunately, after rebasing request context stuff is no longer injected into the default config impl., so it cannot perform those checks 🤷 I'll address this if the community decides to go ahead with this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like it's using default privileges in this case. Related slack discussion: https://apache-polaris.slack.com/archives/C084XDM50CB/p1748381388095509
Thanks! let me see if I can get this test work.
@@ -73,6 +79,7 @@ protected void handleTask(long taskEntityId, CallContext callContext, int attemp | |||
.setAttribute("polaris.task.attempt", attempt) | |||
.startSpan(); | |||
try (Scope ignored = span.makeCurrent()) { | |||
polarisRequestContext.setRealmContext(callContext.getRealmContext()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
with the propagation now, will we be able to remove the code here https://github.com/dimas-b/polaris/blob/main/polaris-service/src/main/java/org/apache/polaris/service/task/TaskExecutorImpl.java#L71
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, good point. I'll do it later if people agree on the general approach in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, i think we probably need to propagate the whole callcontext, instead of just the realmContext, because the whole callContext is needed for the background executor, the realmContext is just what needed for getConfiguration, which is called during the task execution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Eventually - yes. With the recent config changes this PR is no longer meaningful in isolation, but it shows how CDI can be done for tasks.
I also opened a bigger related proposal on the dev ML: https://lists.apache.org/thread/0cyrzft2oon28otxlmmhvd5671rd3r3d
* Avoid propagating request contexts into threads running background tasks. * Use fresh request context for each task instead. * Manually propagate realm context into a task from the request that submits the task. * Fix tests to set thread-local data to match CDI context data. Next step: Simplify DefaultConfigurationStore API to remove contextual arguments and use CDI context instead.
So, after rebasing the new way of handling request context in tasks is indeed not very meaningful, because tasks no longer use request-scoped beans. However, I'd like this PR to act as an illustration for the proposal of moving all |
Avoid propagating request contexts into threads running background tasks.
Use fresh request context for each task instead.
Manually propagate realm context into a task from the request that submits the task.
Update DefaultConfigurationStore to cross-check realm IDs in request context and explicit method parameters.
Fix tests to set thread-local data to match CDI context data.
Next step: Simplify DefaultConfigurationStore API to remove contextual arguments and use CDI context instead.