Skip to content

Use isolated request contexts for task execution #1817

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

dimas-b
Copy link
Contributor

@dimas-b dimas-b commented Jun 5, 2025

  • Avoid propagating request contexts into threads running background tasks.

  • Use fresh request context for each task instead.

  • Manually propagate realm context into a task from the request that submits the task.

  • Update DefaultConfigurationStore to cross-check realm IDs in request context and explicit method parameters.

  • Fix tests to set thread-local data to match CDI context data.

Next step: Simplify DefaultConfigurationStore API to remove contextual arguments and use CDI context instead.

Copy link
Collaborator

@adnanhemani adnanhemani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @dimas-b - I'm also attempting to solve a related problem (#1765), but based on my understanding of Quarkus, I'm not understanding how this solves the problem we are seeing.

From what I got, you are adding the RealmContext object to a new PolarisRequestContext RequestScoped object. Does this inner RealmContext object not still expire at the end of the original request? Or is this working because you've added @ActivateRequestContext to the handleTask method in QuarkusTaskExecutorImpl - and refreshing the polarisRequestContext there? If so, is there any side-effects of doing so?

Additionally, does this fix all existing Tasks or do we need to make additional changes to ensure that all existing tasks start reading from polarisRequestContext?

Also, similar comment as @adutra on #1765 - is there any test that shows what's failing today and that this change fixes that?

@dimas-b
Copy link
Contributor Author

dimas-b commented Jun 5, 2025

@ActivateRequestContext in combination with .propagated(ThreadContext.NONE) (see it in diff) result in the creation of a new, independent request context for each (async) task.

PolarisRequestContext and all other @RequestScoped objects still expire at the end the request, but this "request" for task is the lifecycle of the QuarkusTaskExecutorImpl.handleTask() method call now.

PolarisRequestContext does not need to be accessed directly by tasks. In most cases relevant objects are injected by CDI.

@dimas-b
Copy link
Contributor Author

dimas-b commented Jun 5, 2025

is there any test that shows what's failing today and that this change fixes that?

Things are not broken today @gh-yzou fixed hard errors. The problem this PR addresses is the confusion between explicit method parameters and implicit context parameters.

* #setRealmContext(RealmContext)}.
*/
public RealmContext realmContext() {
return realmCtx.get();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add a runtime check here to prevent returning null?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1,

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quarkus already makes a runtime exception if that is the case, but I'll add a check for a more friendly message.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

checkState added

callContext = polarisContext;
CallContext.setCurrentContext(callContext);

metaStoreManager = managerFactory.getOrCreateMetaStoreManager(realmContext);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the motivation for this change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the real change is moving CallContext.setCurrentContext(callContext) above, but GH formats it strangely 😅

CallContext.setCurrentContext must happen earlier to make sure "thread local" stuff is aligned with CDI context.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well, after rebasing these changes are not meaningful, indeed 🤷

@@ -51,7 +54,10 @@ public DefaultConfigurationStore(

@Override
public <T> @Nullable T getConfiguration(@Nonnull RealmContext realmContext, String configName) {
String realm = realmContext.getRealmIdentifier();
String realm = realmContextInstance.get().getRealmIdentifier();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain the rationale here? I thought we would be removing realmContextInstance completely. It's imho more elegant to receive the realm context as a method parameter than injecting an Instance<RealmContext> – especially since this bean is application-scoped, so receiving a realm context as a method parameter aligns more with the common idiom that we have everywhere in Polaris where an application-scoped bean exposes a method that takes a realm context and returns something for that realm (e.g. all the getOrCreateXYZ methods).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, that will not be needed anymore.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My thinking goes in the other direction. I'd like to make DeafaultConfigurationStore a proper CDI bean getting realm ID from the request context. Ideally Instance<RealmContext> simply becomes RealmContext.

Eventually, I'd like to remove the RealmContext parameter.

I believe refactoring the code and internal interfaces to be free from RealmContext parameters in methods would be beneficial. Most of the code is written to operate within a realm. Managing various objects (e.g. meta store instances) for multiple realms feels like an infrastructure concern. WDYT?

@@ -175,15 +175,14 @@ public void before(TestInfo testInfo) {
diagServices,
configurationStore,
Clock.systemDefaultZone());
CallContext.setCurrentContext(polarisContext);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same: is this change required?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just moving CallContext.setCurrentContext above the calls that rely on it -- to ensure CDI context has the same data and thread locals.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well, after rebasing these changes are not meaningful, indeed 🤷

@@ -49,9 +52,11 @@ public QuarkusTaskExecutorImpl(
MetaStoreManagerFactory metaStoreManagerFactory,
TaskFileIOSupplier fileIOSupplier,
Tracer tracer,
PolarisEventListener polarisEventListener) {
PolarisEventListener polarisEventListener,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't fully understand how the propagation works, but will we be able to add some test to make sure the context is propagated correctly?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is covered by existing tests that produce tasks.

It works like this:

  1. Quarkus create a new request context when handleTask is called
  2. The code below this like puts the RealmContext for the tasks into polarisRequestContext
  3. The Quarkus proxy for polarisRequestContext redirects this "set" call to the instance in the current request context
  4. When something needs a RealmContext during the execution of a task, Quarkus invokes QuarkusProducers.realmContext(PolarisRequestContext context) and gives it the same PolarisRequestContext instance that was used in step 3.

For HTTP requests step 1 is the start of the request at the REST API layer, in which case the RealmContext is derived from HTTP headers by a filter.

This whole workflow is very similar to how it worked before, but we use a custom "holder" object instead of ContainerRequestContext, which allows us to support non-HTTP requests (i.e. async tasks).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well, after rebasing test coverage is gone 🤷

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is covered by existing tests that produce tasks.

Unfortunately I don't think we have such case running in our CI today, otherwise, it would have caught the problem when we do ConfigurationStore injection. The only case i know which triggers background task is purge, which was caught by an aws regression test, which can only run manually today.

I am trying to add a spark integration test here https://github.com/apache/polaris/pull/1825/files, but runs into following error when do drop with purge

2025-06-05 18:36:21,503 INFO  [org.apa.pol.ser.exc.IcebergExceptionMapper] [,POLARIS] [,,,] (executor-thread-1) Handling runtimeException Principal 'root' with activated PrincipalRoles '[]' and activated grants via '[service_admin, catalog_admin]' is not authorized for op DROP_TABLE_WITH_PURGE

i am wondering if you know how the catalogAPI privilege is setup with quarkus test?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like it's using default privileges in this case. Related slack discussion: https://apache-polaris.slack.com/archives/C084XDM50CB/p1748381388095509

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re: test coverage : when my new code had checkState assertions, it caused CI failures in the tests I had to modify. Unfortunately, after rebasing request context stuff is no longer injected into the default config impl., so it cannot perform those checks 🤷 I'll address this if the community decides to go ahead with this PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like it's using default privileges in this case. Related slack discussion: https://apache-polaris.slack.com/archives/C084XDM50CB/p1748381388095509

Thanks! let me see if I can get this test work.

@@ -73,6 +79,7 @@ protected void handleTask(long taskEntityId, CallContext callContext, int attemp
.setAttribute("polaris.task.attempt", attempt)
.startSpan();
try (Scope ignored = span.makeCurrent()) {
polarisRequestContext.setRealmContext(callContext.getRealmContext());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

@dimas-b dimas-b Jun 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, good point. I'll do it later if people agree on the general approach in this PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, i think we probably need to propagate the whole callcontext, instead of just the realmContext, because the whole callContext is needed for the background executor, the realmContext is just what needed for getConfiguration, which is called during the task execution.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Eventually - yes. With the recent config changes this PR is no longer meaningful in isolation, but it shows how CDI can be done for tasks.

I also opened a bigger related proposal on the dev ML: https://lists.apache.org/thread/0cyrzft2oon28otxlmmhvd5671rd3r3d

@dimas-b dimas-b force-pushed the task-request-ctx branch from f5b4f14 to 35f0871 Compare June 5, 2025 22:42
* Avoid propagating request contexts into threads running
  background tasks.

* Use fresh request context for each task instead.

* Manually propagate realm context into a task from the
  request that submits the task.

* Fix tests to set thread-local data to match CDI context
  data.

Next step: Simplify DefaultConfigurationStore API to
remove contextual arguments and use CDI context instead.
@dimas-b dimas-b force-pushed the task-request-ctx branch from 35f0871 to cb51efb Compare June 5, 2025 22:42
@dimas-b
Copy link
Contributor Author

dimas-b commented Jun 5, 2025

So, after rebasing the new way of handling request context in tasks is indeed not very meaningful, because tasks no longer use request-scoped beans.

However, I'd like this PR to act as an illustration for the proposal of moving all RealmContext from explicit method parameters to object fields to be injected by CDI (or manually). This PR shows that it is possible. I'll open a dev ML thread on this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants