Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Working impl of HK2 dependency injection #493

Merged
merged 27 commits into from
Dec 9, 2024

Conversation

collado-mike
Copy link
Contributor

Description

As discussed in the mailing list, this change proposes using the HK2 framework to support CDI-compliant dependency injection using the existing Dropwizard application structure and yaml configuration. This change does not make a logic changes or signature changes except to remove some of the injection-oriented interfaces (e.g., ConfigurationStoreAware) and updates constructors to accept dependencies via @Inject annotations.

The PolarisApplicationConfig is configured to be managed by HK2, but populated by the Jackson during the typical Dropwizard configuration parsing. Because the PolarisApplicationConfig is managed by HK2, it can invoke a @PostConstruct method that registers configured beans with the serviceLocator using the config class's own getter methods as Factory implementations. Because the other beans are not managed by HK2 (they can't be, since they declare dependencies on other beans that may not yet be instantiated by Jackson), the factory manually calls inject on each bean before returning.

Jersey has its own serviceLocator instance under the hood, so all of the beans bound to the PolarisApplicationConfig instance are registered again with the Jersey serviceLocator so that they can be found by the jersey components at runtime (I tried using the HK2 bridge that supposedly allows connecting two serviceLocators, but it didn't seem to work and also the licensing on that package suggests possible GPL violation).

This also introduces a @RealmScope that declares certain dependencies to be created and managed per-realm, rather than per-application or per-request. Some of the instances, such as the PolarisMetaStoreManager and the EntityCache are intended to be reused across requests, but of course, cannot be reused across realms, as some entities (such as root principal and service_admin principal role) have the same name across realms.

One additional test class is added to ensure that the @RealmScoped EntityCache is reused across requests, but is not reused across different realms.

Fixes # (issue)

Type of change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • Documentation update
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration

  • PolarisRealmEntityCacheTest

Test Configuration:

  • Hardware:
  • Toolchain:
  • SDK:

Checklist:

Please delete options that are not relevant.

  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • If adding new functionality, I have discussed my implementation with the community using the linked GitHub issue

Copy link
Contributor

@dimas-b dimas-b left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a couple of quick comments... I'll do a full review a bit later :)

Copy link
Member

@snazy snazy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the nice improvement towards CDI!

hk2 adds a DI implementation, but in a couple of places this conflicts with the way Jackson is used as a DI-ish mechanism to instantiate things, leading to multiple DI-(ish) implementations.

I'm not sure how an approach to add "generic" extensions/plugins would look with hk2, as we cannot expect to have initialization code for all extensions/plugins in the "main" code base.

new AbstractBinder() {
@Override
protected void configure() {
bind(RealmScopeContext.class)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like there's a lot of manual coding needed for hk2, which feels like it makes things complicated, when we eventually have a plugin/extension mechanism.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be considered the smallest change necessary to get CDI working. There are improvements that can be made, e.g., classpath scanning for detecting Resource and Service impls, but the primary goal right now is to have runtime framework dependent on @Inject for wiring. Once that's done, lots of other changes can happen in parallel - e.g., improvements to the authorizer, authentication changes, as well as improvements to the CDI framework. I think this accomplishes the smallest change necessary to allow those other changes to progress.

@@ -54,4 +54,4 @@ USER root
RUN chmod -R go+rwx /home/spark/regtests
USER spark

CMD ["./run.sh"]
ENTRYPOINT ["./run.sh"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change to this PR?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added when I was debugging this test failure- e5367d9 . TBH, I'm not sure how tests in main were passing

public void dispose(T instance) {}
}

public <T> T findService(Class<T> serviceClass) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to be a CDI concern rather than a configuration concern. I mean that it is perfectly fine to drive CDI with config settings, but findService does not appear to fit the DI paradigm 🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't, but the servlet filters don't use DI, so the PolarisApplication has to find them somehow

Comment on lines 24 to 27
[org.apache.polaris.service.auth.TestInlineBearerTokenPolarisAuthenticator]S
contract={io.dropwizard.auth.Authenticator}
name=test
qualifier={jakarta.inject.Named}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand the purpose of this declaration. It seems rather redundant to me. The class of TestInlineBearerTokenPolarisAuthenticator already has the @Named("test") annotation and implements Authenticator. This file does not appear to add any new information to what is already known from class files 🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is something that would normally be generated at build time by scanning the build artifacts. But that requires using the HK2-specific annotations (@Contract and @Service), which I think we agreed to avoid

@snazy
Copy link
Member

snazy commented Dec 3, 2024

Hm, looking into the hk2 github repo it looks like its rather unmaintained (minus version bumps and rather tiny changes).

Copy link
Contributor

@dimas-b dimas-b left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall this PR looks good to me as it enables CDI and paves the way to further improvements / simplifications 👍

@dimas-b
Copy link
Contributor

dimas-b commented Dec 3, 2024

I'd also be willing to work on extracting services (generated API code + impl. classes) into a dedicated gradle module (to be pulled into DW runtime), unless you want to work on this from your side @collado-mike :)


@SuppressWarnings("unchecked")
@Override
public <U> U findOrCreate(ActiveDescriptor<U> activeDescriptor, ServiceHandle<?> root) {
Copy link
Contributor

@dimas-b dimas-b Dec 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to implement @RealmScope as a @RequestScoped with custom bean producers that handle the realm lookup and caching? I think that might be more intuitive from the CDI perspective... and probably more portable. WDYT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd worry that the intention was not clear - readers would just have to know that an item annotated with @RequestScoped could actually live beyond the scope of a single request. In fact, they could be used by multiple concurrent requests, while the @RequestScoped annotation would convey that it's safe for an object to mutate state possibly leading to thread safety errors. I think all of the CDI providers offer support for custom scopes and it doesn't seem like a lot of work for what amounts to (IMO) clear intentions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point about concurrent access

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we are mixing concepts here, a request scoped bean should not outlive the request. In the Quarkus PR the RealmContext is a request-scoped bean, and one instance of it is created for each request. It two requests target the same realm, they will get two distinct instances of RealmContext, but each would have the same realm ID.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're specifically talking about beans that should be reused across requests

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not have a global store for realm-specific stuff and expose it to callers via thin request-scoped wrappers? Disposing of the wrapper does not have to dispose of the long-lived data.

... besides, we never destroy realm-scoped beans anyway.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems to me that's exactly what the @RealmScoped annotation is doing. The RealmScopeContext maintains the cache per-realm and manages the instantiated values for each ActiveDescriptor for the currently active realm. This is a centralized place where we can do a lot of cleanup of stale realms (e.g., somebody made a call two days ago, but hasn't invoked an API since), rather than having to sprinkle this cache eviction in a bunch of different bean-specific factories.

I think the contention is whether we should be relying on a ThreadLocal to access the current RealmContext. I can see where there are flaws with this approach (e.g., we already ran into one during the asynchronous cleanup of metadata files after table drop), but at the same time, I think most (every?) implementation of @RequestScoped relies on ThreadLocal variables (e.g., AFAICT, ArC relies on ThreadLocalCurrentContext in its RequestContext ). Injecting an accessor to retrieve the current realm makes a lot of sense to me, but at the end of the day, I think we're still going to be relying on ThreadLocals.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Request scopes are probably implemented on top of some thread locals, but the important point here is that whatever mechanism is used to store and retrieve contextual data, it has to be managed by the runtime.

@collado-mike
Copy link
Contributor Author

I'd also be willing to work on extracting services (generated API code + impl. classes) into a dedicated gradle module (to be pulled into DW runtime), unless you want to work on this from your side @collado-mike :)

That sounds good to me! I have other things to work on :)

@@ -39,8 +40,8 @@
* Default implementation of the {@link OAuth2ApiService} that generates a JWT token for the client
* if the client secret matches.
*/
@JsonTypeName("default")
public class DefaultOAuth2ApiService implements OAuth2ApiService, HasMetaStoreManagerFactory {
@Named("default")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that @Named has some shortcomings. From CDI spec 3.0:

The use of @Named as an injection point qualifier is not recommended, except in the case of integration with legacy code that uses string-based names to identify beans.

One of them is that the name is application-wide, so you cannot have two beans named @Named("default").

Instead, we could use some other qualifier. People generally use @Identifier from Smallrye-Common library. The Quarkus PR is using that for now.

This article explains the pros and cons of each annotation:

https://dev.to/yanev/unveiling-challenges-with-named-67p

private static Logger LOGGER = LoggerFactory.getLogger(EntityCacheFactory.class);
@Inject PolarisMetaStoreManager metaStoreManager;

@RealmScope
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again I think it's much easier to turn this bean into application-scoped and have it hold a map of entity caches keyed by realm id, like RealmEntityManagerFactory.

@@ -48,17 +46,11 @@ protected void run(
Bootstrap<PolarisApplicationConfig> bootstrap,
Namespace namespace,
PolarisApplicationConfig configuration) {
MetaStoreManagerFactory metaStoreManagerFactory = configuration.getMetaStoreManagerFactory();
MetaStoreManagerFactory metaStoreManagerFactory =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commands (boostrap/purge) are going to be fun to adapt to CDI. I suggest we look into them later but my feeling is that we would need to move them to separate modules.

@collado-mike
Copy link
Contributor Author

collado-mike commented Dec 4, 2024

I agree that @Named has some shortcomings.
One of them is that the name is application-wide, so you cannot have two beans named @Named("default").

@adutra I haven't seen this behavior during testing. E.g., the DefaultOAuth2ApiService and the DefaultContextResolver are both named "default", but the application doesn't seem to have any problem starting up both instances. Could it be that we never use @Named as a qualifier at the injection point? From the article you shared, it seems like the issue is one of ambiguity at injection time if the qualifiers aren't specific. But the goal of the @Named annotation in this PR is to allow named instances to be specified in the configuration. We never use the @Named qualifier at the injection points (and I don't think we should, as it defeats the purpose of the user configuration).

@adutra
Copy link
Contributor

adutra commented Dec 5, 2024

@adutra I haven't seen this behavior during testing. E.g., the DefaultOAuth2ApiService and the DefaultContextResolver are both named "default", but the application doesn't seem to have any problem starting up both instances.

I suspect HK2 might not be fully compliant with the spec. From CDI spec 4.0:

An ambiguous name exists when a name resolves to multiple beans. When an ambiguous name exists, the container attempts to resolve the ambiguity.

So in our case, if both DefaultOAuth2ApiService and DefaultContextResolver are named default, and are both eligible for injection, then that should result in an unresolvable ambiguous bean situation.

For example with Quarkus, if I declare the following beans:

@Singleton @Named("foo")
public class Foo1 {}

@Singleton @Named("foo")
public class Foo2 {}

I get a deployment error when the app starts (even if the beans aren't injected anywhere):

java.lang.RuntimeException: io.quarkus.builder.BuildException: Build failure: Build failed due to errors
	[error]: Build step io.quarkus.arc.deployment.ArcProcessor#generateResources threw an exception: jakarta.enterprise.inject.spi.DeploymentException: Unresolvable ambiguous bean name detected: foo
Beans:
CLASS bean [types=[org.apache.polaris.service.catalog.Foo1, java.lang.Object], qualifiers=[@Default, @Any, @Named("foo")], target=org.apache.polaris.service.catalog.Foo1]
CLASS bean [types=[org.apache.polaris.service.catalog.Foo2, java.lang.Object], qualifiers=[@Default, @Any, @Named("foo")], target=org.apache.polaris.service.catalog.Foo2]

To get past this issue, I would suggest that we copy SmallRye's @Identifier into our code base – if it's possible for HK2 to process custom qualifiers.

@collado-mike
Copy link
Contributor Author

To get past this issue, I would suggest that we copy SmallRye's @Identifier into our code base – if it's possible for HK2 to process custom qualifiers.

Yeah, it's possible to use the custom qualifiers. I think it's fine to switch the annotations - but copying the @Identifier code will make the classes that use it incompatible with the Quarkus work, right? I think it would be fine to just add a dependency on the SmallRye library for the annotation, as that moves the codebase into a more compatible state. WDYT?

@adutra
Copy link
Contributor

adutra commented Dec 6, 2024

Yeah, it's possible to use the custom qualifiers. I think it's fine to switch the annotations - but copying the @Identifier code will make the classes that use it incompatible with the Quarkus work, right? I think it would be fine to just add a dependency on the SmallRye library for the annotation, as that moves the codebase into a more compatible state. WDYT?

Copying wouldn't break Quarkus afaict, it's just a matter of telling it which identifier we intend to use. That said, if you are OK adding the dependency, that's even better 👍

@collado-mike
Copy link
Contributor Author

@adutra , I think I've addressed all of your comments. The one outstanding concern is the support for the @RealmScoped annotation. I think we should discuss the feasibility of supporting that in Quarkus in the Quarkus PR. Its use and the context to support it are framework-dependent anyway, so I think we can evaluate the feasibility of it specifically in Quarkus separately from this PR.

In the meantime, however, I think the basic CDI support is blocking work on several other tasks (e.g., the authentication/authorization PRs Dmitri and I have open, fixing the bootstrapping, table metadata logging). With this PR, I think we can work on better/more complete CDI support (e.g., through Quarkus) without any significant changes to the work done here and it can also accommodate the other work that's ongoing.

Copy link
Contributor

@adutra adutra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm approving in the spirit of allowing us to move forward. There will be issues to solve for sure, but I think we can fix them along the way. 👍

@collado-mike collado-mike enabled auto-merge (squash) December 9, 2024 19:03
@collado-mike collado-mike disabled auto-merge December 9, 2024 19:04
@collado-mike collado-mike enabled auto-merge (squash) December 9, 2024 19:31
@collado-mike
Copy link
Contributor Author

@adutra , I'll need another approval because I had to merge from main to resolve conflicts 🙄

@collado-mike collado-mike merged commit 9809e0e into apache:main Dec 9, 2024
5 checks passed
@collado-mike collado-mike deleted the mcollado-hk2-di branch December 19, 2024 00:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants