Skip to content

S2S-3829 Enable Spire in Rest-Util #574

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 39 commits into from
Closed

S2S-3829 Enable Spire in Rest-Util #574

wants to merge 39 commits into from

Conversation

carlhejiayu
Copy link
Contributor

@carlhejiayu carlhejiayu commented May 20, 2025

Description
Enable Server using Spire to do S2S authentication.

Detailed Changes

  1. Introduce new settings ssl.spire.enabled, by default it's false and everything function exactly the same as before; it will only try to do s2s through spire if this is turned on.
  2. Modify application and server constructor to take in X509Source. This argument is optional for non-spire setting. If it's configured to use spire, this argument became Mandatory.
  3. HelloWorldSpireApplication is the sample application with spire setting. We can take this as an example for all library users to follow.

Testing
Most of the testing are done through unit test where we mock X509Source to provide certain bundle and test against the server behaviour with its certain configuration. All the tests are written in SpireTest.java

Server Onboarding Spire Doc
Please look at the following doc to see how existing rest-util server can make config changes to onboard spire
https://confluentinc.atlassian.net/wiki/spaces/SECENG/pages/4470801894/Rest-Util+Server+onboarding+to+Spire

DEMO Video
Here is a demo video on how we can turn on certain setting and make our application doing spire S2S in our local dev machine
https://drive.google.com/file/d/1BXudIXuEIy0ck1QckdoapAMHcox9jU2D/view?usp=sharing

TODO: we can setup the container testing strategy where can spin up spire server/agent in the CI do the testing. But this is also not super trival, wont' be included in the scope of this PR.

@carlhejiayu carlhejiayu changed the title Carl/s2 s 3829 S2S-3829 Enable Spire in Rest-Util May 20, 2025
SpiffeSslContextFactory.SslContextOptions options = SpiffeSslContextFactory.SslContextOptions
.builder()
.x509Source(x509Source)
.acceptAnySpiffeId()
Copy link
Contributor Author

@carlhejiayu carlhejiayu May 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right now, I configured to acceptAnySpiffeId . The authorization design and implementation is not part of this PR scope.
We should have new design session on how we can reject/allow certain spiffee ID to hit certain endpoint.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we not configure a regex pattern here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you look at the native library here, they don't natively support regex. If we want to have more complex pattern, we have to define one more layer abstraction to do this functionality, which is why it won't be this PR scope.

@@ -116,8 +124,18 @@ public ApplicationServer(T config, ThreadPool threadPool) {

listeners = config.getListeners();

if (x509Source == null) {
if (config.getBoolean(RestConfig.SSL_IS_SPIRE_ENABLED_CONFIG)) {
throw new RuntimeException("X509Source must be provided when SPIRE SSL is enabled");
Copy link
Contributor Author

@carlhejiayu carlhejiayu May 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We want to allow callers to pass in X509Source. The main reason is that we want to allow multiple application in the same server to be able to share the same X509Source to reduce the connection to spire agents. This is very particular important for performance reasons.

Here, I choose to throw error when we see X509Source is NULL. There are some reasons why I make X509Source mandatory instead of to initializing one if it is NULL.

  1. initializing X509Source require lots of custom params, like socketPath , initTimeOut. We should not put these setting in the application config. Instead, responsibility of how to create it should be controlled by caller.
  2. It is very difficult to determine the lifecycle of this object from the server standpoint. If the server decided to
    initialize on behalf of caller, then it should have logic to close the connection when the server is down; However, the server need to differentiate the case when X509Source is passed v.s created.

Overall, we can see helping client to initializing X509Source is violation of SRP (single responsibility principle).

To conclude, we have let caller to pass in X509Source if they want to use s2s-spire because the caller have the absolute control on how it got created and how long it should last. [These responsibilities do not belong to server]

Comment on lines -110 to -126
setSecurityStoreProps(sslConfig, sslContextFactory, true, false);
sslContextFactory.setKeyManagerPassword(sslConfig.getKeyManagerPassword());

if (!sslConfig.getKeyManagerFactoryAlgorithm().isEmpty()) {
sslContextFactory.setKeyManagerFactoryAlgorithm(sslConfig.getKeyManagerFactoryAlgorithm());
}

if (sslConfig.getReloadOnKeyStoreChange()) {
Path watchLocation = Paths.get(sslConfig.getReloadOnKeyStoreChangePath());
try {
FileWatcher.onFileChange(watchLocation,
onFileChangeCallback(sslConfig, sslContextFactory));
log.info("Enabled SSL cert auto reload for: " + watchLocation);
} catch (java.io.IOException e) {
log.error("Cannot enable SSL cert auto reload", e);
}
}
Copy link
Contributor Author

@carlhejiayu carlhejiayu May 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refactor this part because it failed the style check; if no refactor done, it will complain with this error
NPathComplexity

Just note: this refactor has not made any no functional changes ; it just just refactor into separate function to make the style check passing

@carlhejiayu carlhejiayu marked this pull request as ready for review May 28, 2025 20:16
@carlhejiayu carlhejiayu requested review from a team as code owners May 28, 2025 20:16
}

public ApplicationServer(T config, ThreadPool threadPool) {
public ApplicationServer(T config, ThreadPool threadPool, X509Source x509Source) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of changing this constructor we can add a new constructor which calls your new one with null for the x509Source, that way you don't introduce a breaking change

  public ApplicationServer(T config, ThreadPool threadPool) {
    this(config, threadPool, null);
  }

If a service needs S2S then they call

 public ApplicationServer(T config, ThreadPool threadPool, X509Source x509Source)

everything else calls or continues to use the same existing constructor

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes added more constructor for callers to be more flexible

this(config, createThreadPool(config), x509Source);
}

public ApplicationServer(T config,ThreadPool threadPool) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: whitespace

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thx. I made this commit to fix the white space

@akhileshm1
Copy link
Member

@carlhejiayu, is it possible to make this PR against 7.9.x? we can pint-merge the changes to master. the reason being SR is currently on 7.9.x in CC.

@carlhejiayu
Copy link
Contributor Author

@carlhejiayu, is it possible to make this PR against 7.9.x? we can pint-merge the changes to master. the reason being SR is currently on 7.9.x in CC.

Yes @akhileshm1 , i created a brand new PR that is based in 7.9.X. see here: #583

Copy link
Contributor

@dragonmushu dragonmushu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks added some comments and also is there a way we can log mTLS failures

@@ -206,6 +206,11 @@ public class RestConfig extends AbstractConfig {
+ " runtime request tags in global stats.";
protected static final boolean METRICS_GLOBAL_STATS_REQUEST_TAGS_ENABLE_DEFAULT = false;

public static final String SSL_IS_SPIRE_ENABLED_CONFIG = "ssl.spire.enabled";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would named listeners be supported here. Basically we need to also be able to open up multiple ssl ports within same application. One for spire and another maybe that is just tls.

Copy link
Contributor Author

@carlhejiayu carlhejiayu Jun 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you look at rest-util ssl config, i think configuration applied to all the ports that doing SSL. This is by the design, and i don't think they have option to support custom SSL setting per port; to do that, that seemed much bigger refactor also i think it's might be slight anti-pattern.

@dragonmushu , I would suggest that you initialize 2 application instance . For example, the first instance could be configured like this SSL.spire.enabled=true & port = 8080, and the second application could be configured like this SSL.spire.enabled=false& port = 9090 . i believe this should achieve this same purpose

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is by the design, and i don't think they have option to support custom SSL setting per port; to do that, that seemed much bigger refactor also i think it's might be slight anti-pattern.

This is not true. Currently we have named listeners that allows us to set to different tls settings for each listener in rest-utils. We shouldn't break that functionality.

Copy link
Contributor Author

@carlhejiayu carlhejiayu Jun 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @dragonmushu , can you put up the github link of what you have already done so that i can see? I may not be familiar with it

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

@carlhejiayu carlhejiayu Jun 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @dragonmushu , in the example link you provided, it looks you have 2 port , 1 is TLS, 1 is unsecured. I am not seeing that you set 2 ports for both TLS

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thats my first PR. In the second PR I also added another listener for mtls. https://github.com/confluentinc/cc-security-decisions/pull/1442/files

log.error("Cannot enable SSL cert auto reload", e);
}
}
configureKeyStore(sslContextFactory, sslConfig);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both shouldn't be configured right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we have spire enabled, you are right that they won't be needed to be configured; However, even configuring won't do much anything because internally the server will go to talk spire agent to grab all the trust bundle and stuff into cache; it will never have a need to reach the KeyStore even Keystore was configured. I already provided the comment here

SpiffeSslContextFactory.SslContextOptions options = SpiffeSslContextFactory.SslContextOptions
.builder()
.x509Source(x509Source)
.acceptAnySpiffeId()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we not configure a regex pattern here

Copy link
Contributor

@dragonmushu dragonmushu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@carlhejiayu , @moe-omar do we not want to interface with the SRJ standalone module here: https://confluentinc.atlassian.net/wiki/x/HAId6w instead of the raw java library

Instead of passing in an X509 source can we pass in the SpireX509Client.

@dragonmushu
Copy link
Contributor

Also @carlhejiayu one thing I called out in the requirements was observability around the

@carlhejiayu , @moe-omar do we not want to interface with the SRJ standalone module here: https://confluentinc.atlassian.net/wiki/x/HAId6w instead of the raw java library

Instead of passing in an X509 source can we pass in the SpireX509Client.

Sorry for these late reviews. It would be good to unify these two libraries because clients are recommended to use the SRJ library to configure client SDKs and so if we can somehow just wrap the SRJ module here for the server side it would be nice. Also we would only try to have one connection to the agent per service.

@carlhejiayu
Copy link
Contributor Author

@carlhejiayu , @moe-omar do we not want to interface with the SRJ standalone module here: https://confluentinc.atlassian.net/wiki/x/HAId6w instead of the raw java library

Instead of passing in an X509 source can we pass in the SpireX509Client.

Hi @dragonmushu , I would not recommend do this for the following the 2 reason:

  1. SpireX509Client is an SRJ concept (SRJ is confluent internal repo), we should not import confluent dependency into this public repo rest-util; in fact, it is not allowed.
  2. I believe SRJ is on process of using rest-util to do their implementation. I believe that X509Source should be the way to go unless you have a strong reason we should keep SpireX509Client

@carlhejiayu
Copy link
Contributor Author

Also @carlhejiayu one thing I called out in the requirements was observability around the

@carlhejiayu , @moe-omar do we not want to interface with the SRJ standalone module here: https://confluentinc.atlassian.net/wiki/x/HAId6w instead of the raw java library
Instead of passing in an X509 source can we pass in the SpireX509Client.

Sorry for these late reviews. It would be good to unify these two libraries because clients are recommended to use the SRJ library to configure client SDKs and so if we can somehow just wrap the SRJ module here for the server side it would be nice. Also we would only try to have one connection to the agent per service.

I believe SRJ would start to rely on rest-util

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants