Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix: client reconnected every authenticationRefreshCheckSeconds when using tls authentication #1062

Merged
merged 3 commits into from
Jul 27, 2023

Conversation

jffp113
Copy link
Contributor

@jffp113 jffp113 commented Jul 20, 2023

Motivation

When using pulsar tls authentication with a broker that sets the authenticationRefreshCheckSeconds the connection was dropped for each authentication refresh check. After analyzing logs and tcpdumps I concluded that this error appears because the tls authentication is returning null, witch does not pass a validation in the broker.
After analyzing the tls auth implementation in Java (that works), I concluded that the GetData method should return empty byte array instead of nil.

2023-07-20 15:32:32 2023-07-20T14:32:32,225+0000 [pulsar-io-29-5] INFO  org.apache.pulsar.broker.service.ServerCnx - [/127.0.0.1:42304] Refreshing authentication credentials for originalPrincipal null and authRole pulsar-admin
2023-07-20 15:32:32 2023-07-20T14:32:32,225+0000 [pulsar-io-29-9] INFO  org.apache.pulsar.broker.service.ServerCnx - [/127.0.0.1:42306] Refreshing authentication credentials for originalPrincipal null and authRole pulsar-admin
2023-07-20 15:32:32 2023-07-20T14:32:32,225+0000 [pulsar-io-29-6] INFO  org.apache.pulsar.broker.service.ServerCnx - [/172.18.0.1:33616] Refreshing authentication credentials for originalPrincipal null and authRole pulsar-admin
2023-07-20 15:32:32 2023-07-20T14:32:32,228+0000 [pulsar-io-29-6] WARN  org.apache.pulsar.broker.service.ServerCnx - [/172.18.0.1:33616] Got exception java.lang.IllegalArgumentException
2023-07-20 15:32:32     at com.google.common.base.Preconditions.checkArgument(Preconditions.java:131)
2023-07-20 15:32:32     at org.apache.pulsar.broker.service.ServerCnx.handleAuthResponse(ServerCnx.java:897)
2023-07-20 15:32:32     at org.apache.pulsar.common.protocol.PulsarDecoder.channelRead(PulsarDecoder.java:362)
2023-07-20 15:32:32     at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
2023-07-20 15:32:32     at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
2023-07-20 15:32:32     at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
2023-07-20 15:32:32     at io.netty.handler.flow.FlowControlHandler.dequeue(FlowControlHandler.java:200)
2023-07-20 15:32:32     at io.netty.handler.flow.FlowControlHandler.channelRead(FlowControlHandler.java:162)
2023-07-20 15:32:32     at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:442)
2023-07-20 15:32:32     at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
2023-07-20 15:32:32     at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
2023-07-20 15:32:32     at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:346)
2023-07-20 15:32:32     at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:318)
2023-07-20 15:32:32     at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
2023-07-20 15:32:32     at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
2023-07-20 15:32:32     at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
2023-07-20 15:32:32     at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1373)
2023-07-20 15:32:32     at io.netty.handler.ssl.SslHandler.decodeNonJdkCompatible(SslHandler.java:1247)
2023-07-20 15:32:32     at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1287)
2023-07-20 15:32:32     at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:529)
2023-07-20 15:32:32     at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:468)
2023-07-20 15:32:32     at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:290)
2023-07-20 15:32:32     at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
2023-07-20 15:32:32     at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
2023-07-20 15:32:32     at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
2023-07-20 15:32:32     at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
2023-07-20 15:32:32     at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440)
2023-07-20 15:32:32     at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
2023-07-20 15:32:32     at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
2023-07-20 15:32:32     at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:800)
2023-07-20 15:32:32     at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:499)
2023-07-20 15:32:32     at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:397)
2023-07-20 15:32:32     at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
2023-07-20 15:32:32     at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
2023-07-20 15:32:32     at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
2023-07-20 15:32:32     at java.base/java.lang.Thread.run(Thread.java:829)
2023-07-20 15:32:32 

Modifications

Changed tls auth GetData to return empty byte array instead of nil.

Verifying this change

  • Make sure that the change passes the CI checks.

This change is a trivial rework / code cleanup without any test coverage.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): no
  • The public API: no
  • The schema: no
  • The default values of configurations: no
  • The wire protocol: no

Documentation

  • Does this pull request introduce a new feature? no

@jffp113 jffp113 changed the title Fix: client reconnected every authenticationRefreshCheckSeconds Fix: client reconnected every authenticationRefreshCheckSeconds when using tls authentication Jul 20, 2023
@RobertIndie
Copy link
Member

Hi, @jffp113. Thanks for your PR. :)
Could you add a test? Is there any reproducible code?

@jffp113
Copy link
Contributor Author

jffp113 commented Jul 21, 2023

Hi @RobertIndie,
I have tested this fix locally with a producer and pulsar standalone.
How can I write a integration test to check for connection drops after a auth challenge?

@jffp113
Copy link
Contributor Author

jffp113 commented Jul 24, 2023

@RobertIndie
Copy link
Member

Hi @jffp113 , Could you share your reproduce steps?

@jffp113
Copy link
Contributor Author

jffp113 commented Jul 25, 2023

Hi @RobertIndie,

Yes.

You need to add authenticationRefreshCheckSeconds config into the broker configuration. You can set for example 30 seconds so that the client can crash fast. authenticationRefreshCheckSeconds forces authentication every x seconds. This is a good practice, for example, a certificate expiry date is only rechecked if there is a refresh authentication.

Then you configure simple golang pulsar producer that uses tls authentication.

You only need that two steps. I have tested this on pulsar 2.10.3 version.

If you need a more detail explanation I'm happy to provide.

@jffp113
Copy link
Contributor Author

jffp113 commented Jul 25, 2023

Hi again,

I found the line of code in the java client that converts a null/empty auth data to a empty string:
https://github.com/apache/pulsar/blob/69d7a2bf14555f11a716a9545c5cf391d8179a27/pulsar-client-api/src/main/java/org/apache/pulsar/client/api/AuthenticationDataProvider.java#L161

@RobertIndie
Copy link
Member

Unfortunately, I couldn't reproduce it. I used AuthenticationProviderTls and set authenticationRefreshCheckSeconds=5. But it didn't refresh the authentication check.

After checking the code, just found that AuthenticationProviderTls doesn't override newAuthState method so it is returning the OneStageAuthenticationState:https://github.com/apache/pulsar/blob/master/pulsar-broker-common/src/main/java/org/apache/pulsar/broker/authentication/AuthenticationProvider.java#L100-L105.
The OneStageAuthenticationState will never be expired. Therefore it seems that is should never pass this condition: https://github.com/apache/pulsar/blob/b69f4efa6058c3f51885a61a2b3acb46f8b730f4/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/ServerCnx.java#L720-L722
So it won't trigger the refresh check. I'm not sure what I didn't set up correctly.

Anyway, I'm +1 for your PR. But I'm trying to reproduce the issue to investigate it further.

@jffp113
Copy link
Contributor Author

jffp113 commented Jul 25, 2023

I noticed that we have a custom authentication plugin that does not always return false in AuthenticationState::isExpired:
https://github.com/apache/pulsar/blob/69d7a2bf14555f11a716a9545c5cf391d8179a27/pulsar-broker-common/src/main/java/org/apache/pulsar/broker/authentication/AuthenticationState.java#L107C28-L107C28

Sorry for the confusion.

@RobertIndie
Copy link
Member

@jffp113 Thanks for your explanation. But it doesn't seem to make sense to refresh the tls auth. And it always returns the empty auth data.

Copy link
Member

@RobertIndie RobertIndie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I approve your PR because this issue may happen when using the custom authentication provider. And it also makes the behavior consistent with the Java client.

@jffp113
Copy link
Contributor Author

jffp113 commented Jul 26, 2023

@RobertIndie If want to check if the certificate is still valid we must refresh tls (e.g check the CRL, check expires at field, etc)

@jffp113
Copy link
Contributor Author

jffp113 commented Jul 26, 2023

We could make this change generic, and if any provider returns nil we change to []byte{}.
I did this in the cpp client:
apache/pulsar-client-cpp#304

@RobertIndie
Copy link
Member

If want to check if the certificate is still valid we must refresh tls (e.g check the CRL, check expires at field, etc)

This would only happen when we reconnect the client connection. But refreshing the authentication won't check the tls certificate.

We could make this change generic, and if any provider returns nil we change to []byte{}.

I'm +1 for this.

@RobertIndie RobertIndie merged commit 16a0299 into apache:master Jul 27, 2023
graysonzeng pushed a commit to graysonzeng/pulsar-client-go that referenced this pull request Jul 31, 2023
…using tls authentication (apache#1062)

### Motivation

When using pulsar tls authentication with a broker that sets the authenticationRefreshCheckSeconds the connection was dropped for each authentication refresh check. After analyzing logs and tcpdumps I concluded that this error appears because the tls authentication is returning null, witch does not pass a validation in the broker.
After analyzing the tls auth implementation in Java (that works), I concluded that the GetData method should return empty byte array instead of nil.

### Modifications

Changed tls auth GetData to return empty byte array instead of nil.

---------

Co-authored-by: Jorge Pereira <jorge.pereira@cross-join.com>
RobertIndie pushed a commit that referenced this pull request Sep 7, 2023
…using tls authentication (#1062)

### Motivation

When using pulsar tls authentication with a broker that sets the authenticationRefreshCheckSeconds the connection was dropped for each authentication refresh check. After analyzing logs and tcpdumps I concluded that this error appears because the tls authentication is returning null, witch does not pass a validation in the broker.
After analyzing the tls auth implementation in Java (that works), I concluded that the GetData method should return empty byte array instead of nil.

### Modifications

Changed tls auth GetData to return empty byte array instead of nil.

---------

Co-authored-by: Jorge Pereira <jorge.pereira@cross-join.com>
(cherry picked from commit 16a0299)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants