Handshake error when connecting to AWS NLB using TLS 1.2 and NIO #1280

Closed
sergio91pt opened this issue Apr 2, 2024 · 2 comments · Fixed by #1281

@sergio91pt
Contributor

sergio91pt commented Apr 2, 2024

Describe the bug

When using a recent version of rabbitmq-java-client, we cannot connect to an AWS Network Load Balancer using TLS 1.2 and NIO because the TLS handshake fails with an "Unexpected handshake message: server_hello" error.

We were unable to replicate using a local RMQ instance with TLS 1.2, only when connecting to the load balancer.
It also does not occur when connecting using TLS 1.3 or when using TLS 1.2 without NIO.

Downgrading rabbitmq-java-client to 5.13.1 fixes the issue, so we believe it is caused by #716.

Reproduction steps

  1. Set up an AWS Network Load Balancer in front of your RabbitMQ cluster (security policy ELBSecurityPolicy-TLS13-1-2-2021-06).
  2. Connect to the LB using useNio() and useSslProtocol() (defaults to TLSv1.2 and trusts every certificate).
  3. The application receives the following exception:
javax.net.ssl|ERROR|10|main|2024-03-28 18:51:38.875 UTC|null:-1|Fatal (UNEXPECTED_MESSAGE): Unexpected handshake message: server_hello (
"throwable" : {
  javax.net.ssl.SSLProtocolException: Unexpected handshake message: server_hello
  	at java.base/sun.security.ssl.Alert.createSSLException(Unknown Source)
  	at java.base/sun.security.ssl.Alert.createSSLException(Unknown Source)
  	at java.base/sun.security.ssl.TransportContext.fatal(Unknown Source)
  	at java.base/sun.security.ssl.TransportContext.fatal(Unknown Source)
  	at java.base/sun.security.ssl.TransportContext.fatal(Unknown Source)
  	at java.base/sun.security.ssl.HandshakeContext.dispatch(Unknown Source)
  	at java.base/sun.security.ssl.SSLEngineImpl$DelegatedTask$DelegatedAction.run(Unknown Source)
  	at java.base/sun.security.ssl.SSLEngineImpl$DelegatedTask$DelegatedAction.run(Unknown Source)
  	at java.base/java.security.AccessController.doPrivileged(Unknown Source)
  	at java.base/sun.security.ssl.SSLEngineImpl$DelegatedTask.run(Unknown Source)
  	at com.rabbitmq.client.impl.nio.SslEngineHelper.runDelegatedTasks(SslEngineHelper.java:85)
  	at com.rabbitmq.client.impl.nio.SslEngineHelper.unwrap(SslEngineHelper.java:120)
  	at com.rabbitmq.client.impl.nio.SslEngineHelper.doHandshake(SslEngineHelper.java:60)
  	at com.rabbitmq.client.impl.nio.SocketChannelFrameHandlerFactory.create(SocketChannelFrameHandlerFactory.java:112)
  	at com.rabbitmq.client.impl.recovery.RecoveryAwareAMQConnectionFactory.newConnection(RecoveryAwareAMQConnectionFactory.java:63)
  	at com.rabbitmq.client.impl.recovery.AutorecoveringConnection.init(AutorecoveringConnection.java:160)
  	at com.rabbitmq.client.ConnectionFactory.newConnection(ConnectionFactory.java:1227)
  	at com.rabbitmq.client.ConnectionFactory.newConnection(ConnectionFactory.java:1184)
  	at com.rabbitmq.client.ConnectionFactory.newConnection(ConnectionFactory.java:1322)

Expected behavior

Application is able to connect to the load balancer with TLSv1.2 and NIO.

Additional context

This occurs after client_hello and server_hello. TLS 1.2 is negotiated.
It occurs just after the certificate chain is received.

Details

Starting TLS handshake
Initial handshake status is NEED_WRAP
Handshake status is NEED_WRAP
Wrapping...
Handshake status is NEED_WRAP before wrapping
SSL engine result is Status = OK HandshakeStatus = NEED_UNWRAP
bytesConsumed = 0 bytesProduced = 365 sequenceNumber = 0 after wrapping
Wrote 365 byte(s)
Handshake status is NEED_UNWRAP
Unwrapping...
Handshake status is NEED_UNWRAP before unwrapping
Cipher in position 0
Reading from channel
Read 5084 byte(s) from channel
SSL engine result is Status = OK HandshakeStatus = NEED_TASK
bytesConsumed = 100 bytesProduced = 0 after unwrapping
Running delegated task
javax.net.ssl|DEBUG|10|main|2024-03-28 18:51:38.853 UTC|null:-1|Consuming ServerHello handshake message (
"ServerHello": {
  "server version"      : "TLSv1.2",
  "random"              : "BFA3287C7BA4DFED44C112B2580EBBE2BD26BAA852C4D49F444F574E47524401",
  "session id"          : "AF42501CF28AA3F4F6A4A7DD855314586F0CCBBD810F826A4FFEDA27B4CF7D6D",
  "cipher suite"        : "TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256(0xC02F)",
  "compression methods" : "00",
  "extensions"          : [
    "server_name (0)": {
      <empty extension_data field>
    },
    "ec_point_formats (11)": {
      "formats": [uncompressed]
    },
    "renegotiation_info (65,281)": {
      "renegotiated connection": [<no renegotiated connection>]
    },
    "extended_master_secret (23)": {
      <empty>
    }
  ]
}
)
javax.net.ssl|DEBUG|10|main|2024-03-28 18:51:38.853 UTC|null:-1|Ignore unavailable extension: supported_versions
javax.net.ssl|DEBUG|10|main|2024-03-28 18:51:38.853 UTC|null:-1|Negotiated protocol version: TLSv1.2
javax.net.ssl|DEBUG|10|main|2024-03-28 18:51:38.854 UTC|null:-1|Consumed extension: renegotiation_info
javax.net.ssl|DEBUG|10|main|2024-03-28 18:51:38.854 UTC|null:-1|Consumed extension: server_name
javax.net.ssl|DEBUG|10|main|2024-03-28 18:51:38.854 UTC|null:-1|Ignore unavailable extension: max_fragment_length
javax.net.ssl|DEBUG|10|main|2024-03-28 18:51:38.854 UTC|null:-1|Ignore unavailable extension: status_request
javax.net.ssl|DEBUG|10|main|2024-03-28 18:51:38.854 UTC|null:-1|Consumed extension: ec_point_formats
javax.net.ssl|DEBUG|10|main|2024-03-28 18:51:38.854 UTC|null:-1|Ignore unavailable extension: status_request_v2
javax.net.ssl|DEBUG|10|main|2024-03-28 18:51:38.854 UTC|null:-1|Consumed extension: extended_master_secret
javax.net.ssl|DEBUG|10|main|2024-03-28 18:51:38.854 UTC|null:-1|Ignore unavailable extension: session_ticket
javax.net.ssl|DEBUG|10|main|2024-03-28 18:51:38.855 UTC|null:-1|Consumed extension: renegotiation_info
javax.net.ssl|WARNING|10|main|2024-03-28 18:51:38.855 UTC|null:-1|Ignore impact of unsupported extension: server_name
javax.net.ssl|DEBUG|10|main|2024-03-28 18:51:38.855 UTC|null:-1|Ignore unavailable extension: max_fragment_length
javax.net.ssl|DEBUG|10|main|2024-03-28 18:51:38.855 UTC|null:-1|Ignore unavailable extension: status_request
javax.net.ssl|WARNING|10|main|2024-03-28 18:51:38.855 UTC|null:-1|Ignore impact of unsupported extension: ec_point_formats
javax.net.ssl|DEBUG|10|main|2024-03-28 18:51:38.855 UTC|null:-1|Ignore unavailable extension: application_layer_protocol_negotiation
javax.net.ssl|DEBUG|10|main|2024-03-28 18:51:38.855 UTC|null:-1|Ignore unavailable extension: status_request_v2
javax.net.ssl|WARNING|10|main|2024-03-28 18:51:38.855 UTC|null:-1|Ignore impact of unsupported extension: extended_master_secret
javax.net.ssl|DEBUG|10|main|2024-03-28 18:51:38.855 UTC|null:-1|Ignore unavailable extension: session_ticket
javax.net.ssl|WARNING|10|main|2024-03-28 18:51:38.855 UTC|null:-1|Ignore impact of unsupported extension: renegotiation_info
Setting cipherIn position to 100 (limit is 5084)
SSL engine result is Status = OK HandshakeStatus = NEED_TASK
bytesConsumed = 4984 bytesProduced = 0 after unwrapping
Running delegated task
javax.net.ssl|DEBUG|10|main|2024-03-28 18:51:38.868 UTC|null:-1|Consuming server Certificate handshake message (
"Certificates": [
  // Omitted for brevity
]
)
Clearing cipherIn because all bytes have been read and unwrapped
SSL engine result is Status = OK HandshakeStatus = NEED_TASK
bytesConsumed = 100 bytesProduced = 0 after unwrapping
Running delegated task
javax.net.ssl|ERROR|10|main|2024-03-28 18:51:38.875 UTC|null:-1|Fatal (UNEXPECTED_MESSAGE): Unexpected handshake message: server_hello (
"throwable" : {
  javax.net.ssl.SSLProtocolException: Unexpected handshake message: server_hello
     at java.base/sun.security.ssl.Alert.createSSLException(Unknown Source)
     at java.base/sun.security.ssl.Alert.createSSLException(Unknown Source)
     at java.base/sun.security.ssl.TransportContext.fatal(Unknown Source)
     at java.base/sun.security.ssl.TransportContext.fatal(Unknown Source)
     at java.base/sun.security.ssl.TransportContext.fatal(Unknown Source)
     at java.base/sun.security.ssl.HandshakeContext.dispatch(Unknown Source)
     at java.base/sun.security.ssl.SSLEngineImpl$DelegatedTask$DelegatedAction.run(Unknown Source)
     at java.base/sun.security.ssl.SSLEngineImpl$DelegatedTask$DelegatedAction.run(Unknown Source)
     at java.base/java.security.AccessController.doPrivileged(Unknown Source)
     at java.base/sun.security.ssl.SSLEngineImpl$DelegatedTask.run(Unknown Source)
     at com.rabbitmq.client.impl.nio.SslEngineHelper.runDelegatedTasks(SslEngineHelper.java:85)
     at com.rabbitmq.client.impl.nio.SslEngineHelper.unwrap(SslEngineHelper.java:120)
     at com.rabbitmq.client.impl.nio.SslEngineHelper.doHandshake(SslEngineHelper.java:60)
     at com.rabbitmq.client.impl.nio.SocketChannelFrameHandlerFactory.create(SocketChannelFrameHandlerFactory.java:112)
     at com.rabbitmq.client.impl.recovery.RecoveryAwareAMQConnectionFactory.newConnection(RecoveryAwareAMQConnectionFactory.java:63)
     at com.rabbitmq.client.impl.recovery.AutorecoveringConnection.init(AutorecoveringConnection.java:160)
     at com.rabbitmq.client.ConnectionFactory.newConnection(ConnectionFactory.java:1227)
     at com.rabbitmq.client.ConnectionFactory.newConnection(ConnectionFactory.java:1184)
     at com.rabbitmq.client.ConnectionFactory.newConnection(ConnectionFactory.java:1322)
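
For readers less familiar with SSLEngine, the first lines of the trace (NEED_WRAP, then NEED_UNWRAP once the ClientHello has been produced) can be sketched with the JDK alone. This is a minimal illustration, not the client's code: the class and method names are made up, the default SSLContext is assumed to be usable, and nothing is sent over the network.

```java
import java.nio.ByteBuffer;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLEngine;
import javax.net.ssl.SSLEngineResult;
import javax.net.ssl.SSLEngineResult.HandshakeStatus;

public class HandshakeStartSketch {

    // Returns the handshake status before and after the first wrap() of a client engine.
    static HandshakeStatus[] firstStep() {
        try {
            SSLEngine engine = SSLContext.getDefault().createSSLEngine();
            engine.setUseClientMode(true);

            // Destination buffer sized as recommended by the engine's session.
            ByteBuffer cipherOut = ByteBuffer.allocate(engine.getSession().getPacketBufferSize());
            ByteBuffer plainOut = ByteBuffer.allocate(0); // no application data yet

            engine.beginHandshake();
            HandshakeStatus initial = engine.getHandshakeStatus();

            // The first wrap() produces the ClientHello bytes into cipherOut.
            SSLEngineResult result = engine.wrap(plainOut, cipherOut);
            return new HandshakeStatus[] {initial, result.getHandshakeStatus()};
        } catch (Exception e) {
            throw new IllegalStateException("Could not start the handshake", e);
        }
    }

    public static void main(String[] args) {
        HandshakeStatus[] statuses = firstStep();
        System.out.println("Initial handshake status is " + statuses[0]);
        System.out.println("Handshake status after wrapping is " + statuses[1]);
    }
}
```

From there the client has to feed the server's bytes to unwrap() and run delegated tasks whenever the status is NEED_TASK, which is the part of the trace where the failure shows up.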

@sergio91pt sergio91pt added the bug label Apr 2, 2024
@bmleite
Contributor

bmleite commented Apr 2, 2024

I believe the exception occurs because the SslEngineHelper implementation runs the delegated task twice for the server_hello handshake message. The double call stems from the changes introduced in #716, in particular the unwrap loop.

Affected code:

do {
    int positionBeforeUnwrapping = cipherIn.position();
    unwrapResult = sslEngine.unwrap(cipherIn, plainIn);
    LOGGER.debug("SSL engine result is {} after unwrapping", unwrapResult);
    status = unwrapResult.getStatus();
    switch (status) {
        case OK:
            plainIn.clear();
            if (unwrapResult.getHandshakeStatus() == NEED_TASK) {
                handshakeStatus = runDelegatedTasks(sslEngine);
                int newPosition = positionBeforeUnwrapping + unwrapResult.bytesConsumed();
                if (newPosition == cipherIn.limit()) {
                    LOGGER.debug("Clearing cipherIn because all bytes have been read and unwrapped");
                    cipherIn.clear();
                } else {
                    LOGGER.debug("Setting cipherIn position to {} (limit is {})", newPosition, cipherIn.limit());
                    cipherIn.position(positionBeforeUnwrapping + unwrapResult.bytesConsumed());
                }
            } else {
                handshakeStatus = unwrapResult.getHandshakeStatus();
            }
            break;
        case BUFFER_OVERFLOW:
            throw new SSLException("Buffer overflow during handshake");
        case BUFFER_UNDERFLOW:
            LOGGER.debug("Buffer underflow");
            cipherIn.compact();
            LOGGER.debug("Reading from channel...");
            read = NioHelper.read(channel, cipherIn);
            if (read <= 0) {
                retryRead(channel, cipherIn);
            }
            LOGGER.debug("Done reading from channel...");
            cipherIn.flip();
            break;
        case CLOSED:
            sslEngine.closeInbound();
            break;
        default:
            throw new SSLException("Unexpected status from " + unwrapResult);
    }
}
while (unwrapResult.getHandshakeStatus() != NEED_WRAP && unwrapResult.getHandshakeStatus() != FINISHED);

The code flow for this particular scenario is as follows:

  • enters the do ... while loop and successfully executes the first sslEngine.unwrap(cipherIn, plainIn)

  • calls runDelegatedTasks(sslEngine) once without any errors

  • enters the first if condition (newPosition == cipherIn.limit()), which calls cipherIn.clear()
    ⚠️ I think this is where the problem is: per the Java documentation, clear() does not erase any data. It only sets the position to zero, sets the limit to the capacity, and discards the mark.

  • the code continues in the do ... while loop since unwrapResult.getHandshakeStatus() is still NEED_TASK

  • calls the sslEngine.unwrap(cipherIn, plainIn) a second time, however, since the cipherIn position was reset, this will result in unwrapping the same server_hello handshake message

  • calls runDelegatedTasks(sslEngine) a second time for the same server_hello message, resulting in the SSLProtocolException
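
The ByteBuffer behaviour this flow relies on can be checked in isolation. The sketch below is only an illustration: the class and method names are hypothetical, the 16 KiB capacity is an assumption, and the sizes (100 bytes for server_hello, 5084 bytes read in total) are borrowed from the log in the issue description. It contrasts clear() with an explicit position() update on a buffer whose bytes have all been consumed.

```java
import java.nio.ByteBuffer;

public class ClearVersusPosition {

    // Hypothetical stand-in for cipherIn after a single channel read of 5084 bytes.
    // Bytes 0..99 play the role of the server_hello record (marked 1),
    // the remaining 4984 bytes play the role of the certificate record (marked 2).
    static ByteBuffer fullyConsumedBuffer() {
        ByteBuffer cipherIn = ByteBuffer.allocate(16 * 1024);
        for (int i = 0; i < 5084; i++) {
            cipherIn.put((byte) (i < 100 ? 1 : 2));
        }
        cipherIn.flip();                     // position = 0, limit = 5084
        cipherIn.position(cipherIn.limit()); // both records have been unwrapped
        return cipherIn;
    }

    // What the current code does once newPosition == limit.
    static ByteBuffer afterClear() {
        ByteBuffer cipherIn = fullyConsumedBuffer();
        cipherIn.clear(); // position = 0, limit = capacity, old bytes still present
        return cipherIn;
    }

    // What the proposed change does in the same situation.
    static ByteBuffer afterPosition() {
        ByteBuffer cipherIn = fullyConsumedBuffer();
        cipherIn.position(5084); // stays at the limit, nothing left to unwrap
        return cipherIn;
    }

    public static void main(String[] args) {
        ByteBuffer cleared = afterClear();
        System.out.println("clear():    position=" + cleared.position()
                + " remaining=" + cleared.remaining()
                + " first byte=" + cleared.get(0));
        ByteBuffer advanced = afterPosition();
        System.out.println("position(): position=" + advanced.position()
                + " remaining=" + advanced.remaining());
    }
}
```

After clear() the buffer looks readable from offset 0 again and the first byte is still the start of the old server_hello record, so the next unwrap() sees that record a second time. After position() the buffer has nothing remaining.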

I'm not familiar with all the TLS handshake flows and details but, purely from a ByteBuffer perspective, I don't think we need the if (newPosition == cipherIn.limit()) condition. I tested an instrumented version of SslEngineHelper with the following changes and it worked:

if (unwrapResult.getHandshakeStatus() == NEED_TASK) {
    handshakeStatus = runDelegatedTasks(sslEngine);
    // removed the IF...ELSE condition and now always updates the cipherIn position
    cipherIn.position(positionBeforeUnwrapping + unwrapResult.bytesConsumed());
} else {
    handshakeStatus = unwrapResult.getHandshakeStatus();
}
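
To make the difference between the two variants concrete without a real TLS peer, here is a toy simulation. Everything in it is an assumption for illustration: unwrapOne is a made-up stand-in for SSLEngine.unwrap that consumes one [type, length, payload] record and rejects a type it has already seen, and the loop ends when the buffer is empty, where the real code would get BUFFER_UNDERFLOW and read from the channel.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

public class UnwrapLoopSketch {

    // Toy stand-in for SSLEngine.unwrap: consumes one [type, length, payload] record
    // and, like the real engine, refuses a handshake message it has already processed.
    static int unwrapOne(ByteBuffer cipherIn, List<Integer> seenTypes) {
        int start = cipherIn.position();
        int type = cipherIn.get();
        int length = cipherIn.get();
        cipherIn.position(cipherIn.position() + length); // skip the payload
        if (seenTypes.contains(type)) {
            throw new IllegalStateException("Unexpected handshake message: type " + type);
        }
        seenTypes.add(type);
        return cipherIn.position() - start; // bytes consumed
    }

    // Drains a buffer holding two records, using either the current or the proposed logic.
    static List<Integer> drain(boolean clearWhenDrained) {
        ByteBuffer cipherIn = ByteBuffer.allocate(64);
        cipherIn.put(new byte[] {1, 2, 0, 0});    // type 1, plays "server_hello"
        cipherIn.put(new byte[] {2, 3, 0, 0, 0}); // type 2, plays "certificate"
        cipherIn.flip();

        List<Integer> seenTypes = new ArrayList<>();
        while (cipherIn.hasRemaining()) {
            int positionBeforeUnwrapping = cipherIn.position();
            int bytesConsumed = unwrapOne(cipherIn, seenTypes);
            int newPosition = positionBeforeUnwrapping + bytesConsumed;
            if (clearWhenDrained && newPosition == cipherIn.limit()) {
                cipherIn.clear();               // current behaviour: rewinds onto old data
            } else {
                cipherIn.position(newPosition); // proposed behaviour: always advance
            }
        }
        return seenTypes;
    }

    public static void main(String[] args) {
        System.out.println("always advance: processed types " + drain(false));
        try {
            drain(true);
        } catch (IllegalStateException e) {
            System.out.println("clear when drained: " + e.getMessage());
        }
    }
}
```

With the always-advance variant each record is processed exactly once. With the clear-when-drained variant the loop rewinds onto the stale first record and the toy engine rejects it, mirroring the SSLProtocolException in the trace.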

Please check if this analysis makes sense. 🙏

@michaelklishin
Member

@bmleite it does make sense. Please submit a PR and we will test it some more. Thank you!
