
Transport messages exceeding 2GiB are not handled gracefully #94137

Open

Description

Seen in a Cloud cluster (with related forum thread):

[instance-0000000016] exception caught on transport layer [Netty4TcpChannel{localAddress=/172.17.0.7:19547, remoteAddress=/10.47.48.237:37520, profile=default}], closing connection
java.lang.IllegalStateException: Pipeline state corrupted by uncaught exception
	at org.elasticsearch.transport.InboundPipeline.handleBytes(InboundPipeline.java:83) ~[elasticsearch-8.4.3.jar:?]
	at org.elasticsearch.transport.netty4.Netty4MessageInboundHandler.channelRead(Netty4MessageInboundHandler.java:63) ~[?:?]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) ~[?:?]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) ~[?:?]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) ~[?:?]
	at io.netty.handler.logging.LoggingHandler.channelRead(LoggingHandler.java:280) ~[?:?]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) ~[?:?]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) ~[?:?]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) ~[?:?]
	at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103) ~[?:?]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) ~[?:?]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) ~[?:?]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) ~[?:?]
	at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1372) ~[?:?]
	at io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1235) ~[?:?]
	at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1284) ~[?:?]
	at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:510) ~[?:?]
	at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:449) ~[?:?]
	at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:279) ~[?:?]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) ~[?:?]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) ~[?:?]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) ~[?:?]
	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410) ~[?:?]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) ~[?:?]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) ~[?:?]
	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919) ~[?:?]
	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166) ~[?:?]
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:722) ~[?:?]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:623) ~[?:?]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:586) ~[?:?]
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:496) ~[?:?]
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:995) ~[?:?]
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[?:?]
	at java.lang.Thread.run(Thread.java:833) ~[?:?]
Caused by: java.lang.IllegalArgumentException: CompositeBytesReference cannot hold more than 2GB
	at org.elasticsearch.common.bytes.CompositeBytesReference.ofMultiple(CompositeBytesReference.java:60) ~[elasticsearch-8.4.3.jar:?]
	at org.elasticsearch.common.bytes.CompositeBytesReference.of(CompositeBytesReference.java:41) ~[elasticsearch-8.4.3.jar:?]
	at org.elasticsearch.transport.InboundAggregator.finishAggregation(InboundAggregator.java:104) ~[elasticsearch-8.4.3.jar:?]
	at org.elasticsearch.transport.InboundPipeline.forwardFragments(InboundPipeline.java:147) ~[elasticsearch-8.4.3.jar:?]
	at org.elasticsearch.transport.InboundPipeline.doHandleBytes(InboundPipeline.java:121) ~[elasticsearch-8.4.3.jar:?]
	at org.elasticsearch.transport.InboundPipeline.handleBytes(InboundPipeline.java:86) ~[elasticsearch-8.4.3.jar:?]
	... 33 more
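
The `Caused by` line comes down to the fact that a composite reference addresses its backing fragments with `int` offsets, so the aggregated body cannot exceed `Integer.MAX_VALUE` bytes. The following is an illustrative sketch only (not the actual Elasticsearch implementation) of where that ~2GiB ceiling comes from:

```java
import java.util.List;

final class CompositeLengthCheck {
    // Sums the fragment lengths and rejects anything that would overflow an int
    // offset; this is the shape of the check that surfaces above as
    // "CompositeBytesReference cannot hold more than 2GB".
    static int totalLength(List<byte[]> fragments) {
        long total = 0;
        for (byte[] fragment : fragments) {
            total += fragment.length;
        }
        if (total > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("composite reference cannot hold more than 2GiB, got " + total);
        }
        return (int) total;
    }
}
```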

I expect you can get this to happen with e.g. an ingest pipeline that expands each document's size by an order of magnitude or so, combined with transport-level compression. I am guessing that's what the user in question was doing here. Maybe we can split up the shard bulk in this case, but in general we should just refuse to send such a large message rather than allowing the receiver to explode like this.
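A minimal sketch of what such a sender-side guard could look like, assuming a check on the serialized size before the message is handed to the transport; the limit, names, and exception type here are illustrative assumptions, not existing Elasticsearch code:

```java
final class OutboundSizeGuard {
    // Hypothetical limit; a real maximum would presumably be a transport setting,
    // and in any case cannot usefully exceed ~2GiB given the receiver-side int offsets.
    private static final long MAX_OUTBOUND_MESSAGE_BYTES = Integer.MAX_VALUE;

    // Fail fast on the sender rather than letting the receiver fail while aggregating.
    static void ensureSendable(long serializedSizeInBytes, String action) {
        if (serializedSizeInBytes > MAX_OUTBOUND_MESSAGE_BYTES) {
            throw new IllegalArgumentException(
                "refusing to send transport message for [" + action + "] of size ["
                    + serializedSizeInBytes + "] bytes, which exceeds the limit of ["
                    + MAX_OUTBOUND_MESSAGE_BYTES + "] bytes");
        }
    }
}
```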

Relates to #75334 too: we also ought to be able to at least skip over such oversized messages without tearing down the connection.
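As a rough sketch of that receiver-side idea, with hypothetical names: when the header announces a body larger than the receiver is willing to aggregate, consume and discard the remaining bytes of that one message (and presumably send back an error response) instead of throwing out of the inbound pipeline and closing the whole connection:

```java
final class OversizedMessageSkipper {
    private long remainingToSkip;

    // Called once the header of an oversized message has been decoded.
    void startSkipping(long announcedBodyLength) {
        this.remainingToSkip = announcedBodyLength;
    }

    /** Discards bytes belonging to the oversized body; returns true once it is fully skipped. */
    boolean consume(java.nio.ByteBuffer fragment) {
        long discard = Math.min(remainingToSkip, fragment.remaining());
        fragment.position(fragment.position() + (int) discard);
        remainingToSkip -= discard;
        return remainingToSkip == 0;
    }
}
```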

