Description
Elasticsearch version (bin/elasticsearch --version): master (see below for details)
Plugins installed: none
JVM version (java -version): OpenJDK Runtime Environment 18.9 (build 11.0.2+7)
OS version (uname -a if on a Unix-like system): Linux 4.13.0-45-generic
Description of the problem including expected versus actual behavior:
Since Friday, March 27, we have been getting errors like the following when running a bulk benchmark against a three-node cluster:
HTTP status: 500, message: Invalid vInt ((fffffffd & 0x7f) << 28) | fffffff'
HTTP status: 500, message: Invalid string; unexpected character: 255 hex: ff
HTTP status: 500, message: tried to read: 264241151 bytes but only 130117 remaining
HTTP status: 500, message: Failed to deserialize response from handler [org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler]
Here is the stack trace from one of these errors:
[2020-03-28T01:37:43,598][WARN ][o.e.c.r.a.AllocationService] [rally-node-0] failing shard [failed shard, shard [osmgeopoints][3], node[177SLf7WTDCnrmB--Ryr2A], [R], s[STARTED], a[id=scJR6Wm3TmyemaIAAAnRcA], message [failed to perform indices:data/write/bulk[s] on replica [osmgeopoints][3], node[177SLf7WTDCnrmB--Ryr2A], [R], s[STARTED], a[id=scJR6Wm3TmyemaIAAAnRcA]], markAsStale [true], failure [org.elasticsearch.transport.RemoteTransportException: [rally-node-1][192.168.20.32:39300][indices:data/write/bulk[s][r]]
Caused by: java.io.IOException: Invalid string; unexpected character: 146 hex: 92
at org.elasticsearch.common.io.stream.StreamInput.throwOnBrokenChar(StreamInput.java:543)
at org.elasticsearch.common.io.stream.StreamInput.readString(StreamInput.java:502)
at org.elasticsearch.action.support.replication.TransportReplicationAction$ConcreteShardRequest.<init>(TransportReplicationAction.java:1044)
at org.elasticsearch.action.support.replication.TransportReplicationAction$ConcreteReplicaRequest.<init>(TransportReplicationAction.java:1112)
at org.elasticsearch.action.support.replication.TransportReplicationAction.lambda$new$1(TransportReplicationAction.java:146)
at org.elasticsearch.transport.RequestHandlerRegistry.newRequest(RequestHandlerRegistry.java:56)
at org.elasticsearch.transport.InboundHandler.handleRequest(InboundHandler.java:185)
at org.elasticsearch.transport.InboundHandler.messageReceived(InboundHandler.java:129)
at org.elasticsearch.transport.InboundHandler.inboundMessage(InboundHandler.java:106)
at org.elasticsearch.transport.TcpTransport.inboundMessage(TcpTransport.java:672)
at org.elasticsearch.transport.InboundPipeline.forwardFragments(InboundPipeline.java:136)
at org.elasticsearch.transport.InboundPipeline.handleBytes(InboundPipeline.java:110)
at org.elasticsearch.transport.netty4.Netty4MessageChannelHandler.channelRead(Netty4MessageChannelHandler.java:68)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:377)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:363)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:355)
at io.netty.handler.logging.LoggingHandler.channelRead(LoggingHandler.java:227)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:377)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:363)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:355)
at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:377)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:363)
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:714)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:615)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:578)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493)
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at java.lang.Thread.run(Thread.java:834)
]]
org.elasticsearch.transport.RemoteTransportException: [rally-node-1][192.168.20.32:39300][indices:data/write/bulk[s][r]]
Caused by: java.io.IOException: Invalid string; unexpected character: 146 hex: 92
at org.elasticsearch.common.io.stream.StreamInput.throwOnBrokenChar(StreamInput.java:543) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
at org.elasticsearch.common.io.stream.StreamInput.readString(StreamInput.java:502) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
at org.elasticsearch.action.support.replication.TransportReplicationAction$ConcreteShardRequest.<init>(TransportReplicationAction.java:1044) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
at org.elasticsearch.action.support.replication.TransportReplicationAction$ConcreteReplicaRequest.<init>(TransportReplicationAction.java:1112) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
at org.elasticsearch.action.support.replication.TransportReplicationAction.lambda$new$1(TransportReplicationAction.java:146) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
at org.elasticsearch.transport.RequestHandlerRegistry.newRequest(RequestHandlerRegistry.java:56) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
at org.elasticsearch.transport.InboundHandler.handleRequest(InboundHandler.java:185) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
at org.elasticsearch.transport.InboundHandler.messageReceived(InboundHandler.java:129) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
at org.elasticsearch.transport.InboundHandler.inboundMessage(InboundHandler.java:106) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
at org.elasticsearch.transport.TcpTransport.inboundMessage(TcpTransport.java:672) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
at org.elasticsearch.transport.InboundPipeline.forwardFragments(InboundPipeline.java:136) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
at org.elasticsearch.transport.InboundPipeline.handleBytes(InboundPipeline.java:110) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
at org.elasticsearch.transport.netty4.Netty4MessageChannelHandler.channelRead(Netty4MessageChannelHandler.java:68) ~[?:?]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:377) ~[?:?]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:363) ~[?:?]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:355) ~[?:?]
at io.netty.handler.logging.LoggingHandler.channelRead(LoggingHandler.java:227) ~[?:?]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:377) ~[?:?]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:363) ~[?:?]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:355) ~[?:?]
at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410) ~[?:?]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:377) ~[?:?]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:363) ~[?:?]
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919) ~[?:?]
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163) ~[?:?]
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:714) ~[?:?]
at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:615) ~[?:?]
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:578) ~[?:?]
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493) ~[?:?]
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989) ~[?:?]
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[?:?]
at java.lang.Thread.run(Thread.java:834) [?:?]
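For context (this code is not from the report): all four error messages point at StreamInput-style decoding of the transport stream, where a length prefix or string is read from bytes that are apparently corrupted or mis-aligned. Below is a minimal, hypothetical sketch of a 7-bits-per-byte variable-length int decoder in the spirit of StreamInput#readVInt; the class name, method, and exact validity checks are illustrative assumptions, not the actual Elasticsearch implementation. It shows why a single bad byte is enough to produce a message like the "Invalid vInt" one above.

import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical, simplified variable-length int decoder (7 payload bits per byte,
// high bit = continuation flag, at most five bytes). Not the actual
// Elasticsearch StreamInput implementation.
public class VIntDecodeSketch {

    static int readVInt(InputStream in) throws IOException {
        int value = 0;
        for (int shift = 0; shift <= 28; shift += 7) {
            int b = in.read();
            if (b < 0) {
                throw new EOFException("stream ended inside a vInt");
            }
            if (shift == 28 && (b & ~0x0F) != 0) {
                // A valid fifth byte carries only the top four payload bits and no
                // continuation flag; anything else means these bytes are not really
                // a vInt, i.e. the stream is corrupted or the reader is mis-aligned.
                throw new IOException("Invalid vInt ((" + Integer.toHexString(b)
                        + " & 0x7f) << 28) | " + Integer.toHexString(value));
            }
            value |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return value;
            }
        }
        return value;
    }

    public static void main(String[] args) {
        // Four 0xFF bytes followed by 0xFD: every continuation flag is set, so the
        // decoder rejects the fifth byte -- the same failure mode as in the report.
        byte[] corrupted = { (byte) 0xFF, (byte) 0xFF, (byte) 0xFF, (byte) 0xFF, (byte) 0xFD };
        try {
            System.out.println(readVInt(new ByteArrayInputStream(corrupted)));
        } catch (IOException e) {
            System.out.println(e.getMessage()); // Invalid vInt ((fd & 0x7f) << 28) | fffffff
        }
    }
}

A decoder like this simply interprets whatever bytes sit at its current position, which would also explain why the same benchmark run produces several different-looking messages (invalid vInt, invalid string, impossible length): the symptom depends on which decode step happens to hit the corrupted bytes first.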
After some bisecting, I was able to narrow it down to #48263.
Steps to reproduce:
Run a benchmark with Rally against a three-node cluster built from the affected commit 42150d3 and observe the failure after a while:
esrally --target-host="HOST_IP_A:39200,HOST_IP_B:39200,HOST_IP_C:39200" --track="geopoint" --challenge="append-no-conflicts-index-only" --car="4gheap" --client-options="timeout:240" --runtime-jdk="11" --track-params="{\"number_of_replicas\": 1}" --pipeline="from-sources-complete" --revision="42150d3674089f415e0b8c54993aef2a03d016b5" --on-error=abort
Notes:
- You need to set proper IPs for HOST_IP_A, HOST_IP_B, and HOST_IP_C.
- We did not observe the issue with single-node clusters.