Skip to content

[🐛 Bug]: Selenium Grid, request stuck in retry loop #10930

@eete22

Description

@eete22

What happened?

Everything was working fine with Selenium Grid 3, but when we switched over to Selenium Grid 4 we started experiencing steps in our test cases sometimes taking ~18 minutes longer to complete. Looking at the logs, it seems that a request gets stuck in a retry loop. We have observed it for several different types of requests (e.g. /execute/sync, /frame/, /se/log).

We are running Selenium Grid with Docker/Podman using the following compose file:

hub:
    image: selenium/hub:4.2.0-20220527
    container_name: hub
    ports:
      - "4442:4442"
      - "4443:4443"
      - "4444:4444"
    restart: always
    environment:
      - SE_SESSION_REQUEST_TIMEOUT=2400
      
  node:
    image: selenium/node-chrome:4.2.0-20220527
    shm_size: 2gb
    container_name: node
    depends_on:
      - hub
    restart: always
    environment:
      - SE_EVENT_BUS_HOST=hub
      - SE_EVENT_BUS_PUBLISH_PORT=4442
      - SE_EVENT_BUS_SUBSCRIBE_PORT=4443
      - SE_NODE_SESSION_TIMEOUT=300
      - SE_NODE_OVERRIDE_MAX_SESSIONS=true
      - SE_NODE_MAX_SESSIONS=4
      - SCREEN_WIDTH=1920
      - SCREEN_HEIGHT=1080
      - SCREEN_DEPTH=24
    volumes:
      - /dev/shm:/dev/shm
    ports:
      - "9002:5900"
    links:
      - hub

I'll try to explain the scenario we have observed in the logs, starting from the request before the one that gets stuck.

1. Hub: Sends previous request.

13:36:23.034 DEBUG [NettyRequestSender.sendRequestWithOpenChannel] - Using open Channel [id: 0x7d412036, L:/10.88.0.31:51890 - R:/10.88.0.31:5555] for GET '/session/20135a43e0062e63b125ccdf0a75ab23/element/5e717040-395e-4053-b7b4-07910d640306/enabled'

2. Node: Previous request gets a channel.

13:36:23.036 DEBUG [NettyRequestSender.pollPooledChannel] - Using pooled Channel '[id: 0x34b2bf61, L:/127.0.0.1:39380 - R:localhost/127.0.0.1:60380]' for 'GET' to 'http://localhost:60380/session/20135a43e0062e63b125ccdf0a75ab23/element/5e717040-395e-4053-b7b4-07910d640306/enabled'
13:36:23.036 DEBUG [NettyRequestSender.sendRequestWithOpenChannel] - Using open Channel [id: 0x34b2bf61, L:/127.0.0.1:39380 - R:localhost/127.0.0.1:60380] for GET '/session/20135a43e0062e63b125ccdf0a75ab23/element/5e717040-395e-4053-b7b4-07910d640306/enabled'

3. Node: Previous request is sent and receives response.

13:36:23.043 DEBUG [HttpHandler.handleHttpResponse] -
Request DefaultFullHttpRequest(decodeResult: success, version: HTTP/1.1, content: EmptyByteBufBE)
GET /session/20135a43e0062e63b125ccdf0a75ab23/element/5e717040-395e-4053-b7b4-07910d640306/enabled HTTP/1.1
Accept: application/json, image/png
User-Agent: selenium/4.0.0 (.net linux)
Connection: keep-alive
Cache-Control: no-cache
traceparent: 00-f1d0e00c8e477eda9572995e0b7da776-629612b9dedff6a5-01
host: localhost:60380
Content-Type: application/json; charset=utf-8

  Response DefaultHttpResponse(decodeResult: success, version: HTTP/1.1)
  HTTP/1.1 200 OK
  Content-Type: application/json; charset=utf-8
  cache-control: no-cache
  content-length: 14
4. Node: Channel for previous request is offered back to pool.

13:36:23.044 DEBUG [ChannelManager.tryToOfferChannelToPool] - Adding key: http://localhost:60380 for channel [id: 0x34b2bf61, L:/127.0.0.1:39380 - R:localhost/127.0.0.1:60380]

5. Hub: Sends new request.

13:36:23.055 DEBUG [NettyRequestSender.sendRequestWithOpenChannel] - Using open Channel [id: 0x7d412036, L:/10.88.0.31:51890 - R:/10.88.0.31:5555] for POST '/session/20135a43e0062e63b125ccdf0a75ab23/execute/sync'

6. Node: New request is processed, but it freezes after "injecting" log (where is normally would say "Using __ channel").

13:36:23.059 DEBUG [HttpTracing.inject] - Injecting (POST) /session/20135a43e0062e63b125ccdf0a75ab23/execute/sync into OpenTelemetrySpan{traceId=a780c6e2ef7b2b647a0c64e1a3a0a9da,spanId=31fc6d6d7e2fbf62} at org.openqa.selenium.remote.tracing.SpanWrappedHttpHandler:84

7. Node: About a minuter later the idleChannelDetector finds that the channel used by the previous request is idle, and closes it.

13:37:23.557 DEBUG [DefaultChannelPool$IdleChannelDetector.expiredChannels] - Adding Candidate expired Channel [id: 0x34b2bf61, L:/127.0.0.1:39380 - R:localhost/127.0.0.1:60380] isIdleTimeoutExpired=true isRemotelyClosed=false isTtlExpired=false
13:37:23.557 DEBUG [DefaultChannelPool$IdleChannelDetector.closeChannels] - Closing Idle Channel [id: 0x34b2bf61, L:/127.0.0.1:39380 - R:localhost/127.0.0.1:60380]

8. Hub: About two minuter after that, the hub gets server error and retries (180 s default timeout).

13:39:23.076 DEBUG [RetryRequest.lambda$static$7] - Failure due to server error #1. Retrying.
13:39:23.082 DEBUG [NettyConnectListener.writeRequest] - Using new Channel '[id: 0xc39d1690, L:/10.88.0.31:52094 - R:/10.88.0.31:5555]' for 'POST' to '/session/20135a43e0062e63b125ccdf0a75ab23/execute/sync'

9. Node: the frozen request starts running and gets a channel, then gets stuck again.

13:39:23.087 DEBUG [NettyConnectListener.writeRequest] - Using new Channel '[id: 0x3d99f31f, L:/127.0.0.1:39510 - R:localhost/127.0.0.1:60380]' for 'POST' to '/session/20135a43e0062e63b125ccdf0a75ab23/execute/sync'

10. Node: The new retry request gets frozen after the "injecting" log.

13:39:23.089 DEBUG [HttpTracing.inject] - Injecting (POST) /session/20135a43e0062e63b125ccdf0a75ab23/execute/sync into OpenTelemetrySpan{traceId=a780c6e2ef7b2b647a0c64e1a3a0a9da,spanId=656ae3b1f9818e50} at org.openqa.selenium.remote.tracing.SpanWrappedHttpHandler:84

11. Five minutes after the call in step (9) the node gets a timeout exception for the first frozen request.
Build info: version: '4.2.0', revision: '86eb611648'
System info: host: '7cc6f0360350', ip: 'Unknown', os.name: 'Linux', os.arch: 'amd64', os.version: '4.18.0-372.9.1.el8.x86_64', java.version: '11.0.15'
Driver info: driver.version: unknown
org.openqa.selenium.TimeoutException: java.util.concurrent.TimeoutException
Build info: version: '4.2.0', revision: '86eb611648'
System info: host: '7cc6f0360350', ip: 'Unknown', os.name: 'Linux', os.arch: 'amd64', os.version: '4.18.0-372.9.1.el8.x86_64', java.version: '11.0.15'
Driver info: driver.version: unknown
	at org.openqa.selenium.remote.http.netty.NettyHttpHandler.makeCall(NettyHttpHandler.java:65)
	at org.openqa.selenium.remote.http.AddSeleniumUserAgent.lambda$apply$0(AddSeleniumUserAgent.java:42)
	at org.openqa.selenium.remote.http.Filter.lambda$andFinally$1(Filter.java:56)
	at org.openqa.selenium.remote.http.netty.NettyHttpHandler.execute(NettyHttpHandler.java:49)
	at org.openqa.selenium.remote.http.AddSeleniumUserAgent.lambda$apply$0(AddSeleniumUserAgent.java:42)
	at org.openqa.selenium.remote.http.Filter.lambda$andFinally$1(Filter.java:56)
	at org.openqa.selenium.remote.http.netty.NettyClient.execute(NettyClient.java:98)
	at org.openqa.selenium.remote.tracing.TracedHttpClient.execute(TracedHttpClient.java:55)
	at org.openqa.selenium.grid.web.ReverseProxyHandler.execute(ReverseProxyHandler.java:92)
	at org.openqa.selenium.grid.node.ProtocolConvertingSession.execute(ProtocolConvertingSession.java:75)
	at org.openqa.selenium.grid.node.local.SessionSlot.execute(SessionSlot.java:125)
	at org.openqa.selenium.grid.node.local.LocalNode.executeWebDriverCommand(LocalNode.java:447)
	at org.openqa.selenium.grid.node.ForwardWebDriverCommand.execute(ForwardWebDriverCommand.java:35)
	at org.openqa.selenium.remote.http.Route$PredicatedRoute.handle(Route.java:373)
	at org.openqa.selenium.remote.http.Route.execute(Route.java:68)
	at org.openqa.selenium.remote.tracing.SpanWrappedHttpHandler.execute(SpanWrappedHttpHandler.java:86)
	at org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:64)
	at org.openqa.selenium.remote.http.Route$CombinedRoute.handle(Route.java:336)
	at org.openqa.selenium.remote.http.Route.execute(Route.java:68)
	at org.openqa.selenium.grid.node.Node.execute(Node.java:240)
	at org.openqa.selenium.remote.http.Route$CombinedRoute.handle(Route.java:336)
	at org.openqa.selenium.remote.http.Route.execute(Route.java:68)
	at org.openqa.selenium.remote.AddWebDriverSpecHeaders.lambda$apply$0(AddWebDriverSpecHeaders.java:35)
	at org.openqa.selenium.remote.ErrorFilter.lambda$apply$0(ErrorFilter.java:44)
	at org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:64)
	at org.openqa.selenium.remote.ErrorFilter.lambda$apply$0(ErrorFilter.java:44)
	at org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:64)
	at org.openqa.selenium.netty.server.SeleniumHandler.lambda$channelRead0$0(SeleniumHandler.java:44)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.util.concurrent.TimeoutException
	at java.base/java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1886)
	at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2021)
	at org.asynchttpclient.netty.NettyResponseFuture.get(NettyResponseFuture.java:206)
	at org.openqa.selenium.remote.http.netty.NettyHttpHandler.makeCall(NettyHttpHandler.java:59)
	... 32 more
13:44:23.082 DEBUG [SeleniumSpanExporter$1.lambda$export$4] - SpanData{spanContext=ImmutableSpanContext{traceId=a780c6e2ef7b2b647a0c64e1a3a0a9da, spanId=31fc6d6d7e2fbf62, traceFlags=01, traceState=ArrayBasedTraceState{entries=[]}, remote=false, valid=true}, parentSpanContext=ImmutableSpanContext{traceId=a780c6e2ef7b2b647a0c64e1a3a0a9da, spanId=6ee9490e9202f7ef, traceFlags=01, traceState=ArrayBasedTraceState{entries=[]}, remote=true, valid=true}, resource=Resource{schemaUrl=https://opentelemetry.io/schemas/1.9.0, attributes={service.name="unknown_service:java", telemetry.sdk.language="java", telemetry.sdk.name="opentelemetry", telemetry.sdk.version="1.14.0"}}, instrumentationScopeInfo=InstrumentationScopeInfo{name=default, version=null, schemaUrl=null}, name=node.forward_command, kind=INTERNAL, startEpochNanos=1655904983059769000, endEpochNanos=1655905463082764947, attributes=AttributesMap{data={http.method=POST, random.key=e65233dd-8b7d-4030-89f9-c26c7e17877e, span.kind=server, http.target=/session/20135a43e0062e63b125ccdf0a75ab23/execute/sync, error=true}, capacity=128, totalAddedValues=5}, totalAttributeCount=5, events=[ImmutableEventData{name=exception, attributes={exception.message="Unable to execute request: java.util.concurrent.TimeoutException
Build info: version: '4.2.0', revision: '86eb611648'
System info: host: '7cc6f0360350', ip: 'Unknown', os.name: 'Linux', os.arch: 'amd64', os.version: '4.18.0-372.9.1.el8.x86_64', java.version: '11.0.15'
Driver info: driver.version: unknown", exception.stacktrace="org.openqa.selenium.TimeoutException: java.util.concurrent.TimeoutException
Build info: version: '4.2.0', revision: '86eb611648'
System info: host: '7cc6f0360350', ip: 'Unknown', os.name: 'Linux', os.arch: 'amd64', os.version: '4.18.0-372.9.1.el8.x86_64', java.version: '11.0.15'
Driver info: driver.version: unknown
	at org.openqa.selenium.remote.http.netty.NettyHttpHandler.makeCall(NettyHttpHandler.java:65)
	at org.openqa.selenium.remote.http.AddSeleniumUserAgent.lambda$apply$0(AddSeleniumUserAgent.java:42)
	at org.openqa.selenium.remote.http.Filter.lambda$andFinally$1(Filter.java:56)
	at org.openqa.selenium.remote.http.netty.NettyHttpHandler.execute(NettyHttpHandler.java:49)
	at org.openqa.selenium.remote.http.AddSeleniumUserAgent.lambda$apply$0(AddSeleniumUserAgent.java:42)
	at org.openqa.selenium.remote.http.Filter.lambda$andFinally$1(Filter.java:56)
	at org.openqa.selenium.remote.http.netty.NettyClient.execute(NettyClient.java:98)
	at org.openqa.selenium.remote.tracing.TracedHttpClient.execute(TracedHttpClient.java:55)
	at org.openqa.selenium.grid.web.ReverseProxyHandler.execute(ReverseProxyHandler.java:92)
	at org.openqa.selenium.grid.node.ProtocolConvertingSession.execute(ProtocolConvertingSession.java:75)
	at org.openqa.selenium.grid.node.local.SessionSlot.execute(SessionSlot.java:125)
	at org.openqa.selenium.grid.node.local.LocalNode.executeWebDriverCommand(LocalNode.java:447)
	at org.openqa.selenium.grid.node.ForwardWebDriverCommand.execute(ForwardWebDriverCommand.java:35)
	at org.openqa.selenium.remote.http.Route$PredicatedRoute.handle(Route.java:373)
	at org.openqa.selenium.remote.http.Route.execute(Route.java:68)
	at org.openqa.selenium.remote.tracing.SpanWrappedHttpHandler.execute(SpanWrappedHttpHandler.java:86)
	at org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:64)
	at org.openqa.selenium.remote.http.Route$CombinedRoute.handle(Route.java:336)
	at org.openqa.selenium.remote.http.Route.execute(Route.java:68)
	at org.openqa.selenium.grid.node.Node.execute(Node.java:240)
	at org.openqa.selenium.remote.http.Route$CombinedRoute.handle(Route.java:336)
	at org.openqa.selenium.remote.http.Route.execute(Route.java:68)
	at org.openqa.selenium.remote.AddWebDriverSpecHeaders.lambda$apply$0(AddWebDriverSpecHeaders.java:35)
	at org.openqa.selenium.remote.ErrorFilter.lambda$apply$0(ErrorFilter.java:44)
	at org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:64)
	at org.openqa.selenium.remote.ErrorFilter.lambda$apply$0(ErrorFilter.java:44)
	at org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:64)
	at org.openqa.selenium.netty.server.SeleniumHandler.lambda$channelRead0$0(SeleniumHandler.java:44)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.util.concurrent.TimeoutException
	at java.base/java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1886)
	at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2021)
	at org.asynchttpclient.netty.NettyResponseFuture.get(NettyResponseFuture.java:206)
	at org.openqa.selenium.remote.http.netty.NettyHttpHandler.makeCall(NettyHttpHandler.java:59)
	... 32 more
", exception.type="org.openqa.selenium.TimeoutException", http.flavor=1, http.handler_class="org.openqa.selenium.remote.http.Route$PredicatedRoute", http.host="hub:4444", http.method="POST", http.request_content_length="99", http.scheme="HTTP", http.target="/session/20135a43e0062e63b125ccdf0a75ab23/execute/sync", http.user_agent="selenium/4.0.0 (.net linux)"}, epochNanos=1655905463082460855, totalAttributeCount=11}], totalRecordedEvents=1, links=[], totalRecordedLinks=0, status=ImmutableStatusData{statusCode=ERROR, description=Kind: UNKNOWN Description:}, hasEnded=true}
13:44:23.082 WARN [SeleniumSpanExporter$1.lambda$export$1] - Unable to execute request: java.util.concurrent.TimeoutException
Build info: version: '4.2.0', revision: '86eb611648'
System info: host: '7cc6f0360350', ip: 'Unknown', os.name: 'Linux', os.arch: 'amd64', os.version: '4.18.0-372.9.1.el8.x86_64', java.version: '11.0.15'
Driver info: driver.version: unknown
13:44:23.082 WARN [SeleniumSpanExporter$1.lambda$export$1] - org.openqa.selenium.TimeoutException: java.util.concurrent.TimeoutException
Build info: version: '4.2.0', revision: '86eb611648'
System info: host: '7cc6f0360350', ip: 'Unknown', os.name: 'Linux', os.arch: 'amd64', os.version: '4.18.0-372.9.1.el8.x86_64', java.version: '11.0.15'
Driver info: driver.version: unknown
	at org.openqa.selenium.remote.http.netty.NettyHttpHandler.makeCall(NettyHttpHandler.java:65)
	at org.openqa.selenium.remote.http.AddSeleniumUserAgent.lambda$apply$0(AddSeleniumUserAgent.java:42)
	at org.openqa.selenium.remote.http.Filter.lambda$andFinally$1(Filter.java:56)
	at org.openqa.selenium.remote.http.netty.NettyHttpHandler.execute(NettyHttpHandler.java:49)
	at org.openqa.selenium.remote.http.AddSeleniumUserAgent.lambda$apply$0(AddSeleniumUserAgent.java:42)
	at org.openqa.selenium.remote.http.Filter.lambda$andFinally$1(Filter.java:56)
	at org.openqa.selenium.remote.http.netty.NettyClient.execute(NettyClient.java:98)
	at org.openqa.selenium.remote.tracing.TracedHttpClient.execute(TracedHttpClient.java:55)
	at org.openqa.selenium.grid.web.ReverseProxyHandler.execute(ReverseProxyHandler.java:92)
	at org.openqa.selenium.grid.node.ProtocolConvertingSession.execute(ProtocolConvertingSession.java:75)
	at org.openqa.selenium.grid.node.local.SessionSlot.execute(SessionSlot.java:125)
	at org.openqa.selenium.grid.node.local.LocalNode.executeWebDriverCommand(LocalNode.java:447)
	at org.openqa.selenium.grid.node.ForwardWebDriverCommand.execute(ForwardWebDriverCommand.java:35)
	at org.openqa.selenium.remote.http.Route$PredicatedRoute.handle(Route.java:373)
	at org.openqa.selenium.remote.http.Route.execute(Route.java:68)
	at org.openqa.selenium.remote.tracing.SpanWrappedHttpHandler.execute(SpanWrappedHttpHandler.java:86)
	at org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:64)
	at org.openqa.selenium.remote.http.Route$CombinedRoute.handle(Route.java:336)
	at org.openqa.selenium.remote.http.Route.execute(Route.java:68)
	at org.openqa.selenium.grid.node.Node.execute(Node.java:240)
	at org.openqa.selenium.remote.http.Route$CombinedRoute.handle(Route.java:336)
	at org.openqa.selenium.remote.http.Route.execute(Route.java:68)
	at org.openqa.selenium.remote.AddWebDriverSpecHeaders.lambda$apply$0(AddWebDriverSpecHeaders.java:35)
	at org.openqa.selenium.remote.ErrorFilter.lambda$apply$0(ErrorFilter.java:44)
	at org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:64)
	at org.openqa.selenium.remote.ErrorFilter.lambda$apply$0(ErrorFilter.java:44)
	at org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:64)
	at org.openqa.selenium.netty.server.SeleniumHandler.lambda$channelRead0$0(SeleniumHandler.java:44)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.util.concurrent.TimeoutException
	at java.base/java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1886)
	at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2021)
	at org.asynchttpclient.netty.NettyResponseFuture.get(NettyResponseFuture.java:206)
	at org.openqa.selenium.remote.http.netty.NettyHttpHandler.makeCall(NettyHttpHandler.java:59)
	... 32 more

13:44:23.083 WARN [SeleniumSpanExporter$1.lambda$export$3] - {"traceId": "a780c6e2ef7b2b647a0c64e1a3a0a9da","eventTime": 1655905463082460855,"eventName": "exception","attributes": {"exception.message": "Unable to execute request: java.util.concurrent.TimeoutException\nBuild info: version: '4.2.0', revision: '86eb611648'\nSystem info: host: '7cc6f0360350', ip: 'Unknown', os.name: 'Linux', os.arch: 'amd64', os.version: '4.18.0-372.9.1.el8.x86_64', java.version: '11.0.15'\nDriver info: driver.version: unknown","exception.stacktrace": "org.openqa.selenium.TimeoutException: java.util.concurrent.TimeoutException\nBuild info: version: '4.2.0', revision: '86eb611648'\nSystem info: host: '7cc6f0360350', ip: 'Unknown', os.name: 'Linux', os.arch: 'amd64', os.version: '4.18.0-372.9.1.el8.x86_64', java.version: '11.0.15'\nDriver info: driver.version: unknown\n\tat org.openqa.selenium.remote.http.netty.NettyHttpHandler.makeCall(NettyHttpHandler.java:65)\n\tat org.openqa.selenium.remote.http.AddSeleniumUserAgent.lambda$apply$0(AddSeleniumUserAgent.java:42)\n\tat org.openqa.selenium.remote.http.Filter.lambda$andFinally$1(Filter.java:56)\n\tat org.openqa.selenium.remote.http.netty.NettyHttpHandler.execute(NettyHttpHandler.java:49)\n\tat org.openqa.selenium.remote.http.AddSeleniumUserAgent.lambda$apply$0(AddSeleniumUserAgent.java:42)\n\tat org.openqa.selenium.remote.http.Filter.lambda$andFinally$1(Filter.java:56)\n\tat org.openqa.selenium.remote.http.netty.NettyClient.execute(NettyClient.java:98)\n\tat org.openqa.selenium.remote.tracing.TracedHttpClient.execute(TracedHttpClient.java:55)\n\tat org.openqa.selenium.grid.web.ReverseProxyHandler.execute(ReverseProxyHandler.java:92)\n\tat org.openqa.selenium.grid.node.ProtocolConvertingSession.execute(ProtocolConvertingSession.java:75)\n\tat org.openqa.selenium.grid.node.local.SessionSlot.execute(SessionSlot.java:125)\n\tat org.openqa.selenium.grid.node.local.LocalNode.executeWebDriverCommand(LocalNode.java:447)\n\tat org.openqa.selenium.grid.node.ForwardWebDriverCommand.execute(ForwardWebDriverCommand.java:35)\n\tat org.openqa.selenium.remote.http.Route$PredicatedRoute.handle(Route.java:373)\n\tat org.openqa.selenium.remote.http.Route.execute(Route.java:68)\n\tat org.openqa.selenium.remote.tracing.SpanWrappedHttpHandler.execute(SpanWrappedHttpHandler.java:86)\n\tat org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:64)\n\tat org.openqa.selenium.remote.http.Route$CombinedRoute.handle(Route.java:336)\n\tat org.openqa.selenium.remote.http.Route.execute(Route.java:68)\n\tat org.openqa.selenium.grid.node.Node.execute(Node.java:240)\n\tat org.openqa.selenium.remote.http.Route$CombinedRoute.handle(Route.java:336)\n\tat org.openqa.selenium.remote.http.Route.execute(Route.java:68)\n\tat org.openqa.selenium.remote.AddWebDriverSpecHeaders.lambda$apply$0(AddWebDriverSpecHeaders.java:35)\n\tat org.openqa.selenium.remote.ErrorFilter.lambda$apply$0(ErrorFilter.java:44)\n\tat org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:64)\n\tat org.openqa.selenium.remote.ErrorFilter.lambda$apply$0(ErrorFilter.java:44)\n\tat org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:64)\n\tat org.openqa.selenium.netty.server.SeleniumHandler.lambda$channelRead0$0(SeleniumHandler.java:44)\n\tat java.base\u002fjava.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)\n\tat java.base\u002fjava.util.concurrent.FutureTask.run(FutureTask.java:264)\n\tat java.base\u002fjava.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)\n\tat java.base\u002fjava.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)\n\tat java.base\u002fjava.lang.Thread.run(Thread.java:829)\nCaused by: java.util.concurrent.TimeoutException\n\tat java.base\u002fjava.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1886)\n\tat java.base\u002fjava.util.concurrent.CompletableFuture.get(CompletableFuture.java:2021)\n\tat org.asynchttpclient.netty.NettyResponseFuture.get(NettyResponseFuture.java:206)\n\tat org.openqa.selenium.remote.http.netty.NettyHttpHandler.makeCall(NettyHttpHandler.java:59)\n\t... 32 more\n","exception.type": "org.openqa.selenium.TimeoutException","http.flavor": 1,"http.handler_class": "org.openqa.selenium.remote.http.Route$PredicatedRoute","http.host": "hub:4444","http.method": "POST","http.request_content_length": "99","http.scheme": "HTTP","http.target": "\u002fsession\u002f20135a43e0062e63b125ccdf0a75ab23\u002fexecute\u002fsync","http.user_agent": "selenium\u002f4.0.0 (.net linux)"}}

13:44:23.157 DEBUG [TimeoutTimerTask.expire] - Request timeout to localhost/127.0.0.1:60380 after 300000 ms for NettyResponseFuture{currentRetry=0,
	isDone=0,
	isCancelled=0,
	asyncHandler=org.asynchttpclient.AsyncCompletionHandlerBase@144965c7,
	nettyRequest=org.asynchttpclient.netty.request.NettyRequest@4595be47,
	future=java.util.concurrent.CompletableFuture@43b022a9[Not completed],
	uri=http://localhost:60380/session/20135a43e0062e63b125ccdf0a75ab23/execute/sync,
	keepAlive=true,
	redirectCount=0,
	timeoutsHolder=org.asynchttpclient.netty.timeout.TimeoutsHolder@4f3f3e4a,
	inAuth=0,
	touch=1655905163087} after 300079 ms
13:44:23.157 DEBUG [ChannelManager.closeChannel] - Closing Channel [id: 0x3d99f31f, L:/127.0.0.1:39510 - R:localhost/127.0.0.1:60380] 
13:44:23.158 DEBUG [NettyRequestSender.abort] - Aborting Future NettyResponseFuture{currentRetry=0,
	isDone=0,
	isCancelled=0,
	asyncHandler=org.asynchttpclient.AsyncCompletionHandlerBase@144965c7,
	nettyRequest=org.asynchttpclient.netty.request.NettyRequest@4595be47,
	future=java.util.concurrent.CompletableFuture@43b022a9[Not completed],
	uri=http://localhost:60380/session/20135a43e0062e63b125ccdf0a75ab23/execute/sync,
	keepAlive=true,
	redirectCount=0,
	timeoutsHolder=org.asynchttpclient.netty.timeout.TimeoutsHolder@4f3f3e4a,
	inAuth=0,
	touch=1655905163087}

13:44:23.158 DEBUG [NettyRequestSender.abort] - Request timeout to localhost/127.0.0.1:60380 after 300000 ms
java.util.concurrent.TimeoutException: Request timeout to localhost/127.0.0.1:60380 after 300000 ms
	at org.asynchttpclient.netty.timeout.TimeoutTimerTask.expire(TimeoutTimerTask.java:43)
	at org.asynchttpclient.netty.timeout.RequestTimeoutTimerTask.run(RequestTimeoutTimerTask.java:50)
	at io.netty.util.HashedWheelTimer$HashedWheelTimeout.run(HashedWheelTimer.java:715)
	at io.netty.util.concurrent.ImmediateExecutor.execute(ImmediateExecutor.java:34)
	at io.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:703)
	at io.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:790)
	at io.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:503)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.base/java.lang.Thread.run(Thread.java:829)
13:44:23.158 DEBUG [AsyncCompletionHandler.onThrowable] - Request timeout to localhost/127.0.0.1:60380 after 300000 ms
java.util.concurrent.TimeoutException: Request timeout to localhost/127.0.0.1:60380 after 300000 ms
	at org.asynchttpclient.netty.timeout.TimeoutTimerTask.expire(TimeoutTimerTask.java:43)
	at org.asynchttpclient.netty.timeout.RequestTimeoutTimerTask.run(RequestTimeoutTimerTask.java:50)
	at io.netty.util.HashedWheelTimer$HashedWheelTimeout.run(HashedWheelTimer.java:715)
	at io.netty.util.concurrent.ImmediateExecutor.execute(ImmediateExecutor.java:34)
	at io.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:703)
	at io.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:790)
	at io.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:503)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.base/java.lang.Thread.run(Thread.java:829)
  1. And it continues like this. The hub retrying every 3 minutes, the request getting stuck in the node after "injecting" log and being released when a new retry request is made. The requests that are started in the node get a timeout exceptions after 5 minutes. Meanwhile other requests for other sessions are running in parallel.

  2. After the hub has retried 5 times, the client starts making other calls to the hub with the same session. After that everything seems to work again.

How can we reproduce the issue?

-

Relevant log output

Logs in description above.

Operating System

Docker

Selenium version

C# 4.2.0

What are the browser(s) and version(s) where you see this issue?

selenium/node-chrome:4.2.0-20220527 but also earlier versions of Selenium Grid 4 image

What are the browser driver(s) and version(s) where you see this issue?

selenium/node-chrome:4.2.0-20220527 but also earlier versions of Selenium Grid 4 image

Are you using Selenium Grid?

4.2.0 (but issue also occurred with earlier versions of Selenium Grid 4)

Metadata

Metadata

Assignees

No one assigned

    Labels

    B-gridEverything grid and server relatedI-defectSomething is not working as intendedJ-awaiting answerQuestion asked of user; a reply moves it to triage again

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions