
Potential Netty IO thread blocking in PU mode or with Triple alone #13041

@iJIAJIA

Description

  • I have searched the issues of this repository and believe that this is not a duplicate.

Environment

  • Dubbo version: 3.2.3
  • Operating System version: win10
  • Java version: 1.8

Steps to reproduce this issue

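  1. Use a direct connection (this is the easiest way to reproduce; an instance that registers an IP from an isolated network segment with the registry has the same effect) and point it at an intranet address that does not exist. Local machines usually have many CPU cores, so declare several such references to raise the chance of reproducing the issue.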
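// Two references that point at an unreachable intranet address
@DubboReference(check = false, url = "tri://192.168.1.1:1234")
private FooService fooService;

@DubboReference(check = false, url = "tri://192.168.1.1:1233")
private Foo2Service foo2Service;

// A reference to a healthy instance, with a 2s timeout
@DubboReference(check = false, url = "tri://${address of a healthy instance}:1233", timeout = 2000)
private HealthService healthService;
  2. Send a Triple request to a healthy service.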

Pls. provide [GitHub address] to reproduce this issue.

Expected Behavior

The interface responds normally.

Actual Behavior

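Business developers reported that the test environment was unstable: calls timed out intermittently, and only when the Triple protocol was in use.
SkyWalking traces showed that almost every timeout was the client timing out while waiting for the response, and that the request itself was only sent some time after the timeout had already elapsed.
The timeout configured on the client is 2s.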

Cause

TripleProtocol uses NettyConnectionClient. The problematic code is:

@Override
    protected void doConnect() throws RemotingException {
        ....
        createConnectingPromise();
        final ChannelFuture promise = bootstrap.connect();
        // This adds org.apache.dubbo.remoting.transport.netty4.NettyConnectionClient.ConnectionListener
        promise.addListener(this.connectionListener);
        // Blocks for up to the configured connect timeout (3s by default)
        boolean ret = connectingPromise.get().awaitUninterruptibly(getConnectTimeout(), TimeUnit.MILLISECONDS);
        ....
    }
class ConnectionListener implements ChannelFutureListener {

        @Override
        public void operationComplete(ChannelFuture future) {
            ....
            // On failure, the retry is scheduled on Netty's IO thread. If the remote server
            // does not respond, this blocks that IO thread for up to the connect timeout
            final EventLoop loop = future.channel().eventLoop();
            loop.schedule(() -> {
                try {
                    connectionClient.doConnect();
                } catch (RemotingException e) {
                    LOGGER.error(TRANSPORT_FAILED_RECONNECT, "", "",  "Failed to connect to server: " + getConnectAddress());
                }
            }, 1L, TimeUnit.SECONDS);
        }
    }
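
The effect described above can be reproduced without Netty. The following is a minimal simulation, not Dubbo code: a single-thread executor stands in for one Netty EventLoop, a latch that is never released stands in for the blocking wait inside doConnect(), and the names BlockedLoopDemo and ioDelayMillis are made up for illustration. It measures how long an unrelated task queued on the same thread has to wait.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

/** Simulation only: a single-thread executor stands in for one Netty EventLoop. */
final class BlockedLoopDemo {

    /**
     * Returns how many milliseconds pass before an "I/O" task runs when a
     * blocking reconnect attempt was queued on the same thread just before it.
     */
    static long ioDelayMillis(long connectTimeoutMillis) throws Exception {
        ExecutorService eventLoop = Executors.newSingleThreadExecutor();
        try {
            // Never counted down: models a remote peer that never answers.
            CountDownLatch connected = new CountDownLatch(1);
            long start = System.nanoTime();

            // Stand-in for doConnect(): waits up to the connect timeout on the loop thread.
            eventLoop.submit(() -> {
                try {
                    connected.await(connectTimeoutMillis, TimeUnit.MILLISECONDS);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });

            // Stand-in for writing a request to a healthy channel bound to the same loop.
            Future<Long> ioTask = eventLoop.submit(() -> System.nanoTime() - start);
            return TimeUnit.NANOSECONDS.toMillis(ioTask.get());
        } finally {
            eventLoop.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("I/O task delayed by about " + ioDelayMillis(300) + " ms");
    }
}
```

In this model the second task cannot start until the simulated connect wait gives up, which matches the traces in which requests left the client only after their own timeout had passed.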

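Why is the default dubbo protocol not affected?
The dubbo protocol uses NettyClient, whose reconnection is driven by org.apache.dubbo.remoting.exchange.support.header.ReconnectTimerTask rather than by a task scheduled on the Netty IO thread.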
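
One possible direction, sketched here as an assumption rather than as the fix Dubbo adopted, is to run the blocking reconnect on a thread that is not an IO thread, in the spirit of a timer task. The simulation below uses plain java.util.concurrent executors; ReconnectOffLoopDemo, reconnectTimer and ioDelayMillis are hypothetical names and none of this is Dubbo or Netty API.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Simulation only: the blocking reconnect runs on its own timer thread. */
final class ReconnectOffLoopDemo {

    /**
     * Returns how many milliseconds an "I/O" task waits on the event loop
     * while a blocking reconnect attempt is in progress on a separate thread.
     */
    static long ioDelayMillis(long connectTimeoutMillis) throws Exception {
        // Stand-in for one Netty EventLoop.
        ExecutorService eventLoop = Executors.newSingleThreadExecutor();
        // Hypothetical dedicated reconnect thread, separate from the event loop.
        ScheduledExecutorService reconnectTimer = Executors.newSingleThreadScheduledExecutor();
        try {
            // Never counted down: models a remote peer that never answers.
            CountDownLatch connected = new CountDownLatch(1);
            CountDownLatch reconnectStarted = new CountDownLatch(1);

            // The blocking wait now happens on the timer thread, not on the loop.
            reconnectTimer.schedule(() -> {
                reconnectStarted.countDown();
                try {
                    connected.await(connectTimeoutMillis, TimeUnit.MILLISECONDS);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }, 0, TimeUnit.MILLISECONDS);

            // Make sure the reconnect attempt is already blocking before measuring.
            reconnectStarted.await();
            long start = System.nanoTime();
            Future<Long> ioTask = eventLoop.submit(() -> System.nanoTime() - start);
            return TimeUnit.NANOSECONDS.toMillis(ioTask.get());
        } finally {
            eventLoop.shutdownNow();
            reconnectTimer.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("I/O task delayed by about " + ioDelayMillis(500) + " ms");
    }
}
```

Here the task on the simulated event loop runs almost immediately even though the reconnect attempt is still waiting. Making the connect itself non-blocking inside the listener would be another way to get the same effect.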
