[Python aio] BlockingIOError from PollerCompletionQueue._handle_events under sustained traffic with multi-loop processes #42357

@anishnath

What version of gRPC and what language are you using?

  • grpcio: 1.76.0
  • grpcio-tools: 1.76.0
  • Language: Python

What operating system (Linux, Windows,...) and version?

  • macOS 26 (Darwin 25.2.0, arm64) — errno 35 (EAGAIN/EWOULDBLOCK)
  • Linux (Ubuntu 22.04, x86_64 c8i.2xlarge) — errno 11

What runtime / compiler are you using (e.g. python version or version of gcc)

  • Python 3.11.13
  • uvloop 0.22.1 (also reproduces on the default asyncio _UnixSelectorEventLoop)
  • Granian 2.2.5 ASGI server

What did you do?

Used grpc.aio as the inference-client transport in a production
bid-agent service. Topology per process:

  1. Granian worker loop (uvloop) handles incoming HTTP /quote
    requests. Each request fires 1-2 unary Predict RPCs plus a
    Predict for win-price prediction via a grpc.aio channel pool.
  2. Background asyncio loop (default _UnixSelectorEventLoop in a
    separate thread) handles model loading, which also makes
    LoadModel RPCs through the same channel-manager singleton.
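The two-loop topology above can be sketched with the standard library alone. This is a minimal illustration, not the service's code: ChannelManager, handle_quote, and run_topology are hypothetical names, the RPCs are stubbed out with no gRPC involved, and the default asyncio loop stands in for uvloop.

```python
import asyncio
import threading


class ChannelManager:
    """Hypothetical stand-in for the channel-manager singleton (no gRPC here)."""

    _instance = None
    _lock = threading.Lock()

    def __init__(self):
        self.callers = []  # (rpc name, id of the loop that issued it)

    @classmethod
    def get(cls):
        with cls._lock:
            if cls._instance is None:
                cls._instance = cls()
            return cls._instance

    async def call(self, rpc_name):
        # A real implementation would await a grpc.aio stub method here.
        loop = asyncio.get_running_loop()
        self.callers.append((rpc_name, id(loop)))
        await asyncio.sleep(0)
        return rpc_name


def start_background_loop():
    """Run a second event loop in its own thread, like the model loader."""
    loop = asyncio.new_event_loop()
    thread = threading.Thread(target=loop.run_forever, daemon=True)
    thread.start()
    return loop, thread


async def handle_quote():
    # Worker-loop side: one request issuing two Predict RPCs.
    manager = ChannelManager.get()
    return await asyncio.gather(manager.call("Predict"), manager.call("Predict"))


def run_topology():
    background_loop, thread = start_background_loop()
    try:
        # Background-loop side: LoadModel goes through the same singleton.
        future = asyncio.run_coroutine_threadsafe(
            ChannelManager.get().call("LoadModel"), background_loop
        )
        future.result(timeout=5)
        # Worker-loop side runs on a different loop in the main thread.
        asyncio.run(handle_quote())
    finally:
        background_loop.call_soon_threadsafe(background_loop.stop)
        thread.join(timeout=5)
        background_loop.close()
    return ChannelManager.get().callers
```

The point of the sketch is only that one process-wide object ends up being driven from two different event loops in two different threads.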

Channel options:

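import grpc

grpc.aio.insecure_channel(
    endpoint,  # inference server address
    options=[
        # Send a keepalive ping every 10 s; wait up to 5 s for the ack.
        ('grpc.keepalive_time_ms', 10000),
        ('grpc.keepalive_timeout_ms', 5000),
        # Keep pinging even when no RPC is in flight.
        ('grpc.keepalive_permit_without_calls', True),
        # 0 = no limit on pings sent without data frames.
        ('grpc.http2.max_pings_without_data', 0),
        # Allow messages up to 50 MiB in each direction.
        ('grpc.max_receive_message_length', 50 * 1024 * 1024),
        ('grpc.max_send_message_length', 50 * 1024 * 1024),
    ],
)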

What did you expect to see?

grpc.aio to handle FD-readiness notifications cleanly — either
finding events to drain, or quietly continuing when the FD has no
pending events.
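As an illustration of the "quietly continue" behavior described above, here is a minimal stdlib sketch of a tolerant drain on a non-blocking socket. The helper name drain_wakeup is hypothetical, and this is not grpcio's implementation; it only shows the pattern of treating EAGAIN/EWOULDBLOCK as "nothing to do".

```python
import socket


def drain_wakeup(sock: socket.socket) -> int:
    """Read all pending bytes from a non-blocking socket.

    Returns the number of bytes drained; 0 means the readiness
    notification was spurious or another reader got there first.
    """
    drained = 0
    while True:
        try:
            data = sock.recv(4096)
        except BlockingIOError:
            # EAGAIN/EWOULDBLOCK: no pending data, so just carry on.
            return drained
        if not data:
            # Peer closed the connection.
            return drained
        drained += len(data)
```

With a pair from socket.socketpair() and the read side set non-blocking, calling drain_wakeup on an empty socket returns 0 instead of raising.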

What did you see instead?

BlockingIOError raised from the Cython completion-queue poller and
propagated to asyncio's exception handler, logged at ERROR level
multiple times per second under load:

ERROR:asyncio:Exception in callback functools.partial(
    <bound method PollerCompletionQueue._handle_events of
     <grpc._cython.cygrpc.PollerCompletionQueue object at 0x...>>,
    <uvloop.Loop running=True closed=False debug=False>)
handle: <Handle functools.partial(...)>
Traceback (most recent call last):
  File "uvloop/cbhandles.pyx", line 61, in uvloop.loop.Handle._run
  File "src/python/grpcio/grpc/_cython/_cygrpc/aio/completion_queue.pyx.pxi", line 148,
       in grpc._cython.cygrpc.PollerCompletionQueue._handle_events
BlockingIOError: [Errno 35] Resource temporarily unavailable

Identical trace fires on the default _UnixSelectorEventLoop:

File "/opt/.../python3.11/asyncio/events.py", line 84, in _run
File "src/python/grpcio/grpc/_cython/_cygrpc/aio/completion_queue.pyx.pxi", line 148,
     in grpc._cython.cygrpc.PollerCompletionQueue._handle_events
BlockingIOError: [Errno 35] Resource temporarily unavailable

So the bug is not loop-specific; it originates in grpcio's own C-extension poller.
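One plausible mechanism, stated here as an assumption rather than something confirmed from the grpcio source: if the poller wakes event loops by writing a byte to a shared non-blocking socket, and more than one loop is registered as a reader, a single wake-up byte can be consumed by one loop while the other finds the socket empty. The stdlib sketch below reproduces that second, empty read deterministically, without any event loop or gRPC involved; second_read_errno is a hypothetical name.

```python
import errno
import socket


def second_read_errno():
    """Send one wake-up byte, then read twice from a non-blocking socket."""
    read_sock, write_sock = socket.socketpair()
    read_sock.setblocking(False)
    try:
        write_sock.send(b"\x00")   # a single wake-up byte
        first = read_sock.recv(1)  # first consumer takes the byte
        try:
            read_sock.recv(1)      # second consumer finds nothing to read
        except BlockingIOError as exc:
            return first, exc.errno
        return first, None
    finally:
        read_sock.close()
        write_sock.close()
```

The errno returned for the second read is errno.EAGAIN, which is 35 on macOS and 11 on Linux, matching the two values reported above.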

Anything else we should know about your project / environment?
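A possible stopgap for the log noise, sketched under assumptions: a custom asyncio exception handler that drops only BlockingIOError reports whose callback description mentions PollerCompletionQueue._handle_events and forwards everything else to the default handler. make_filtering_handler is a hypothetical helper, and this hides the symptom rather than fixing the poller.

```python
import asyncio


def make_filtering_handler(marker="PollerCompletionQueue._handle_events"):
    """Build a loop exception handler that ignores the poller's EAGAIN noise."""

    def handler(loop, context):
        exc = context.get("exception")
        # The failing callback is described in the message and the handle repr.
        source = context.get("message", "") + repr(context.get("handle"))
        if isinstance(exc, BlockingIOError) and marker in source:
            return  # suppress only this specific report
        loop.default_exception_handler(context)

    return handler
```

It would be installed once per loop, for example with asyncio.get_running_loop().set_exception_handler(make_filtering_handler()) from a coroutine running on that loop.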
