
Add KeepAliveClientRequest class for k8s async client #15220

Merged: 5 commits into main from k8s-async-keepalive on Sep 4, 2024

Conversation

@kevingrismore (Contributor) commented on Sep 4, 2024

Closes #14954

Swaps the request class on the async k8s base client, after the client is built, for one that enables TCP keepalive. Refer to aio-libs/aiohttp#3904 (comment).
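For reference, the pattern described in that aiohttp issue is roughly the sketch below. The class name matches this PR's title, but the keepalive probe values, type annotations, and docstring are illustrative assumptions rather than the exact diff:

```python
import socket

from aiohttp import ClientRequest, ClientResponse
from aiohttp.connector import Connection


class KeepAliveClientRequest(ClientRequest):
    """ClientRequest that turns on TCP keepalive for its connection, so that
    idle-timeout rules on load balancers don't silently drop long-lived
    watch/log-streaming connections."""

    async def send(self, conn: Connection) -> ClientResponse:
        sock = conn.protocol.transport.get_extra_info("socket")
        # Enable keepalive probes on the underlying socket. The timing values
        # below are illustrative, and TCP_KEEPIDLE/TCP_KEEPINTVL/TCP_KEEPCNT
        # are Linux-specific constants.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 30)
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, 6)
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, 6)
        return await super().send(conn)
```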

Checklist

  • This pull request references any related issue by including "closes <link to issue>"
    • If no issue exists and your change is not a small fix, please create an issue first.
  • If this pull request adds new functionality, it includes unit tests that cover the changes
  • If this pull request removes docs files, it includes redirect settings in mint.json.
  • If this pull request adds functions or classes, it includes helpful docstrings.

@kevingrismore kevingrismore marked this pull request as ready for review September 4, 2024 20:14
@github-actions github-actions bot added the bug Something isn't working label Sep 4, 2024
@kevingrismore (Contributor, Author) commented:

To be clear about why I opened this: in the previous implementation, client.rest_client.pool_manager.connection_pool_kw accesses a property of a urllib3 connection pool. aiohttp has a different object structure, so the keepalive configuration needs to be handled slightly differently.
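For illustration only, a minimal sketch of that difference, assuming the kubernetes_asyncio ApiClient exposes its aiohttp.ClientSession as rest_client.pool_manager and that the swap happens by assigning the session's private _request_class attribute (both details are assumptions for this sketch, not quoted from the diff):

```python
from kubernetes_asyncio.client import ApiClient

# Sync kubernetes client: pool_manager is a urllib3.PoolManager, so keepalive
# options could be injected through pool_manager.connection_pool_kw.
#
# Async kubernetes_asyncio client: pool_manager is an aiohttp.ClientSession,
# which has no connection_pool_kw, so the request class is swapped instead
# after the client is built.


def enable_tcp_keepalive(client: ApiClient) -> ApiClient:
    # Assumes the KeepAliveClientRequest class from the sketch above is in
    # scope. _request_class is a private aiohttp.ClientSession attribute;
    # passing request_class=KeepAliveClientRequest when constructing the
    # session would be the public-API alternative.
    client.rest_client.pool_manager._request_class = KeepAliveClientRequest
    return client
```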

@zzstoatzz (Collaborator) left a review comment:

🧼

@kevingrismore (Contributor, Author) commented:

Before: crashes with an error matching the issue reports when run in a cluster with a 4-minute TCP load balancing timeout rule (worker log screenshot):

Error occurred while streaming logs - Job will continue to run but logs will no longer be streamed to stdout.
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/aiohttp/client_proto.py", line 94, in connection_lost
    uncompleted = self._parser.feed_eof()
                  ^^^^^^^^^^^^^^^^^^^^^^^
  File "aiohttp/_http_parser.pyx", line 513, in aiohttp._http_parser.HttpParser.feed_eof
aiohttp.http_exceptions.TransferEncodingError: 400, message:
  Not enough data for satisfy transfer length header.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/prefect_kubernetes/worker.py", line 939, in _stream_job_logs
    async for line in logs.content:
  File "/usr/local/lib/python3.11/site-packages/aiohttp/streams.py", line 50, in __anext__
    rv = await self.read_func()
         ^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/aiohttp/streams.py", line 317, in readline
    return await self.readuntil()
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/aiohttp/streams.py", line 351, in readuntil
    await self._wait("readuntil")
  File "/usr/local/lib/python3.11/site-packages/aiohttp/streams.py", line 312, in _wait
    await waiter
aiohttp.client_exceptions.ClientPayloadError: Response payload is not completed: <TransferEncodingError: 400, message='Not enough data for satisfy transfer length header.'>. ConnectionResetError(104, 'Connection reset by peer')

After: no crash, and the flow completes successfully (worker log screenshot).

@kevingrismore kevingrismore merged commit 308f461 into main Sep 4, 2024
18 checks passed
@kevingrismore kevingrismore deleted the k8s-async-keepalive branch September 4, 2024 22:28
Labels: bug (Something isn't working)

Linked issue that merging may close: K8s worker fails monitoring flow and sets it to crashed.