test.test_asyncio.test_server.TestServer2.test_abort_clients consistently fails on Linux 6.10.x #122136
Comments
I wanted to cc @CendioOssman in the report but forgot.
What the test tries to do is to fill the kernel socket buffers so that What we do to achieve this is to set an explicit send and receive buffer size, to force the kernel to stop dynamically resizing them. We then ask the kernel what the real size is (since it will do funky stuff like double the requested size), and then write that much data into the buffer. The No idea why this isn't working in Fedora's CI. And it's difficult to debug if it only happens there. :/ Perhaps you do some creative fiddling with the network settings that forces dynamic buffer scaling to stay on? Or is there some virtualization/containerization that results in the system lying to us?
To debug, could you add some output of
Thank you! The system is 100% virtualized/containerized. It runs on the Testing Farm and I don't know much about it, but perhaps @thrix might know what is happening or how to easily reproduce this locally.
Is this C? My Python has no socket.getsockopt, even though it is documented at https://docs.python.org/3/library/socket.html#socket.socket.getsockopt
Oh, I have getsockopt on socket objects, but it takes 2 arguments (level and option).
So I can do something like this and run it on the CI (outputs from my machine):
>>> import socket
>>> s = socket.socket()
>>> s.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
16384
>>> s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
131072
But for ioctl, I am out of my depth. I've tried something like
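The attempt referred to above is not preserved here. Separately from it, this is a minimal sketch of how an ioctl query on a socket might look from Python on Linux, assuming the goal is to see how many bytes are currently queued in the kernel buffers (the thread does not show which request was actually asked for). The helper name `queued_bytes` is hypothetical, and the use of `termios.TIOCOUTQ` and `termios.FIONREAD` relies on their sharing values with `SIOCOUTQ` and `SIOCINQ` on Linux. A `socketpair()` stands in for the real connection so the example is self-contained.

```python
import fcntl
import socket
import struct
import termios


def queued_bytes(sock, request):
    """Return the integer the kernel reports for an ioctl on a socket.

    Hypothetical helper for illustration. On Linux, termios.TIOCOUTQ
    asks about the send queue and termios.FIONREAD about unread data
    in the receive queue.
    """
    # The kernel writes a C int into the buffer we pass in.
    result = fcntl.ioctl(sock.fileno(), request, struct.pack('i', 0))
    return struct.unpack('i', result)[0]


# Assumption: a local socket pair is good enough to demonstrate the calls.
sender, receiver = socket.socketpair()
sender.send(b'hello')
print('send queue:', queued_bytes(sender, termios.TIOCOUTQ))
print('unread on receiver:', queued_bytes(receiver, termios.FIONREAD))
sender.close()
receiver.close()
```

The exact meaning of the send-queue number depends on the socket family, so it is probably best treated as a debugging hint rather than a precise byte count.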
We seem to have exactly the same here when packaging 3.13.0rc1 on openSUSE/Tumbleweed,
Unfortunate but it's clearly flaky and opensuse+fedora are struggling with the same test. I'm watching the upstream bug.
Bug: python/cpython#122136
Closes: https://bugs.gentoo.org/936314
Signed-off-by: Sam James <sam@gentoo.org>
I'm also seeing this sometimes when testing on Gentoo amd64, in systemd-nspawn container. However, the test is only flaky here — generally it fails when the system is busy, but passes when Python's test suite has all the CPU to itself.
https://build.opensuse.org/request/show/1192376 by user mcepl + dimstar_suse
- Add CVE-2024-6923-email-hdr-inject.patch to prevent email header injection due to unquoted newlines (bsc#1228780, CVE-2024-6923).
- Adding bso1227999-reproducible-builds.patch fixing bsc#1227999 adding reproducibility patches from gh#python/cpython!121872 and gh#python/cpython!121883.
- Add skip_test_abort_clients.patch (gh#python/cpython#122136) skip not yet fixed failing test
- %{profileopt} variable is set according to the variable %{do_profiling} (bsc#1227999)
- Update bluez-devel-vendor.tar.xz
- Update to 3.13.0~rc1:
  - Tests
    - gh-59022: Add tests for pkgutil.extend_path(). Patch by Andreas Stocker.
    - gh-99242: os.getloadavg() may throw OSError when running regression tests under certain conditions (e.g. chroot). This error is now caught and ig
This now started failing in the Fedora build system as well.
Presumably when our builders were kernel-updated to 6.10.4 and/or 6.10.5.
Indeed. I just rebooted from kernel 6.8.9 to 6.10.6 and I can reproduce the failure locally on Fedora Linux 39:
(I could not reproduce the failure with Linux 6.8.9.)
From
So, the buffer size reported by Testing with the Python
import os
import socket

print(os.uname().sysname, os.uname().release)

HOST = socket.gethostname()
PORT = 12345

# Listening socket plus one connected client/server pair
l_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
l_sock.bind((HOST, PORT))
l_sock.listen(5)
c_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
c_sock.connect((HOST, PORT))
s_sock, c_addr = l_sock.accept()

BUFSIZE_REQUEST = 65536
s_sock.setblocking(False)
c_sock.setblocking(False)

# Request explicit buffer sizes so the kernel stops resizing them dynamically
s_sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, BUFSIZE_REQUEST)
c_sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, BUFSIZE_REQUEST)

# Read back the sizes the kernel actually reports
bufsize_s = s_sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
bufsize_c = c_sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
send_size = min(bufsize_s, bufsize_c)
print(f'{BUFSIZE_REQUEST=} {bufsize_s=} {bufsize_c=}')

# Keep sending until the kernel refuses to accept more data
for i in range(50):
    try:
        sent = c_sock.send(b'a' * send_size)
    except BlockingIOError:
        sent = None
    print(f'iteration {i}: {sent=}/{send_size} bytes')
    if sent is None:
        break

c_sock.close()
s_sock.close()
l_sock.close()
This has different behaviour between the kernel versions.
But with 6.10.6, the buffers can hold almost five times the reported buffer size:
The new kernel is being quite generous! I guess the way to reliably fill the buffers is to keep sending until the kernel doesn't accept any more data, rather than rely on the numbers...
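A minimal sketch of that "keep sending until the kernel says no" idea, not the actual change made to the test. The helper name `fill_until_blocked`, the chunk size, the `max_bytes` safety cap, and the use of `socket.socketpair()` in place of a real TCP connection are all assumptions made to keep the example self-contained.

```python
import socket


def fill_until_blocked(sock, chunk_size=65536, max_bytes=1 << 30):
    """Send on a non-blocking socket until the kernel refuses more data.

    Hypothetical helper for illustration. Returns the number of bytes
    the kernel accepted before send() raised BlockingIOError. The
    max_bytes cap only guards against looping forever.
    """
    chunk = b'a' * chunk_size
    total = 0
    while total < max_bytes:
        try:
            # send() may accept only part of the chunk; count what it took.
            total += sock.send(chunk)
        except BlockingIOError:
            break
    return total


# Assumption: nobody reads from `right`, so `left` eventually blocks.
left, right = socket.socketpair()
left.setblocking(False)
left.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, 65536)
reported = left.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
accepted = fill_until_blocked(left)
print(f'{reported=} {accepted=}')
left.close()
right.close()
```

Comparing `accepted` with `reported` on different kernels would show how far the advertised size is from what the buffers really hold, without the test depending on either number.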
…data than advertised
…data than advertised (pythonGH-123423)
(cherry picked from commit b379f1b)
Co-authored-by: Petr Viktorin <encukou@gmail.com>
Now the buildbots are fixed, but I'd like to try to find some more info about what's happening so I'll keep the issue open.
Actually, I suddenly see all openSUSE builds on PPC64LE failing because of Complete build log of Python 3.13.0rc1 Or perhaps it is another manifestation of #85848 (aka #110325)?
Python 3.12 is not affected: the test was added to Python 3.13.
https://build.opensuse.org/request/show/1197482 by user mcepl + dimstar_suse
- Add gh122136-test_asyncio-kernel-buffer-data.patch fixing gh#python/cpython#122136 (changes in kernel provide different amount of data in the socket buffers).
- Remove skip_test_abort_clients.patch, which is not needed any more.
- Add CVE-2024-8088-inf-loop-zipfile_Path.patch to prevent malformed payload to cause infinite loops in zipfile.Path (bsc#1229704, CVE-2024-8088).
Bug report
Bug description:
Hello, we run the testsuite of the optimized and debug builds of Python in Fedora CI. Since the addition in 4159644 the test has constantly failed like this on Fedora Rawhide / Fedora Linux 41 (the development version). It passes on Fedora 40 and 39.
I was unable to reproduce this outside of Fedora CI. Perhaps this has to do with how the network is configured, no idea.
This is the output of python3.13 -m test.pythoninfo:
We invoke the installed tests like this:
I'd like to debug this and see if something is wrong with the test or perhaps in Fedora 41. But I don't know where to start.
CPython versions tested on:
3.13
Operating systems tested on:
Linux
Linked PRs