Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-121313: multiprocessing: change connection buffer size to 64KiB #123559

Merged
merged 9 commits into from
Sep 3, 2024

Conversation

methane
Copy link
Member

@methane methane commented Sep 1, 2024

Windows:

Current buffer size is 8KiB since multiprocessing is introduced.
It seems small for recent Python usages.

e711caf#diff-2c54a007d7fe1d9ac5ca008fe2d054394c39a4f521eea2cb580101a284d7b7ecR28

macOS/BSD:

They use 64KiB buffer for pipes. Current 16 pages (256KiB) buffer makes ~10% slowdown compared to 64KiB on M1 mac.

Linux:

I don't have 16k/64k page Linux. But when I change the pipe buffer size via fcntl, 256KiB buffer doesn't make notable performance benefit.

64KiB seems good default buffer size.
If it is not suitable, user can try other size by changing multiprocessing.connection.BUFSIZE.

@methane
Copy link
Member Author

methane commented Sep 1, 2024

Previous discussion:
#121315 (comment)

quick bench code:
https://gist.github.com/methane/a6cb799704a11f2bb2f64b16c0b830cc

64k vs 256k quick bench:

$ time python hello.py # 64k

real	0m25.999s
user	0m9.753s
sys	0m29.687s

real	0m25.964s
user	0m9.553s
sys	0m29.854s

$ time python hello.py # 256k

real	0m25.927s
user	0m9.421s
sys	0m29.958s

real	0m25.933s
user	0m9.330s
sys	0m30.108s

How to change the pipe buffer size:

diff --git a/Lib/multiprocessing/connection.py b/Lib/multiprocessing/connection.py
index ede5596e4fc..9c1bfb93728 100644
--- a/Lib/multiprocessing/connection.py
+++ b/Lib/multiprocessing/connection.py
@@ -544,7 +544,10 @@ def Pipe(duplex=True):
             c1 = Connection(s1.detach())
             c2 = Connection(s2.detach())
         else:
+            import fcntl
             fd1, fd2 = os.pipe()
+            fcntl.fcntl(fd1, fcntl.F_SETPIPE_SZ, 256*1024)
+            fcntl.fcntl(fd2, fcntl.F_SETPIPE_SZ, 256*1024)
             c1 = Connection(fd1, writable=False)
             c2 = Connection(fd2, readable=False)

I confirmed that this setting works by checking n = len(chunk) size.

So I don't think having Linux pipe only optimization is worth enough.
Static 64KiB size seems good enough.

Copy link
Member

@picnixz picnixz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really small nitpick. But otherwise LGTM.

methane and others added 3 commits September 2, 2024 19:28
Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
methane and others added 2 commits September 2, 2024 20:32
Co-authored-by: Victor Stinner <vstinner@python.org>
Copy link
Member

@vstinner vstinner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@methane
Copy link
Member Author

methane commented Sep 3, 2024

I tested this PR on Windows but no speedup. It is because Windows can read whole remaining data at once via this function.

def _get_more_data(self, ov, maxsize):

@gpshead gpshead merged commit 13f61bf into python:main Sep 3, 2024
36 checks passed
@methane methane deleted the mp-bufsize branch January 15, 2025 04:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants