Description
Feature or enhancement
Proposal:
Code reading data in pure Python tends to make a buffer variable, call os.read(), which returns a separate newly allocated buffer of data, then copy/append that data onto the pre-allocated buffer [0]. That creates unnecessary extra buffer objects, as well as unnecessary copies. Provide os.readinto for directly filling a Buffer Protocol object.
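As a rough illustration of the difference, the sketch below contrasts the copy-and-append pattern with filling a pre-allocated buffer in place. It assumes os.readinto(fd, buffer) returns the number of bytes read (0 at EOF), as proposed here. The helper names (read_exact_copying, read_exact_into, _readinto) are hypothetical, and the fallback branch exists only so the sketch runs on interpreters that lack os.readinto.

```python
import os


def read_exact_copying(fd, size):
    """Current pattern: every os.read() allocates a new bytes object,
    which is then copied onto the end of the accumulating buffer."""
    buf = bytearray()
    while len(buf) < size:
        chunk = os.read(fd, size - len(buf))  # new allocation per call
        if not chunk:
            raise EOFError("unexpected EOF")
        buf += chunk  # extra copy into the accumulating buffer
    return bytes(buf)


def _readinto(fd, view):
    """Use os.readinto when available; otherwise emulate it with os.read.
    The emulation is only a stand-in and still makes the extra copy."""
    readinto = getattr(os, "readinto", None)
    if readinto is not None:
        return readinto(fd, view)
    chunk = os.read(fd, len(view))
    view[:len(chunk)] = chunk
    return len(chunk)


def read_exact_into(fd, size):
    """Proposed pattern: allocate once, then fill the buffer in place."""
    buf = bytearray(size)
    view = memoryview(buf)
    pos = 0
    while pos < size:
        n = _readinto(fd, view[pos:])  # writes directly into buf
        if n == 0:
            raise EOFError("unexpected EOF")
        pos += n
    return buf
```

Slicing the memoryview (view[pos:]) does not copy the underlying bytes, so a short read simply advances pos and the next call continues filling the same buffer.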
os.readinto should closely mirror _Py_read, which underlies os.read, in order to get the same behaviors around retries as well as well-tested cross-platform support.
Move simple cases that use os.read (ex. [0]) to use the new API when it makes code simpler and more efficient. Potentially adding readinto to more readable/writeable file-like proxy objects, or to objects which transform the data (ex. Lib/_compression), is out of scope for this issue.
[0]
- Lines 1914 to 1921 in 298dda5
- cpython/Lib/multiprocessing/forkserver.py, lines 384 to 392 in 298dda5
- Lines 1695 to 1701 in 298dda5
os.read loops to migrate
Well contained os.read loops
- multiprocessing.forkserver read_signed - @cmaloney - gh-129205: Update multiprocessing.forkserver to use os.readinto #129425
- [x] subprocess Popen._execute_child - @cmaloney - gh-129205: Use os.readinto() in subprocess errpipe_read #129498
os.read loop interleaved with other code
- _pyio FileIO.read, FileIO.readall, FileIO.readinto: see Reduce copies when reading files in pyio, match behavior of _io #129005 -- @cmaloney
- _pyrepl.unix_console UnixConsole.input_buffer -- fixed length underlying buffer with "pos" / window on top.
- pty _copy. Operates around a "high waterlevel" / attempt to have a fixed-ish size buffer. Wraps os.read with a _read function.
- subprocess Popen.communicate. Note, this feels like something a non-contiguous Py_buffer would be really good for, particularly in self.text_mode, where currently all the bytes are "copied" into a contiguous bytes to then turn into text...
- tarfile _Stream._read and _Stream.__read. Note, builds _LowLevelFile around os.read, but other read methods are also available.
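For the interleaved cases that keep a fixed length buffer with a "pos" / window on top, a migration could look roughly like the sketch below. This is not the _pyrepl or pty code. WindowedReader and its methods are hypothetical names. It assumes os.readinto(fd, buffer) returns the number of bytes read, and falls back to os.read only so the sketch runs where os.readinto is absent.

```python
import os


def _readinto(fd, view):
    """Use os.readinto when available; otherwise emulate it with os.read."""
    readinto = getattr(os, "readinto", None)
    if readinto is not None:
        return readinto(fd, view)
    chunk = os.read(fd, len(view))
    view[:len(chunk)] = chunk
    return len(chunk)


class WindowedReader:
    """Hypothetical helper: one fixed-size buffer, with [start, end)
    marking the window of bytes that have been read but not consumed."""

    def __init__(self, fd, capacity=4096):
        self.fd = fd
        self._buf = bytearray(capacity)
        self._view = memoryview(self._buf)
        self._start = 0
        self._end = 0

    def fill(self):
        """Read into the free tail of the buffer.
        Returns the number of bytes added (0 at EOF or when full)."""
        if self._start == self._end:
            # Window is empty: reuse the whole buffer.
            self._start = self._end = 0
        elif self._end == len(self._buf) and self._start > 0:
            # Tail is full: slide unread bytes to the front.
            n = self._end - self._start
            self._buf[:n] = self._buf[self._start:self._end]
            self._start, self._end = 0, n
        free = self._view[self._end:]
        if len(free) == 0:
            return 0
        n = _readinto(self.fd, free)
        self._end += n
        return n

    def take(self, n):
        """Consume up to n bytes from the front of the window."""
        stop = min(self._start + n, self._end)
        data = bytes(self._view[self._start:stop])
        self._start = stop
        return data
```

The buffer is allocated once and never resized, so the memoryview stays valid across reads, and only the consumed bytes returned by take() are copied out.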
Has this already been discussed elsewhere?
No response given
Links to previous discussion of this feature:
Linked PRs
- gh-129205: Add os.readinto API for reading data into a caller provided buffer #129211
- gh-129205: Modernize test_eintr #129316
- gh-129205: Update multiprocessing.forkserver to use os.readinto #129425
- gh-129205: Use os.readinto() in subprocess errpipe_read #129498
- gh-129205: Experiment BytesIO._readfrom() #130098