Add os.readinto API for reading data into a caller provided buffer #129205
Comments
Do you want to work on a PR? If not, I can if you prefer.

Almost done with one for adding …
Add a new OS api which will read data directly into a caller provided writeable buffer protocol object.
Just curious, how would you rewrite your first example using …
@bluetech Ideally to me they'd move to … Migrating cases like that should happen, and I suspect what's simplest/cleanest will evolve with code review. My prototype has looked something like:

```python
errpipe_data = bytearray(50_000)
bytes_read = 0
while bytes_read < 50_000:
    count = os.readinto(errpipe_read, memoryview(errpipe_data)[bytes_read:])
    if count == 0:
        break
    bytes_read += count
del errpipe_data[bytes_read:]  # Remove excess bytes
```

There are some behavior differences between that and the code as implemented today (today, after reading 49_999 bytes, it could get another 50_000 bytes, resulting in 99_999 bytes in errpipe_data). Not sure if it's good / needed to include the … I also like doing the same thing but with a …
Side note, it does look like the original code is meant to be …
…ded buffer (#129211) Add a new OS API which will read data directly into a caller provided writeable buffer protocol object. Co-authored-by: Bénédikt Tran <[email protected]> Co-authored-by: Victor Stinner <[email protected]>
* Use f-string
* Fix grammar: replace 'datas' with 'data' (and replace 'data' with 'item').
* Remove unused variables: 'pid' and 'old_mask'.
* Use f-string.
* Fix grammar: replace 'datas' with 'data' (and replace 'data' with 'item').
* Remove unused variables: 'pid' and 'old_mask'.
* Factorize test_read() and test_readinto() code.

Co-authored-by: Cody Maloney <[email protected]>
Read into a pre-allocated fixed-size buffer. With the previous code, the buffer could actually get to 100_000 bytes in two reads (the first read returns 50_000, a second pass through the loop gets another 50_000), so this does change behavior. I think the fixed length of 50_000 was the intention, though. This is used to pass exception information from _fork_exec from the child to the parent process.
The subprocess readinto into an existing bytearray has a caveat: instead of just a malloc, it does a malloc + bzero, since pre-sized bytearrays don't have an undefined-value concept. So it goes beyond just mapping pages for a buffer to writing every page, no matter how much data is actually being read. When you're rarely or never going to read a lot of data, or reuse the buffer, that is unneeded extra work.
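A quick illustration of that caveat: a pre-sized bytearray has every byte defined (zeroed) at creation, so the whole 50_000-byte region is written up front, while os.read only allocates a result as large as the data actually read:

```python
import os

# A pre-sized bytearray is fully zero-initialized at creation (malloc + memset),
# so all 50_000 bytes are written even if only a few are ever read into it.
buf = bytearray(50_000)
assert all(b == 0 for b in buf)

# By contrast, os.read allocates a result sized to what was actually read.
r, w = os.pipe()
os.write(w, b"tiny")
os.close(w)
data = os.read(r, 50_000)  # asked for up to 50_000, got a 4-byte bytes object
os.close(r)
assert data == b"tiny"
```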
@gpshead I commented on the closed PR; neither … Will definitely take as a general note not to migrate loops unless there's a measurable / significant performance change.
Feature or enhancement

Proposal:

Code reading data in pure Python tends to make a buffer variable, call os.read() (which returns a separate, newly allocated buffer of data), then copy/append that data onto the pre-allocated buffer [0]. That creates unnecessary extra buffer objects, as well as unnecessary copies. Provide os.readinto for directly filling a Buffer Protocol object.

os.readinto should closely mirror _Py_read, which underlies os.read, in order to get the same behaviors around retries as well as well-tested cross-platform support.

Move simple cases that use os.read (ex. [0]) to use the new API when it makes code simpler and more efficient. Potentially adding readinto to more readable/writeable file-like proxy objects or objects which transform the data (ex. Lib/_compression) is out of scope for this issue.

[0]
cpython/Lib/subprocess.py
Lines 1914 to 1921 in 298dda5
cpython/Lib/multiprocessing/forkserver.py
Lines 384 to 392 in 298dda5
cpython/Lib/_pyio.py
Lines 1695 to 1701 in 298dda5
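To make the proposed API concrete, here is a hedged sketch of the loop pattern from [0] built around a readinto-style call. The helper name is illustrative, and the os.read fallback (the pattern being replaced) is included only so the sketch runs on interpreters without os.readinto:

```python
import os

def read_exact_into(fd, buf):
    """Fill buf from fd until full or EOF; return the number of bytes read.

    Sketch only: uses os.readinto (the API proposed here) when present,
    falling back to the os.read-plus-copy pattern it is meant to replace.
    """
    view = memoryview(buf)
    total = 0
    while total < len(buf):
        if hasattr(os, "readinto"):
            # Fills the caller's buffer directly; no intermediate bytes object.
            n = os.readinto(fd, view[total:])
        else:
            # Old pattern: allocate a chunk, then copy it into the buffer.
            chunk = os.read(fd, len(buf) - total)
            n = len(chunk)
            view[total:total + n] = chunk
        if n == 0:  # EOF
            break
        total += n
    return total

r, w = os.pipe()
os.write(w, b"hello")
os.close(w)
buf = bytearray(16)
n = read_exact_into(r, buf)
os.close(r)
# bytes(buf[:n]) == b"hello"
```

The memoryview slice lets each iteration target the unfilled tail of the buffer without copying what has already been read.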
os.read loops to migrate

Well contained os.read loops:

- multiprocessing.forkserver read_signed - @cmaloney - gh-129205: Update multiprocessing.forkserver to use os.readinto #129425
- [x] subprocess Popen._execute_child - @cmaloney - gh-129205: Use os.readinto() in subprocess errpipe_read #129498

os.read loop interleaved with other code:

- _pyio FileIO.read, FileIO.readall, FileIO.readinto - see Reduce copies when reading files in pyio, match behavior of _io #129005 -- @cmaloney
- _pyrepl.unix_console UnixConsole.input_buffer -- fixed-length underlying buffer with a "pos" / window on top.
- pty _copy. Operates around a "high waterlevel" / attempts to keep a fixed-ish size buffer. Wraps os.read with a _read function.
- subprocess Popen.communicate. Note, this feels like something non-contiguous Py_buffer would be really good for, particularly in self.text_mode, where currently all the bytes are "copied" into a contiguous bytes to then turn into text...
- tarfile _Stream._read and _Stream.__read. Note, builds _LowLevelFile around os.read, but other read methods are also available.

Has this already been discussed elsewhere?
No response given
Links to previous discussion of this feature:
#129005 (comment)
Linked PRs