PERF: download and compute hashes in chunks of 1MB, did you know the progress bar was 30% of the runtime! #12810
Hello, all green and all comments reviewed; this should be good to merge now.
I've replaced 1024*1024 with a constant as requested. Two constants, actually, because the network chunks and file chunks don't necessarily need to be the same size, and we don't necessarily want these files to import each other just for one constant.
(pingback) Pinging back to the bug tickets in requests: they've had an issue open for nearly 10 years about no longer using a small chunk size, but it was never fixed ^^
One quick check on an older version of pip on an older Python version (pip 21.0.1 on Python 3.8):
Default chunk size, 10 KiB:
Saved ./torch-1.13.1+cu117-cp38-cp38-linux_x86_64.whl
real 0m30.120s
Larger chunk size, 1 MiB:
Saved ./torch-1.13.1+cu117-cp38-cp38-linux_x86_64.whl
real 0m15.173s
While this does seem like a good idea in principle and the performance uplift is compelling, I think this can be a bit smarter to ensure responsive feedback to the user.
Let's pretend I'm on a 5 Mbps down connection and I'm installing Black. black-24.4.2-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl is 1.8 MB. At 5 Mbps, each megabyte will take 1.6 seconds, meaning there will be a 1.6s delay between each update of the progress bar. This results in an awkward delay in the progress bar that could be perceived as a hang. [1]
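The arithmetic behind that 1.6s figure can be checked with a small sketch. The helper name `seconds_per_chunk` is hypothetical, and the calculation assumes 1 MB = 10^6 bytes, 1 Mbps = 10^6 bits/s, and no protocol overhead:

```python
def seconds_per_chunk(chunk_bytes: int, link_mbps: float) -> float:
    """Idealised time to receive one chunk over a link of the given speed."""
    bits = chunk_bytes * 8
    return bits / (link_mbps * 1_000_000)


# One 1 MB chunk on a 5 Mbps link: the gap between two progress updates.
one_mb_gap = seconds_per_chunk(1_000_000, 5)

# The same link with the old 10 KiB chunk size, for comparison.
ten_kib_gap = seconds_per_chunk(10 * 1024, 5)

print(f"1 MB chunk:   {one_mb_gap:.3f} s between updates")
print(f"10 KiB chunk: {ten_kib_gap:.3f} s between updates")
```

On a real connection the gap would be somewhat longer, since this ignores TLS and TCP overhead.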
Screencast.from.2024-07-09.15-18-47.webm
I know this is nit-picky, but responsive design is important and providing immediate feedback on the download would be nice. Perhaps the chunk size could be scaled dynamically based on the download size (if known).
chunk_size = 1024 * 1024
if total_length:
    # Reduce the default chunk size to a small fraction of the total size to
    # ensure responsive progress updates (up to a lower bound to prevent
    # harmfully low chunk sizes), or to 1 MiB, whichever is lower.
    # TODO: it's probably best to round to the "nearest" power of 2...?
    chunk_size = max(1024 * 128, min(total_length // 15, chunk_size))
chunks = response_chunks(resp, chunk_size)
Screencast.from.2024-07-09.16-07-35.webm
Although this would quickly get tricky, as is evident from the example logic above [2], so perhaps it would be best to simply lower the download chunk size to 512 KiB or 256 KiB to balance chunk overhead and responsiveness. [3] To reduce the performance penalty, you can reduce the download progress bar's refresh rate to something more reasonable than 30/s.
pip/src/pip/_internal/cli/progress_bars.py
Lines 52 to 53 in 34e70ba
progress = Progress(*columns, refresh_per_second=30)
task_id = progress.add_task(" " * (get_indentation() + 2), total=total)
A value between 5-10 seems fine to me.
If people don't think it's worth optimizing for responsiveness, that's fine—go ahead and ignore my suggestion—but you can still reduce the progress bar refresh rate at least (since performance is the entire name of the game here 😉).
Footnotes
[1] It also feels slower to me, although that's subjective :)
[2] I didn't put that much thought into it. It probably needs a lot of tweaking...
[3] While I mention lower chunk sizes, I'm curious, are there any other potential negative consequences of setting the chunk size to a large value like 1 MiB? Can it have a harmful effect on flaky connections?
FWIW I agree with @ichard26: there have been complaints before from users on slow or unreliable connections that pip isn't always friendly. With this PR there is an awkward middle ground of file sizes (between 512 KB and 10 MB) where downloads will complete extremely fast and not be noticeable for people with high-bandwidth connections, but may appear stuck for a user whose connection is slow enough that the download takes a noticeable amount of time. I assume the chunk size is fixed, and pip can't determine ahead of time how fast or reliable a connection is. So @ichard26's approach seems a good compromise to me (without debating the specific numbers, which could always be tweaked).
I've been discussing my review with others (in the PyPA Discord) and Ethan made a suggestion that involves dynamically updating the chunk size:
TBH it sounds even more complicated and I don't think I'd want to maintain such logic, but it could be a valid approach.
If chunk size is dynamic (which it doesn't look like?), I think it would make sense to start at 8 kB and keep doubling until some time threshold was exceeded (e.g. 1 second) or some maximum value was reached (e.g. 1 MB)
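For illustration only, the doubling idea could look roughly like the sketch below. This is not code from the PR: `adaptive_chunks` and its parameters are hypothetical names, and it assumes a file-like object whose `read(n)` blocks until `n` bytes arrive or the stream ends.

```python
import io
import time
from typing import BinaryIO, Callable, Iterator


def adaptive_chunks(
    stream: BinaryIO,
    start: int = 8 * 1024,          # initial chunk size (8 kB in the suggestion)
    maximum: int = 1024 * 1024,     # upper bound (1 MB in the suggestion)
    threshold: float = 1.0,         # seconds a single read may take before growth stops
    clock: Callable[[], float] = time.monotonic,
) -> Iterator[bytes]:
    """Yield chunks from ``stream``, doubling the read size while reads stay fast."""
    size = start
    growing = True
    while True:
        began = clock()
        data = stream.read(size)
        elapsed = clock() - began
        if not data:
            return
        yield data
        if growing:
            if elapsed >= threshold or size >= maximum:
                # Stop growing for good once a read was slow or the cap is hit.
                growing = False
            else:
                size = min(size * 2, maximum)


# Usage: an in-memory stream is always "fast", so sizes double up to the cap.
payload = b"x" * (100 * 1024)
sizes = [len(c) for c in adaptive_chunks(io.BytesIO(payload), maximum=32 * 1024)]
print(sizes)
```

The extra state and timing calls are the maintenance cost mentioned in the comment above; a fixed chunk size avoids all of it.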
Hello, I've made adjustments. We can now reach ~450 MB/s download speed internally, up from ~60 MB/s before this PR.
Do not use smaller chunks, as this incurs a performance penalty from waiting on I/O operations from the device and from the interpreter running more iterations (note that Python 3.11+ brought significant improvements in interpreter speed). Most devices, especially HDD or USB devices, would benefit from a larger block size (1+ MB), but the difference is not necessarily significant. (I see some inefficient copying in the SSL code to reconstruct large packets, which would fall outside of the CPU cache; that could allow microsecond-level optimizations, but this is outside the scope of this patch.)
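As a rough illustration of the file-side half of this change (hashing in 1 MiB blocks), here is a minimal sketch using only the standard library. The names `FILE_CHUNK_SIZE` and `hash_stream` are hypothetical and are not claimed to match pip's source:

```python
import hashlib
import io
from typing import BinaryIO

FILE_CHUNK_SIZE = 1024 * 1024  # 1 MiB; hypothetical constant name


def hash_stream(stream: BinaryIO, chunk_size: int = FILE_CHUNK_SIZE) -> str:
    """Return the SHA-256 hex digest of ``stream``, read in fixed-size blocks."""
    digest = hashlib.sha256()
    while True:
        block = stream.read(chunk_size)
        if not block:  # empty bytes means end of stream
            break
        digest.update(block)
    return digest.hexdigest()


# Usage: the digest does not depend on the block size, only the loop count does.
payload = b"abc" * 1_000_000
big_blocks = hash_stream(io.BytesIO(payload))
small_blocks = hash_stream(io.BytesIO(payload), chunk_size=8 * 1024)
print(big_blocks == small_blocks)
```

With 8 KiB blocks the loop body runs 367 times for this 3 MB payload, versus 3 times with 1 MiB blocks, which is where the interpreter overhead described above comes from.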
(It's funny because I'm testing and the UI "feels" more responsive the more often it's flashing (I can force refresh 60 times a second and it's very flashy!) but that doesn't make anything faster. Users won't run 3 windows on pip install with 1, 5 and 60 refresh a second and won't tell the difference) Last I checked 3 years ago we were talking ~40% of the UK with <8 Mbps connections including ~10% with ~2Mbps or below. Downloading a movie takes a whole hour (tensorflow and torch-cpu are in that order depending on platform).
pip can be used not only with PyPI, but also with a local repository or proxy repository (e.g. JFrog Artifactory, Sonatype Nexus). Downloading packages within an internal company network can be much faster. I'd prefer having instantaneous CI builds, if that is possible.
I see failing tests. Rebased on master and squashed all the commits. The next build should pass.
While it does partially suck to lose some of the smoothness in the progress bar, the responsiveness is good enough at this point which is all that actually matters :)
The last comment I have is whether we should adjust the minimum download size that enables the download progress bar. It's currently at 40 KB, but the progress bar is essentially useless until 256 KiB as the entire download will complete in one chunk anyway (unless I'm misunderstanding how chunk sizes work, perhaps the read() call isn't guaranteed to return the requested chunk size?)
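A quick way to see why small downloads get no useful progress bar is to count the chunks. The helper `progress_updates` is a hypothetical name, and the sketch assumes every read except the last returns a full chunk:

```python
import math

DOWNLOAD_CHUNK_SIZE = 256 * 1024  # 256 KiB, the download chunk size discussed here


def progress_updates(total_bytes: int, chunk_bytes: int = DOWNLOAD_CHUNK_SIZE) -> int:
    """Number of chunks, and so progress-bar updates, for one download."""
    return math.ceil(total_bytes / chunk_bytes)


# Downloads at or below the chunk size finish in a single update.
for size in (40 * 1024, 256 * 1024, 512 * 1024, 1_800_000):
    print(f"{size:>9} bytes -> {progress_updates(size)} update(s)")
```

Anything up to 256 KiB jumps from 0% to 100% in one step, so a threshold below the chunk size only produces bars that flash and vanish.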
(It's funny because I'm testing and the UI "feels" more responsive the more often it's flashing (I can force refresh 60 times a second and it's very flashy!) but that doesn't make anything faster. Users won't run 3 windows on pip install with 1, 5 and 60 refresh a second and won't tell the difference)
I'm glad the performance is even better than before. 5 refreshes per second does actually seem a bit slow, but that's a matter of taste. I agree that users won't notice or wouldn't care enough to complain.
Last I checked 3 years ago we were talking ~40% of the UK with <8 Mbps connections including ~10% with ~2Mbps or below.
I myself lived with a 2 Mbps connection for many years.
Not too long ago, I was working with an "in-theory" 25/2 Mbps internet connection. That's pretty good I realize, but the download speed I got in practice was much lower. I'm glad you understand where I was coming from.
Downloading a movie takes a whole hour (tensorflow and torch-cpu are in that order depending on platform).
Downloading a game on steam can take 3 days.
There is no improvement to the progress bar you can make to make the experience less terrible. Users just have to wait :D
Right, but as @notatallshaw pointed out, this patch in its original form most impacted the download UX of medium-sized distributions. That's what I was trying to optimize for. Of course, if you're on a slow enough connection, any large distribution is going to progress at a snail's pace and even the most responsive progress bar wouldn't help :)
What about checking if terminal is a TTY, and if not (e.g. CI run, Docker build), use the maximum possible chunk size.
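A minimal sketch of that TTY idea, under stated assumptions: `pick_chunk_size` and both constants are hypothetical names, and 1 MiB stands in for "the maximum possible chunk size". This is not behaviour the PR implements.

```python
import io
import sys
from typing import Optional, TextIO

INTERACTIVE_CHUNK_SIZE = 256 * 1024   # keeps the progress bar responsive
BATCH_CHUNK_SIZE = 1024 * 1024        # assumed "maximum" for non-interactive runs


def pick_chunk_size(stream: Optional[TextIO] = None) -> int:
    """Choose a chunk size based on whether output goes to a terminal."""
    stream = sys.stdout if stream is None else stream
    isatty = getattr(stream, "isatty", None)
    interactive = bool(isatty and isatty())
    return INTERACTIVE_CHUNK_SIZE if interactive else BATCH_CHUNK_SIZE


# Usage: a StringIO is never a TTY, so it behaves like a CI log or Docker build.
print(pick_chunk_size(io.StringIO()))
```

Whether the extra branch is worth it depends on how much is gained beyond 256 KiB, which the following comment questions.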
Frankly, if I understand @morotti's comment, increasing the chunk size beyond 256 KiB nets only marginal gains. With the current version of this PR, they recorded a top download speed of 460 MB/s which is over 3.5 Gbps. Do more enterprises have 2.5+ Gbps ethernet links to their internal network than I think...?
Thanks @morotti for bearing with the considerable back and forth on this PR. I hope you found my comments useful and not too nit-picky :)
(Repushing to trigger builds; there seems to be a flaky test on Windows.)
Thanks, I've adjusted the progress bar to only render for packages > 512 KiB, up from 40 KiB. I think that's a reasonable cutoff. A simple "pip install jupyter" installs 60 packages, most of them very small. I'm finding the pip output easier to read and follow, with fewer progress bars appearing and disappearing very fast.
read() returns the provided chunk size, except for the last chunk.
A lot of companies have employees working on a remote machine (VM, VDI, RDP, remote terminal, ssh, etc...). Hardware has been dual 10 Gbps for more than a decade, 40 Gbps is common nowadays.
There seem to be two tests in master that are flaky on Windows. Added in May. FAILED tests/unit/test_utils_retry.py::test_retry_wait[0.015] - assert (669.765 - 669.75) >= 0.015 EDIT: I see they are discussed in another PR #12839
(rebasing to pick up test fixes on main)
All green now that main branch has been fixed. final result:
I'm not sure if this is the right call, but I don't feel strongly enough so I won't block the PR on this. We can see if anyone complains later. I don't have the commit bit so I can't merge anything. I'm a triager, not a maintainer as I'm too new to the project :)
This seems like a good improvement, and it's easy enough to revert if it causes issues, so I'm going to merge it pre-emptively. @pradyunsg if you have concerns for 24.2, feel free to revert it.
…k/test/generated-code (#4584) Bumps [pip](https://github.com/pypa/pip) from 24.1.2 to 24.2. <details> <summary>Changelog</summary> <p><em>Sourced from <a href="https://github.com/pypa/pip/blob/main/NEWS.rst">pip's changelog</a>.</em></p> <blockquote> <h1>24.2 (2024-07-28)</h1> <h2>Deprecations and Removals</h2> <ul> <li>Deprecate <code>pip install --editable</code> falling back to <code>setup.py develop</code> when using a setuptools version that does not support :pep:<code>660</code> (setuptools v63 and older). (<code>[#11457](pypa/pip#11457) <https://github.com/pypa/pip/issues/11457></code>_)</li> </ul> <h2>Features</h2> <ul> <li> <p>Check unsupported packages for the current platform. (<code>[#11054](pypa/pip#11054) <https://github.com/pypa/pip/issues/11054></code>_)</p> </li> <li> <p>Use system certificates <em>and</em> certifi certificates to verify HTTPS connections on Python 3.10+. Python 3.9 and earlier only use certifi.</p> <p>To revert to previous behaviour, pass the flag <code>--use-deprecated=legacy-certs</code>. (<code>[#11647](pypa/pip#11647) <https://github.com/pypa/pip/issues/11647></code>_)</p> </li> <li> <p>Improve discovery performance of installed packages when the <code>importlib.metadata</code> backend is used to load distribution metadata (used by default under Python 3.11+). (<code>[#12656](pypa/pip#12656) <https://github.com/pypa/pip/issues/12656></code>_)</p> </li> <li> <p>Improve performance when the same requirement string appears many times during resolution, by consistently caching the parsed requirement string. (<code>[#12663](pypa/pip#12663) <https://github.com/pypa/pip/issues/12663></code>_)</p> </li> <li> <p>Minor performance improvement of finding applicable package candidates by not repeatedly calculating their versions (<code>[#12664](pypa/pip#12664) <https://github.com/pypa/pip/issues/12664></code>_)</p> </li> <li> <p>Disable pip's self version check when invoking a pip subprocess to install PEP 517 build requirements. 
(<code>[#12683](pypa/pip#12683) <https://github.com/pypa/pip/issues/12683></code>_)</p> </li> <li> <p>Improve dependency resolution performance by caching platform compatibility tags during wheel cache lookup. (<code>[#12712](pypa/pip#12712) <https://github.com/pypa/pip/issues/12712></code>_)</p> </li> <li> <p><code>wheel</code> is no longer explicitly listed as a build dependency of <code>pip</code>. <code>setuptools</code> injects this dependency in the <code>get_requires_for_build_wheel()</code> hook and no longer needs it on newer versions. (<code>[#12728](pypa/pip#12728) <https://github.com/pypa/pip/issues/12728></code>_)</p> </li> <li> <p>Ignore <code>--require-virtualenv</code> for <code>pip check</code> and <code>pip freeze</code> (<code>[#12842](pypa/pip#12842) <https://github.com/pypa/pip/issues/12842></code>_)</p> </li> <li> <p>Improve package download and install performance.</p> <p>Increase chunk sizes when downloading (256 kB, up from 10 kB) and reading files (1 MB, up from 8 kB). This reduces the frequency of updates to pip's progress bar. (<code>[#12810](pypa/pip#12810) <https://github.com/pypa/pip/issues/12810></code>_)</p> </li> <li> <p>Improve pip install performance.</p> <p>Files are now extracted in 1MB blocks, or in one block matching the file size for smaller files. A decompressor is no longer instantiated when extracting 0 bytes files, it is not necessary because there is no data to decompress. (<code>[#12803](pypa/pip#12803) <https://github.com/pypa/pip/issues/12803></code>_)</p> </li> </ul> <h2>Bug Fixes</h2> <ul> <li>Set <code>no_color</code> to global <code>rich.Console</code> instance. (<code>[#11045](pypa/pip#11045) <https://github.com/pypa/pip/issues/11045></code>_)</li> <li>Fix resolution to respect <code>--python-version</code> when checking <code>Requires-Python</code>. (<code>[#12216](pypa/pip#12216) <https://github.com/pypa/pip/issues/12216></code>_)</li> <li>Perform hash comparisons in a case-insensitive manner. 
(<code>[#12680](pypa/pip#12680) <https://github.com/pypa/pip/issues/12680></code>_)</li> <li>Avoid <code>dlopen</code> failure for glibc detection in musl builds (<code>[#12716](pypa/pip#12716) <https://github.com/pypa/pip/issues/12716></code>_)</li> <li>Avoid keyring logging crashes when pip is run in verbose mode. (<code>[#12751](pypa/pip#12751) <https://github.com/pypa/pip/issues/12751></code>_)</li> </ul> </blockquote> <p>... (truncated)</p> </details> <details> <summary>Commits</summary> <ul> <li><a href="https://github.com/pypa/pip/commit/97146c7f4cd85551f3dc261830a57f304e43c181"><code>97146c7</code></a> Bump for release</li> <li><a href="https://github.com/pypa/pip/commit/ef81b2eafd390fb56f62930dcd74f6e4580093e0"><code>ef81b2e</code></a> Update AUTHORS.txt</li> <li><a href="https://github.com/pypa/pip/commit/350a0570a88b6c0d13c68f81ac08dc64f954cadf"><code>350a057</code></a> Bump the github-actions group with 2 updates (<a href="https://redirect.github.com/pypa/pip/issues/12876">#12876</a>)</li> <li><a href="https://github.com/pypa/pip/commit/184390f4f2cde0316801eb701f49dda4f7a9a6ac"><code>184390f</code></a> Update dependabot.yml to bump group updates (<a href="https://redirect.github.com/pypa/pip/issues/12572">#12572</a>)</li> <li><a href="https://github.com/pypa/pip/commit/48917f1c0375496058d677f652a90de6bee4dc8c"><code>48917f1</code></a> Merge pull request <a href="https://redirect.github.com/pypa/pip/issues/12875">#12875</a> from hellozee/fix-unit-test</li> <li><a href="https://github.com/pypa/pip/commit/dd85c28464dbfc9b3a53c885a41c209e4700ad2d"><code>dd85c28</code></a> Fix invalid origin test to check all the logged messages</li> <li><a href="https://github.com/pypa/pip/commit/203780b5d167c4d01c55df7adc91d5ad1a0563aa"><code>203780b</code></a> Merge pull request <a href="https://redirect.github.com/pypa/pip/issues/12865">#12865</a> from pradyunsg/better-exception-handling-around-sel...</li> <li><a 
href="https://github.com/pypa/pip/commit/e50314134886d5eb5b650b3ce95abaafcb6dce10"><code>e503141</code></a> Properly mock <code>_self_version_check_logic</code></li> <li><a href="https://github.com/pypa/pip/commit/3518d3293445ad43eedba116b6182185c03abda3"><code>3518d32</code></a> Rework how <code>--debug</code> is handled in <code>main</code></li> <li><a href="https://github.com/pypa/pip/commit/be21d82e4362c00aab451ef1cf212d9a62f8e58e"><code>be21d82</code></a> Move exception suppression to cover more of self-version-check logic</li> <li>Additional commits viewable in <a href="https://github.com/pypa/pip/compare/24.1.2...24.2">compare view</a></li> </ul> </details> <br /> [![Dependabot compatibility score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=pip&package-manager=pip&previous-version=24.1.2&new-version=24.2)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores) Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`. [//]: # (dependabot-automerge-start) [//]: # (dependabot-automerge-end) --- <details> <summary>Dependabot commands and options</summary> <br /> You can trigger Dependabot actions by commenting on this PR: - `@dependabot rebase` will rebase this PR - `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it - `@dependabot merge` will merge this PR after your CI passes on it - `@dependabot squash and merge` will squash and merge this PR after your CI passes on it - `@dependabot cancel merge` will cancel a previously requested merge and block automerging - `@dependabot reopen` will reopen this PR if it is closed - `@dependabot close` will close this PR and stop Dependabot recreating it. 
You can achieve the same result by closing it manually - `@dependabot show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency - `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself) </details>
Hello, it's me again
I'm offering you 2 more improvements in this PR.
First fix: download the file in chunks of 1 MB. ⚠️
Things to know: requests is broken here. Depending on which function you use, it downloads in chunks of 1 byte or 10240 bytes by default. There have been tickets open about this for years, but nobody has fixed them.
pip was setting the chunk size to CONTENT_CHUNK_SIZE=10240 from requests, which is the bad constant. You don't want to do that.
Special case for pip: pip updates a progress bar after each chunk is downloaded.
This results in an insane number of progress bar updates, which can take as much as 30% of the runtime for a large package :D
This PR downloads in chunks of 1 MB, which is a reasonable size for I/O operations.
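To make the effect of the chunk size concrete, here is a minimal sketch (not pip's actual download code). The helper names `copy_in_chunks` and `count_updates` are hypothetical, an in-memory `io.BytesIO` stands in for the network response, and the `on_chunk` callback stands in for a progress bar update.

```python
import io
from typing import BinaryIO, Callable

ONE_MIB = 1024 * 1024  # chunk size proposed in this PR
OLD_CHUNK = 10240      # previous chunk size (CONTENT_CHUNK_SIZE)


def copy_in_chunks(
    src: BinaryIO,
    dst: BinaryIO,
    chunk_size: int,
    on_chunk: Callable[[int], None],
) -> int:
    """Copy src to dst, calling on_chunk(len(chunk)) once per chunk read."""
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)
        total += len(chunk)
        on_chunk(len(chunk))  # stands in for one progress bar update
    return total


def count_updates(payload: bytes, chunk_size: int) -> int:
    """Return how many progress updates copying payload would trigger."""
    updates = []
    copy_in_chunks(io.BytesIO(payload), io.BytesIO(), chunk_size, updates.append)
    return len(updates)


payload = bytes(5 * ONE_MIB)  # 5 MiB of zeros standing in for a wheel
small_chunk_updates = count_updates(payload, OLD_CHUNK)
large_chunk_updates = count_updates(payload, ONE_MIB)
print(small_chunk_updates, large_chunk_updates)
```

The per-chunk callback is the part that gets expensive: the number of calls scales inversely with the chunk size, regardless of how cheap each read is.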
Profiling was done on
pip install --prefix /var/username/deleteme tensorflow-cpu --no-deps --no-cache-dir --dry-run
which takes 3 seconds to run main().
MASTER BRANCH
FIX BRANCH
Notice: the tensorflow wheel is 207 MB.
On master, the progress bar was updated 20237 times, i.e. once every 10240 bytes.
Notice the same number of calls to read(), fp_read(), stream read(), etc.
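The numbers above can be cross-checked with a little arithmetic. This is only a back-of-the-envelope sketch: the variable names are illustrative, and it assumes every chunk except possibly the last is full.

```python
import math

OLD_CHUNK = 10240          # bytes per chunk before this PR
NEW_CHUNK = 1024 * 1024    # bytes per chunk with this PR
observed_updates = 20237   # progress bar updates seen in the profile

# Upper bound on the wheel size implied by the observed update count.
approx_size = observed_updates * OLD_CHUNK  # 207_226_880 bytes, about 207 MB

# Updates needed for the same payload with 1 MiB chunks.
new_updates = math.ceil(approx_size / NEW_CHUNK)

print(approx_size, new_updates)
```

So the same download goes from roughly twenty thousand progress updates to about two hundred.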
Second fix: after the wheel is downloaded, pip reads the file back in blocks of io.DEFAULT_BUFFER_SIZE to compute its hashes.
io.DEFAULT_BUFFER_SIZE is an obsolete constant that was set to 8k forever ago. You don't want to use that.
By the way, I have some tickets and PRs open on the Python interpreter to fix that constant, but I don't know if they will ever get merged: python/cpython#117151
Thankfully, this one doesn't have much impact: the downloaded file should still be in the read cache because it was just written, and for me it is written to /tmp, which is a ramdisk. So it makes little difference on my machine, but that really depends on your OS and disks.
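For reference, hashing a file in larger blocks looks roughly like the sketch below. It is not pip's actual implementation: `FILE_CHUNK_SIZE` and `sha256_of_file` are hypothetical names, and a temporary file stands in for the downloaded wheel.

```python
import hashlib
import os
import tempfile

FILE_CHUNK_SIZE = 1024 * 1024  # hypothetical constant name; 1 MiB per read


def sha256_of_file(path: str, chunk_size: int = FILE_CHUNK_SIZE) -> str:
    """Hash a file by reading it in fixed-size blocks instead of all at once."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while True:
            block = f.read(chunk_size)
            if not block:
                break
            h.update(block)
    return h.hexdigest()


# Usage: write a file slightly larger than 3 chunks, then hash it back.
data = b"x" * (3 * FILE_CHUNK_SIZE + 123)
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(data)
    tmp_path = tmp.name
try:
    digest = sha256_of_file(tmp_path)
    expected = hashlib.sha256(data).hexdigest()
finally:
    os.remove(tmp_path)

print(digest == expected)
```

The digest does not depend on the block size, so changing the constant only changes how many read() calls are made.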
Cheers.