Stop using .tar.bz, maybe?? #3503

Open
Artoria2e5 opened this issue May 11, 2023 · 11 comments
Labels
Distribution Related to binary distribution enhancement Enhancement request

Comments

@Artoria2e5

Describe the bug
The current releases use CEF_ARCHIVE_FORMAT set to tarbz. This is extremely slow to decompress. Bzip2 unpacks slower than xz and does not even compress better.

To Reproduce
Steps to reproduce the behavior:

  1. Go to https://cef-builds.spotifycdn.com/index.html
  2. Click on a PDB download
  3. Wait
  4. Decompress
  5. WAIT, DRINK COFFEE

Expected behavior
We could really use xz to get at least double the decompression speed. Or even zstd, at the cost of worse compression. These two are extremely widespread.

Screenshots

Versions (please complete the following information):

  • OS: Windows 11, but really does not matter

Additional context
Python's tarfile module has supported xz since 3.3. You don't even need an external program!
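
To illustrate the point about tarfile, here is a minimal sketch of writing and reading a .tar.xz entirely with the standard library. The helper names `make_tar_xz` and `read_tar_xz` are made up for this example and are not taken from make_distrib.py; the archive is built in memory only to keep the sketch self-contained.

```python
import io
import tarfile


def make_tar_xz(files: dict[str, bytes]) -> bytes:
    """Build a .tar.xz archive in memory from a {name: content} mapping."""
    buf = io.BytesIO()
    # "w:xz" selects LZMA/xz compression via the stdlib lzma module.
    with tarfile.open(fileobj=buf, mode="w:xz") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def read_tar_xz(archive: bytes) -> dict[str, bytes]:
    """Read every regular file back out of an in-memory .tar.xz archive."""
    out = {}
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r:xz") as tar:
        for member in tar:
            if member.isfile():
                out[member.name] = tar.extractfile(member).read()
    return out


sample = {"README.txt": b"hello", "include/cef.h": b"// header"}
archive = make_tar_xz(sample)
restored = read_tar_xz(archive)
```

For files on disk, the same modes work with a path, for example `tarfile.open("out.tar.xz", "w:xz")`.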

@Artoria2e5 Artoria2e5 added the bug Bug report label May 11, 2023
@magreenblatt
Collaborator

magreenblatt commented May 11, 2023

What is the size difference between xz and bz2 when creating archives using Python?

@Artoria2e5
Author

The compression method of the lzma library is identical to xz defaults (preset 6), according to the documentation. Knowing that, I decompressed cef_binary_113.1.4+g327635f+chromium-113.0.5672.63_windows64.tar.bz2 into the tar, then recompressed it with xz.

$ ls -l cef*
-rw-r--r-- 1 arthu arthu 825856000 May 12 14:17 cef_binary_113.1.4+g327635f+chromium-113.0.5672.63_windows64.tar
-rw-r--r-- 1 arthu arthu 275852077 May 12 14:17 cef_binary_113.1.4+g327635f+chromium-113.0.5672.63_windows64.tar.bz2
-rw-r--r-- 1 arthu arthu 201699604 May 12 14:17 cef_binary_113.1.4+g327635f+chromium-113.0.5672.63_windows64.tar.xz

Huh, much smaller. Decompression timing:

$ time bzip2 -d -c cef_binary_113.1.4+g327635f+chromium-113.0.5672.63_windows64.tar.bz2 >/dev/null

real    0m30.315s
user    0m15.593s
sys     0m0.187s
$ time xz -d -c cef_binary_113.1.4+g327635f+chromium-113.0.5672.63_windows64.tar.xz > /dev/null

real    0m12.301s
user    0m5.265s
sys     0m0.203s

And much faster.
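
As a rough cross-check, the figures quoted above can be turned into ratios, and a similar comparison can be run in-process with Python's bz2 and lzma modules. This is only a sketch: the synthetic payload and the `time_decompress` helper are assumptions made for illustration, and in-process timings on toy data will not match the command-line numbers for the real CEF archive.

```python
import bz2
import lzma
import time

# Sizes (bytes) and wall-clock times (seconds) copied from the listing above.
BZ2_SIZE, XZ_SIZE = 275852077, 201699604
BZ2_REAL, XZ_REAL = 30.315, 12.301

size_ratio = XZ_SIZE / BZ2_SIZE   # xz archive size relative to bz2
speedup = BZ2_REAL / XZ_REAL      # how many times faster xz decompressed


def time_decompress(decompress, blob: bytes) -> float:
    """Return the seconds taken by one call to decompress(blob)."""
    start = time.perf_counter()
    decompress(blob)
    return time.perf_counter() - start


# Synthetic stand-in payload (an assumption, not CEF data).
payload = b"".join(b"line %d of sample text\n" % i for i in range(50_000))
bz_blob = bz2.compress(payload, compresslevel=9)
xz_blob = lzma.compress(payload, preset=6)  # preset 6 is the xz default

bz_seconds = time_decompress(bz2.decompress, bz_blob)
xz_seconds = time_decompress(lzma.decompress, xz_blob)
print(f"size ratio {size_ratio:.2f}, speedup {speedup:.1f}x")
print(f"toy payload: bz2 {bz_seconds:.4f}s, xz {xz_seconds:.4f}s")
```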

@Artoria2e5
Author

While we are at it, the state of make_distrib.py really isn't good. The else: create_7z_archive(dir, archive_format) branch is dead code. Well, technically the entire 7z function is...

@dmitry-azaraev
Contributor

Then 7zip is simply better; it was used initially but then rejected for some reason. .tar.bz or .tar.xz will always be slower, as it is not a true "solid" archive: depending on the tool, you have to decompress the .bz first and then extract the .tar. As a consumer I expect acceptable interoperability, and tar variations on Windows do not offer that (I have no issues personally, but in general it is not ideal). Plain .zip is still the winner in this sense, with 7zip right after it. Also, 7zip is used by the Chromium build, so it should already be on board at least for Windows (it is used for the installer).

@magreenblatt
Collaborator

We also need to consider what comes default-installed on most OSes, and what is supported by common tools like CMake and TeamCity. Also related to issue #2446 (symlink support).

@Artoria2e5
Author

Artoria2e5 commented May 12, 2023

xz comes installed by default on most OSes. With tar it's always purely solid and has a good encoding story (almost always UTF-8). The two-level decompression is a result of how archive programs are designed on Windows: they are built around showing file contents instead of doing a full streaming extraction. But since tar has no central directory, it takes a full decompression to show the contents anyway. The point is, it's not tar's fault. See also M2Team/NanaZip#138
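
The streaming extraction described above can be sketched with Python's tarfile in its non-seekable stream mode (`"r|xz"`), which decompresses and reads members in a single forward pass. The helper names and sample files are hypothetical.

```python
import io
import tarfile


def build_sample() -> bytes:
    """Create a small in-memory .tar.xz to stream from."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:xz") as tar:
        for name, data in (("a.txt", b"alpha"), ("dir/b.txt", b"beta")):
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def stream_members(archive: bytes):
    """Yield (name, content) pairs in one sequential pass over the archive."""
    # The "|" form treats the input as a pure stream: no seeking backwards,
    # so each member must be read before moving on to the next one.
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r|xz") as tar:
        for member in tar:
            fileobj = tar.extractfile(member)
            yield member.name, fileobj.read() if fileobj else b""


members = list(stream_members(build_sample()))
```

Because there is no central directory, even just listing names this way walks the whole compressed stream, which is the cost mentioned above.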

7z has a stronger encoding story (mandatory UTF-16) and an option to be selectively solid, but two issues: pre-installation (partially solved by bsdtar support) and symlinks (uh-oh).

zip is a fragmented mess. No solid support, okay pre-installation. Symlink support is possible via Info-ZIP extension but does not seem to be present in Python zipfile.
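
For contrast with the symlink caveats above, tar stores symbolic links as ordinary entries, and Python's tarfile can write and read them without touching the filesystem. A minimal sketch, with made-up file names:

```python
import io
import tarfile


def tar_with_symlink() -> bytes:
    """Build an in-memory .tar.xz holding one file and one symlink to it."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:xz") as tar:
        data = b"real content"
        target = tarfile.TarInfo("lib/libfoo.so.1")
        target.size = len(data)
        tar.addfile(target, io.BytesIO(data))

        # A symlink entry carries no data, only a type and a link target.
        link = tarfile.TarInfo("lib/libfoo.so")
        link.type = tarfile.SYMTYPE
        link.linkname = "libfoo.so.1"
        tar.addfile(link)
    return buf.getvalue()


def list_symlinks(archive: bytes) -> dict[str, str]:
    """Map each symlink entry in the archive to its link target."""
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r:xz") as tar:
        return {m.name: m.linkname for m in tar if m.issym()}


links = list_symlinks(tar_with_symlink())
```

Whether extraction recreates the link on disk still depends on the platform and the extracting tool.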

@dmitry-azaraev
Contributor

To clarify, I'm not against .xz; it's virtually the same thing, and it also provides a good compression ratio, which I welcome.

Also, Windows 10 has tar (bsdtar) on board, but again it is virtually useless, as it has only gzip support. Because of this, 7zip is the winner, since a third-party tool is needed anyway.

@Artoria2e5
Author

Artoria2e5 commented May 12, 2023

> Also, Windows 10 has tar(bsdtar) on board, but it again, virtually useless, as it have only gzip support. And because of this - 7zip is winner, as it anyway third-party tool.

First time hearing this! Interestingly, running tar xf on a .tar.bz2 works, so they have also put in bzip2 support. Since there's no bzip2.exe in my PATH, it's probably compiled in via a library. Which is a bit of a surprise if you think about it, since they could have just as easily linked to the public-domain liblzma too.

Ah you know what, let me throw something in the Feedback Hub. No idea if they read it.

@dmitry-azaraev
Contributor

@Artoria2e5 my tar requires bzip2.exe, and it doesn't work because bzip2 is absent. Windows 10 also includes curl. Nice, but it is compiled without zlib/gzip support, so it can't download a compressed deflate stream. And I'm using standalone curl anyway. Agreed that it is kind of strange. :)

@Artoria2e5
Author

Huh, Microsoft is now making the built-in bsdtar the basis of a new feature, it seems. https://www.bleepingcomputer.com/news/microsoft/windows-11-getting-native-support-for-7-zip-rar-and-gz-archives/

My report got tagged "working on it" in the Feedback Hub, so they are putting some work into it.

@magreenblatt magreenblatt added enhancement Enhancement request Distribution Related to binary distribution and removed bug Bug report labels May 31, 2023
@avgarint

avgarint commented Aug 3, 2024

#3503 (comment)

I support this.

4 participants