Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use Astral-maintained tokio-tar fork #11174

Merged
merged 1 commit into from
Feb 3, 2025
Merged

Use Astral-maintained tokio-tar fork #11174

merged 1 commit into from
Feb 3, 2025

Conversation

charliermarsh
Copy link
Member

@charliermarsh charliermarsh commented Feb 3, 2025

Summary

I shipped one security fix here along with several significant performance improvements for large TAR files:

I also PR'd the security fix to edera-dev (edera-dev/tokio-tar#4).

@charliermarsh charliermarsh added performance Potential performance improvement security labels Feb 3, 2025
@charliermarsh charliermarsh marked this pull request as draft February 3, 2025 01:34
@charliermarsh
Copy link
Member Author

Converting to draft while I checkout failing debug assertions.

@charliermarsh charliermarsh force-pushed the charlie/mark branch 2 times, most recently from c5eaf84 to 49fe355 Compare February 3, 2025 01:45

// Delay any directory entries until the end, to ensure that directory permissions do not
// interfere with descendant extraction.
let mut directories = Vec::new();
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is related to astral-sh/tokio-tar#2. I forgot that we don't actually use the crate's unpack method, we unpack ourselves (since we need to set permissions in a different way).

@@ -172,7 +188,7 @@ async fn untar_in(
let mode = file.header().mode()?;
let has_any_executable_bit = mode & 0o111;
if has_any_executable_bit != 0 {
if let Some(path) = crate::tar::unpacked_at(dst, &file.path()?) {
Copy link
Member Author

@charliermarsh charliermarsh Feb 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can get rid of this whole vendored unpacked_at implementation. It turns out the crate now returns the target path. (That's not a change I made, it must've been made in one of the forks.)

@charliermarsh charliermarsh marked this pull request as ready for review February 3, 2025 01:56
Comment on lines +153 to +154
// Delay any directory entries until the end, to ensure that directory permissions do not
// interfere with descendant extraction.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a concern for us? We usually create directories from archives without preserving permissions (at least in the zip path we should).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also maybe this is me being unfamiliar with the codebase but this is so... bizarre. How does this unpacking a file before the dir that contains it make sense? Is the idea that we create_dir_all any file path anyway, so the dir entries only create empty dirs and set perms on dirs that have files?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For zips we always create the parent of a file, as a zip can contain directory entries, but sometimes there are no directory entries, just files.

// Either create the directory or write the file to disk.
if is_dir {
if directories.insert(path.clone()) {
fs_err::tokio::create_dir_all(path).await?;
}
} else {
if let Some(parent) = path.parent() {
if directories.insert(parent.to_path_buf()) {
fs_err::tokio::create_dir_all(parent).await?;
}
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the idea that we create_dir_all any file path anyway, so the dir entries only create empty dirs and set perms on dirs that have files?

Yes. When we create a file, we create the directories required for it. This happens within the async-tar crate. The reason we have to do it "manually" here is because we have our own unpack that replicates the async-tar logic but applies different permissions.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I will look at why the tar crate introduced this before merging.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It kind of seems like a bug that we aren't preserving directory permissions in the zip case?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

personally, i'd prefer only preserving the executable bit and otherwise using the default umask; i'd find it odd if the permissions on the build host should determine the permissions in the deployment target.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. Probably requires more invasive changes to the tar crate though.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to review what pip does here before making other changes (since this is just preserving our status quo).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nevermind, pip explicitly only preserves the executable bit. I'll make the same change for tar in a separate PR.

https://github.com/pypa/pip/blob/3898741e29b7279e7bffe044ecfbe20f6a438b1e/src/pip/_internal/utils/unpacking.py#L88-L100

@charliermarsh charliermarsh enabled auto-merge (squash) February 3, 2025 17:47
@charliermarsh charliermarsh merged commit 7b43baf into main Feb 3, 2025
90 of 91 checks passed
@charliermarsh charliermarsh deleted the charlie/mark branch February 3, 2025 17:51
tmeijn pushed a commit to tmeijn/dotfiles that referenced this pull request Feb 4, 2025
This MR contains the following updates:

| Package | Update | Change |
|---|---|---|
| [astral-sh/uv](https://github.com/astral-sh/uv) | patch | `0.5.25` -> `0.5.27` |

MR created with the help of [el-capitano/tools/renovate-bot](https://gitlab.com/el-capitano/tools/renovate-bot).

**Proposed changes to behavior should be submitted there as MRs.**

---

### Release Notes

<details>
<summary>astral-sh/uv (astral-sh/uv)</summary>

### [`v0.5.27`](https://github.com/astral-sh/uv/blob/HEAD/CHANGELOG.md#0527)

[Compare Source](astral-sh/uv@0.5.26...0.5.27)

##### Enhancements

-   Avoid setting permissions during tar extraction ([#&#8203;11191](astral-sh/uv#11191))
-   Remove warnings for missing lower bounds ([#&#8203;11195](astral-sh/uv#11195))
-   Update PubGrub to set-based outdated priority tracking ([#&#8203;11169](astral-sh/uv#11169))
-   Improve error messages for `uv pip install` with `--extra` or `--all-extras` and invalid sources ([#&#8203;11193](astral-sh/uv#11193))
-   Sign Docker images using GitHub attestations ([#&#8203;8685](astral-sh/uv#8685))

##### Preview features

-   Don't expand self-referential extras in the build backend ([#&#8203;11142](astral-sh/uv#11142))

##### Performance

-   Filter discovered Python executables by source before querying ([#&#8203;11143](astral-sh/uv#11143))
-   Optimize exclusion computation for markers ([#&#8203;11158](astral-sh/uv#11158))
-   Use Astral-maintained `tokio-tar` fork ([#&#8203;11174](astral-sh/uv#11174))
-   Remove unneeded `.clone()` ([#&#8203;11127](astral-sh/uv#11127))

##### Bug fixes

-   Fix relative paths in bytecode compilation ([#&#8203;11177](astral-sh/uv#11177))
-   Percent-decode URLs in canonical comparisons ([#&#8203;11088](astral-sh/uv#11088))
-   Respect concurrency limits in parallel index fetch ([#&#8203;11182](astral-sh/uv#11182))
-   Use wire JSON schema for conflict items ([#&#8203;11196](astral-sh/uv#11196))
-   Use explicit `_GLibCVersion` tuple in uv-python crate ([#&#8203;11122](astral-sh/uv#11122))

##### Documentation

-   Add Git SHA locking behavior to docs ([#&#8203;11125](astral-sh/uv#11125))
-   Add best-practice flags to `pip install` example in troubleshooting guide ([#&#8203;11194](astral-sh/uv#11194))
-   Set `VIRTUAL_ENV` in Jupyter kernels ([#&#8203;11155](astral-sh/uv#11155))
-   Add instructions for deactivating an environment ([#&#8203;11200](astral-sh/uv#11200))

### [`v0.5.26`](https://github.com/astral-sh/uv/blob/HEAD/CHANGELOG.md#0526)

[Compare Source](astral-sh/uv@0.5.25...0.5.26)

##### Enhancements

-   Add support for `uvx python` ([#&#8203;11076](astral-sh/uv#11076))
-   Allow `--no-dev --invert` in `uv tree` ([#&#8203;11068](astral-sh/uv#11068))
-   Update `uv python install --reinstall` to reinstall all previous versions ([#&#8203;11072](astral-sh/uv#11072))
-   Consistently write log messages with capitalized first word ([#&#8203;11111](astral-sh/uv#11111))
-   Suggest `--build-backend` when `--backend` is passed to `uv init` ([#&#8203;10958](astral-sh/uv#10958))
-   Improve retry trace message ([#&#8203;11108](astral-sh/uv#11108))

##### Performance

-   Remove unnecessary UTF-8 conversion in hash parsing ([#&#8203;11110](astral-sh/uv#11110))

##### Bug fixes

-   Ignore non-hash fragments in HTML API responses ([#&#8203;11107](astral-sh/uv#11107))
-   Avoid resolving symbolic links when querying Python interpreters ([#&#8203;11083](astral-sh/uv#11083))
-   Avoid sharing state between universal and non-universal resolves ([#&#8203;11051](astral-sh/uv#11051))
-   Error when `--script` is passing a non-PEP 723 script ([#&#8203;11118](astral-sh/uv#11118))
-   Make metadata deserialization failures non-fatal in the cache ([#&#8203;11105](astral-sh/uv#11105))
-   Mark metadata as dynamic when reading from built wheel cache ([#&#8203;11046](astral-sh/uv#11046))
-   Propagate credentials for `<index>/simple` to `<index>/...` endpoints ([#&#8203;11074](astral-sh/uv#11074))
-   Fix conflicting extra bug during `uv sync` ([#&#8203;11075](astral-sh/uv#11075))

##### Documentation

-   Add PyTorch XPU instructions to the PyTorch guide ([#&#8203;11109](astral-sh/uv#11109))
-   Add docs for signal handling ([#&#8203;11041](astral-sh/uv#11041))
-   Explain build frontend vs. build backend ([#&#8203;11094](astral-sh/uv#11094))
-   Fix formatting of `RUST_LOG` documentation ([#&#8203;10053](astral-sh/uv#10053))
-   Fix typo in `--no-deps` description ([#&#8203;11073](astral-sh/uv#11073))
-   Reflow CLI documentation comments ([#&#8203;11040](astral-sh/uv#11040))
-   Shorten "Using existing Python versions" nav item so it fits on one line ([#&#8203;11077](astral-sh/uv#11077))
-   Some minor touch-ups to the Python install guide ([#&#8203;11116](astral-sh/uv#11116))
-   Update Dependabot tracking issue link ([#&#8203;11054](astral-sh/uv#11054))
-   Update documentation for running in a container ([#&#8203;11052](astral-sh/uv#11052))
-   Upgrade PyTorch version in documentation ([#&#8203;11114](astral-sh/uv#11114))
-   Use `sys_platform` in lieu of `platform_system` in PyTorch docs ([#&#8203;11113](astral-sh/uv#11113))
-   Use positive (rather than negative) markers in PyTorch examples ([#&#8203;11112](astral-sh/uv#11112))
-   Fix unnecessary backslashes in brackets ([#&#8203;11059](astral-sh/uv#11059))
-   Suggest setting copy link mode in GitLab integration guide ([#&#8203;11067](astral-sh/uv#11067))

</details>

---

### Configuration

📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied.

♻ **Rebasing**: Whenever MR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 **Ignore**: Close this MR and you won't be reminded about this update again.

---

 - [ ] <!-- rebase-check -->If you want to rebase/retry this MR, check this box

---

This MR has been generated by [Renovate Bot](https://github.com/renovatebot/renovate).
<!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiIzOS4xNDMuMCIsInVwZGF0ZWRJblZlciI6IjM5LjE1Ni4xIiwidGFyZ2V0QnJhbmNoIjoibWFpbiIsImxhYmVscyI6WyJSZW5vdmF0ZSBCb3QiXX0=-->
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Potential performance improvement security
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants