Skip to content

Fix persistent data volumes with remote cross. #1072

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
Oct 12, 2022

Conversation

Alexhuszagh
Copy link
Contributor

@Alexhuszagh Alexhuszagh commented Oct 10, 2022

This fixes copying files into a directory inside a docker data volume, where previously we copied the entire temp directory into the volume itself. This meant that tmp.tmpLkX8M3 would be copied to "${volume}/tmp.tmpLkX8M3", rather than its contents being copied into "${volume}". This also contains numerous other bugs fixes and changes internally, which have been submitted each as separate commits.

This fixes a regression in #1054 to re-enable fingerprinting to work, since previously the fingerprint file name was based off the container name. Since #1054 changes the container names to rely on the system time, this ensures the fingerprints are specific for only the toolchain and the path of directory.

This also simplifies the internals of the docker module substantially. Most free functions have been modified to become methods of structs to simplify their function signatures. Some structs have also been renamed to make it clearer what their purpose is.

Closes #1016.

Convert free functions for working with data volumes to be methods for `DataVolume`.
Renamed `DataVolume` to `ContainerDataVolume`, `Container` to `ChildContainer`, and `DeleteContainer` to `ChildContainerInfo`. Created `DockerVolume` and `DockerContainer` to handle the logic of starting, stopping, creating, etc. docker volumes and containers for simple cases.
Fixes the regression in 1054, since previously the fingerprint file name was based off the volume name. Since the volume names rely on the system time, this ensures the fingerprints are specific for only the toolchain and the path of directory.
This fixes copying files into a directory, where previously we copied the entire temp directory into the volume itself. This meant that `tmp.tmpLkX8M3` would be copied to `"${volume}/tmp.tmpLkX8M3"`, rather than its contents being copied into `"${volume}"`. This also ensures the updated fingerprint is written after all the files have been updated.
@Alexhuszagh Alexhuszagh added bug regression no changelog A valid PR without changelog (no-changelog) labels Oct 10, 2022
@Alexhuszagh
Copy link
Contributor Author

Alexhuszagh commented Oct 10, 2022

I still need a few things to do:

  • Structify the Engine
  • Structify all extensions to Command such as docker_userns.

Copy link
Member

@Emilgardis Emilgardis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great!

@@ -561,7 +564,7 @@ impl QualifiedToolchain {
.file_name()
.expect("should be able to get toolchain name")
.to_utf8()?;
let toolchain_hash = path_hash(self.get_sysroot())?;
let toolchain_hash = path_hash(self.get_sysroot(), 5)?;
Copy link
Member

@Emilgardis Emilgardis Oct 10, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could we make the path len static variables?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. I was going to add a comment on the PR that I made the existing ones all 5, but for the mount identifier, I made it require 10 hexdigits.

For the old hashes, since they either are for the toolchain path (very few actual paths, especially with rustup), a unique ID for the container (which is unique because of the system time, so a small hash is fine), the package path (which is also prefixed by the package name, so small number of possibilities in real cases), 16^5 or ~1million is plenty unique.

But we run into a birthday problem pretty fast (we're using SHA1 so it's all trusted data) if we consider a user mounting many directories or a large number of users mounting a non-trivial number of directories each. If we have 16^5 options, asuming everything is perfectly distributed, that means we have a 50% chance of collision at ~1200 directories. If we up that to 10 hexdigits, then we have no issue.

We shouldn't modify the toolchain ID, since it would mean existing data volumes would no longer be used, and there's no reason to modify them. But for arbitrary volume mounts, it's probably a good idea to ensure there's no accidental collision.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added the following global const variables, so it's clear why we're using each in a given scenario.

/// Short hash for identifiers with minimal risk of collision.
pub const PATH_HASH_SHORT: usize = 5;

/// Longer hash to minimize risk of random collisions
/// Collision chance is ~10^-6
pub const PATH_HASH_UNIQUE: usize = 10;

@Alexhuszagh Alexhuszagh marked this pull request as ready for review October 11, 2022 20:28
@Alexhuszagh Alexhuszagh requested a review from a team as a code owner October 11, 2022 20:28
Copy link
Member

@Emilgardis Emilgardis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm!

bors r+

@bors
Copy link
Contributor

bors bot commented Oct 12, 2022

Build succeeded:

@bors bors bot merged commit dd760cf into cross-rs:main Oct 12, 2022
@Alexhuszagh Alexhuszagh deleted the remote_fixes branch October 12, 2022 14:58
@Emilgardis Emilgardis added this to the v0.3.0 milestone Mar 27, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug no changelog A valid PR without changelog (no-changelog) regression
Projects
None yet
Development

Successfully merging this pull request may close these issues.

CROSS_REMOTE=1 with persistent volume doesn't change project files
2 participants