-
Notifications
You must be signed in to change notification settings - Fork 411
Fix persistent data volumes with remote cross. #1072
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Convert free functions for working with data volumes to be methods for `DataVolume`.
Renamed `DataVolume` to `ContainerDataVolume`, `Container` to `ChildContainer`, and `DeleteContainer` to `ChildContainerInfo`. Created `DockerVolume` and `DockerContainer` to handle the logic of starting, stopping, creating, etc. docker volumes and containers for simple cases.
Fixes the regression in 1054, since previously the fingerprint file name was based off the volume name. Since the volume names rely on the system time, this ensures the fingerprints are specific for only the toolchain and the path of directory.
This fixes copying files into a directory, where previously we copied the entire temp directory into the volume itself. This meant that `tmp.tmpLkX8M3` would be copied to `"${volume}/tmp.tmpLkX8M3"`, rather than its contents being copied into `"${volume}"`. This also ensures the updated fingerprint is written after all the files have been updated.
I still need a few things to do:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
great!
src/docker/remote.rs
Outdated
@@ -561,7 +564,7 @@ impl QualifiedToolchain { | |||
.file_name() | |||
.expect("should be able to get toolchain name") | |||
.to_utf8()?; | |||
let toolchain_hash = path_hash(self.get_sysroot())?; | |||
let toolchain_hash = path_hash(self.get_sysroot(), 5)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could we make the path len static variables?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. I was going to add a comment on the PR that I made the existing ones all 5, but for the mount identifier, I made it require 10 hexdigits.
For the old hashes, since they either are for the toolchain path (very few actual paths, especially with rustup), a unique ID for the container (which is unique because of the system time, so a small hash is fine), the package path (which is also prefixed by the package name, so small number of possibilities in real cases), 16^5 or ~1million is plenty unique.
But we run into a birthday problem pretty fast (we're using SHA1 so it's all trusted data) if we consider a user mounting many directories or a large number of users mounting a non-trivial number of directories each. If we have 16^5 options, asuming everything is perfectly distributed, that means we have a 50% chance of collision at ~1200 directories. If we up that to 10 hexdigits, then we have no issue.
We shouldn't modify the toolchain ID, since it would mean existing data volumes would no longer be used, and there's no reason to modify them. But for arbitrary volume mounts, it's probably a good idea to ensure there's no accidental collision.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added the following global const variables, so it's clear why we're using each in a given scenario.
/// Short hash for identifiers with minimal risk of collision.
pub const PATH_HASH_SHORT: usize = 5;
/// Longer hash to minimize risk of random collisions
/// Collision chance is ~10^-6
pub const PATH_HASH_UNIQUE: usize = 10;
7ff66a6
to
3a897ba
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm!
bors r+
Build succeeded: |
This fixes copying files into a directory inside a docker data volume, where previously we copied the entire temp directory into the volume itself. This meant that
tmp.tmpLkX8M3
would be copied to"${volume}/tmp.tmpLkX8M3"
, rather than its contents being copied into"${volume}"
. This also contains numerous other bugs fixes and changes internally, which have been submitted each as separate commits.This fixes a regression in #1054 to re-enable fingerprinting to work, since previously the fingerprint file name was based off the container name. Since #1054 changes the container names to rely on the system time, this ensures the fingerprints are specific for only the toolchain and the path of directory.
This also simplifies the internals of the docker module substantially. Most free functions have been modified to become methods of structs to simplify their function signatures. Some structs have also been renamed to make it clearer what their purpose is.
Closes #1016.