-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[wip] Download crates in parallel #5161
Conversation
(rust_highfive has picked a reviewer for you, use r? to override) |
0616db1
to
bf20ec1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are some awesome wins, nice!
Depending on the difficulty of blocking the thread while we're waiting on I/O events it may be worthwhile to bring in tokio as a dependency?
Additionally, is there a way to debug locally and confirm that an HTTP/2 connection is being negotiated? I'd just want to make sure that we are indeed configuring curl's own build correctly.
} | ||
loop { | ||
// If critical error occurred in multi interface, directly return. | ||
let mut active = multi.perform()?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this method doesn't block in libcurl, so cargo will have to otherwise manage blocking, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like you're right, thanks for catching it. I will bring in tokio-curl
to do that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tokio-curl
has two issues:
- We can't configure the
Multi
. We can probably add a constructor that accepts aMulti
. Easy2
is not supported. As we have to satisfy the 'static bound in tokio, the allocation is probably inevitable. We probably have to type erase and takeEasy2<Box<dyn Handler>>
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh yeah we can definitely add a constructor accepting Multi
, I think I hadn't gotten around to it yet!
I also haven't currently thought too hard about Easy2
vs Easy
wrt tokio-curl, but I'd be more than willing to add support for it
@@ -310,6 +365,10 @@ pub fn configure_http_handle(config: &Config, handle: &mut Easy) -> CargoResult< | |||
handle.low_speed_limit(10 /* bytes per second */)?; | |||
handle.low_speed_time(Duration::new(30, 0))?; | |||
handle.useragent(&version().to_string())?; | |||
// Not all cURL builds support HTTP/2, ignore any errors. | |||
if config.get_bool("http.http2")?.map(|x| x.val).unwrap_or(true) { | |||
let _ = handle.http_version(HttpVersion::V2TLS); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this perhaps emit a warning if an error happens?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have this option turned on by default, so it could be annoying and confusing to those using old libcurl.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh good point yeah, I was thinking that if you explicitly configured http2 = true
and it wasn't available we may want to warn, but it looks like this only exists for http2 = false
so I think it's ok to not warn here
@@ -1007,7 +1003,7 @@ impl<'a, 'cfg> Context<'a, 'cfg> { | |||
} | |||
|
|||
/// Gets a package for the given package id. | |||
pub fn get_package(&self, id: &PackageId) -> CargoResult<&'a Package> { | |||
pub fn get_packages(&self, id: &[&PackageId]) -> CargoResult<Vec<&'a Package>> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this method is pretty interesting. Long ago Cargo had the signature:
pub fn get_package(&self, id: &PackageId) -> &'a Package;
but we then changed to lazily download packages in order to close #2394. That was the easiest fix to do at the time but I think is causing issues with this patch.
I think to get the most benefit out of the scheme (downloading in parallel) we'll want to probably implement a separate pass over Context
to collect a list of packages that we're gonna want to download. Perhaps that could be deferred to a future PR, though? (just want to make sure we don't lose track of this)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm willing to do some refactor, thanks for the reference!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW I've been thinking recently that the end-state here may be to "future-ify" everything where we just create a gob of futures and throw them at tokio to do the whole build, although that's certainly not a refactoring required for this PR!
☔ The latest upstream changes (presumably #5152) made this pull request unmergeable. Please resolve the merge conflicts. |
continue; | ||
} | ||
} | ||
let mut dst = self.cache_path.open_rw(path, self.config, &filename)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is possible to deadlock in our case, as we're locking multiple files at once. The solution with least overhead is to have a consistent order on package queue, e.g. using BTreeMap
for PackageSet
.
(Given that PackageSet
may also be refactored out, I'm not sure what the final solution would be though.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah good point! With parallel downloads though there may not be much point in locking per-package, I'd be fine with changing the registry to acquire a global lock when writing (which is append only) and then each crate file only exists if it's actually valid
bf20ec1
to
804d865
Compare
Resolved the spinning loop issue with |
ping @ishitatsuyuki, curious if you've had some extra time to work on this? |
This is not my top priority currently, though I'm working on this periodically. I'm rebasing this now and also figuring out things related to the lazy resolution. |
Ok! Should this PR perhaps be closed in the meantime until it's ready again? |
Download crates in parallel with HTTP/2 This PR revives some of the work of #5161 by refactoring Cargo to make it much easier to add parallel downloads, and then it does so with the `curl` crate's new `http2` feature to compile `nghttp2` has a backend. The primary refactoring done here is to remove the concept of "download this one package" deep within a `Source`. Instead a `Source` still has a `download` method but it's considered to be largely non-blocking. If a crate needs to be downloaded it immediately returns information as to such. The `PackageSet` abstraction is now a central location for all parallel downloads, and all users of it have been refactored to be amenable to parallel downloads, when added. Many more details are in the commits...
Pending issues:
Improvements:
Up to 20x faster initial fetch from Tokyo (high bandwidth, bad ping to US).