
[wip] Download crates in parallel #5161

Closed
ishitatsuyuki wants to merge 1 commit

Conversation

@ishitatsuyuki (Contributor) commented Mar 10, 2018

Pending issues:

  • Progress will be re-added later in a unified style, not one-by-one
  • Retries are still not handled, as the current implementation is not async-ready
  • Config methods are cluttered
  • Using Vec/slices everywhere probably increases allocations and uses a less functional style
  • cargo build isn't completely parallelized, as it seems to download in recursive methods (test with cargo fetch)

Improvements:

Up to 20x faster initial fetch from Tokyo (high bandwidth, bad ping to US).

@rust-highfive

r? @alexcrichton

(rust_highfive has picked a reviewer for you, use r? to override)

@alexcrichton (Member) left a comment

These are some awesome wins, nice!

Depending on the difficulty of blocking the thread while we're waiting on I/O events it may be worthwhile to bring in tokio as a dependency?

Additionally, is there a way to debug locally and confirm that an HTTP/2 connection is being negotiated? I'd just want to make sure that we are indeed configuring curl's own build correctly.
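One way to check this locally (my suggestion, not something stated in the thread) is to ask the curl CLI, which links the same libcurl: first see whether the local build was compiled with HTTP/2 support at all, then print the protocol version actually negotiated against a real server via curl's `%{http_version}` write-out variable.

```shell
# 1. Does the local curl build support HTTP/2 at all?
#    "HTTP2" appears in the feature list only if nghttp2 was compiled in.
curl --version | grep HTTP2 || echo "no HTTP/2 in this curl build"

# 2. What protocol is actually negotiated against crates.io?
#    (needs network access; prints "2" when HTTP/2 is used)
curl -sI --http2 -o /dev/null -w '%{http_version}\n' https://crates.io/ || true
```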

```rust
}
loop {
    // If a critical error occurred in the multi interface, return directly.
    let mut active = multi.perform()?;
```
Member:

I think this method doesn't block in libcurl, so Cargo will have to manage blocking itself, right?

Contributor Author:

Looks like you're right, thanks for catching it. I will bring in tokio-curl to do that.

Contributor Author:

tokio-curl has two issues:

  • We can't configure the Multi. We could probably add a constructor that accepts a Multi.
  • Easy2 is not supported. Since we have to satisfy the 'static bound in tokio, the allocation is probably inevitable; we'd have to type-erase and take Easy2<Box<dyn Handler>>.

Member:

Oh yeah we can definitely add a constructor accepting Multi, I think I hadn't gotten around to it yet!

I also haven't yet thought too hard about Easy2 vs Easy w.r.t. tokio-curl, but I'd be more than willing to add support for it.

```diff
@@ -310,6 +365,10 @@ pub fn configure_http_handle(config: &Config, handle: &mut Easy) -> CargoResult<
     handle.low_speed_limit(10 /* bytes per second */)?;
     handle.low_speed_time(Duration::new(30, 0))?;
     handle.useragent(&version().to_string())?;
+    // Not all cURL builds support HTTP/2, ignore any errors.
+    if config.get_bool("http.http2")?.map(|x| x.val).unwrap_or(true) {
+        let _ = handle.http_version(HttpVersion::V2TLS);
```
Member:

Should this perhaps emit a warning if an error happens?

Contributor Author:

We have this option turned on by default, so it could be annoying and confusing to those using old libcurl.

Member:

Oh, good point. I was thinking that if you explicitly configured http2 = true and it wasn't available we might want to warn, but it looks like this setting only exists to allow http2 = false, so I think it's fine not to warn here.
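For reference, the knob under discussion would live in Cargo's configuration file; a user on an old libcurl could opt out explicitly (assuming the `http.http2` key the diff above reads):

```toml
# .cargo/config
[http]
http2 = false   # skip HTTP/2 negotiation on curl builds without nghttp2
```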

```diff
@@ -1007,7 +1003,7 @@ impl<'a, 'cfg> Context<'a, 'cfg> {
     }
 
     /// Gets a package for the given package id.
-    pub fn get_package(&self, id: &PackageId) -> CargoResult<&'a Package> {
+    pub fn get_packages(&self, id: &[&PackageId]) -> CargoResult<Vec<&'a Package>> {
```
Member:

So this method is pretty interesting. Long ago Cargo had the signature:

```rust
pub fn get_package(&self, id: &PackageId) -> &'a Package;
```

but we then changed to lazily download packages in order to close #2394. That was the easiest fix at the time, but I think it's causing issues with this patch.

I think that to get the most benefit out of this scheme (downloading in parallel) we'll probably want to implement a separate pass over Context that collects the list of packages we're going to want to download. Perhaps that could be deferred to a future PR, though? (Just want to make sure we don't lose track of this.)

Contributor Author:

I'm willing to do some refactoring, thanks for the reference!

Member:

FWIW I've been thinking recently that the end-state here may be to "future-ify" everything where we just create a gob of futures and throw them at tokio to do the whole build, although that's certainly not a refactoring required for this PR!

@bors commented Mar 13, 2018

☔ The latest upstream changes (presumably #5152) made this pull request unmergeable. Please resolve the merge conflicts.

```rust
            continue;
        }
    }
    let mut dst = self.cache_path.open_rw(path, self.config, &filename)?;
```
Contributor Author:

It is possible to deadlock in our case, since we're locking multiple files at once. The lowest-overhead solution is to impose a consistent order on the package queue, e.g. by using a BTreeMap for PackageSet.

(Given that PackageSet may also be refactored out, I'm not sure what the final solution would be though.)

Member:

Ah, good point! With parallel downloads, though, there may not be much point in locking per-package. I'd be fine with changing the registry to acquire a global lock when writing (which is append-only); then each crate file only exists if it's actually valid.

@ishitatsuyuki (Contributor Author)

Resolved the spinning-loop issue with Multi::wait; I'll address the others when I have time.
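The fix being described looks roughly like the following sketch (my reconstruction against the external `curl` crate's multi API, so it is untested here): `perform` is non-blocking, and `wait` is what keeps the loop from spinning.

```rust
use std::time::Duration;
use curl::multi::Multi;

// Drive all transfers without busy-waiting: perform() returns immediately,
// so we block in wait() until libcurl reports socket activity (or a
// short timeout elapses).
fn drive(multi: &mut Multi) -> Result<(), curl::MultiError> {
    loop {
        let active = multi.perform()?;
        if active == 0 {
            break; // every transfer has finished
        }
        multi.wait(&mut [], Duration::from_secs(1))?;
    }
    Ok(())
}
```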

@alexcrichton (Member)

ping @ishitatsuyuki, curious if you've had some extra time to work on this?

@ishitatsuyuki (Contributor Author)

This is not my top priority currently, though I'm working on this periodically. I'm rebasing this now and also figuring out things related to the lazy resolution.

@alexcrichton (Member)

Ok! Should this PR perhaps be closed in the meantime until it's ready again?

bors added a commit that referenced this pull request Sep 18, 2018
Download crates in parallel with HTTP/2

This PR revives some of the work of #5161 by refactoring Cargo to make it much easier to add parallel downloads, and then does so with the `curl` crate's new `http2` feature, which compiles in `nghttp2` as a backend.

The primary refactoring done here is to remove the concept of "download this one package" from deep within a `Source`. Instead a `Source` still has a `download` method, but it is considered largely non-blocking: if a crate needs to be downloaded, the method immediately returns information to that effect. The `PackageSet` abstraction is now the central location for all parallel downloads, and all users of it have been refactored to be amenable to parallel downloads once they are added.

Many more details are in the commits...