Make bazel download files #146

Closed · shs96c opened this issue May 16, 2019 · 8 comments · Fixed by #169

shs96c commented May 16, 2019

Coursier optimistically downloads every single jar file required when starting a build. When the number of dependencies is large, this can cause a significant delay before anything starts being built, particularly since jar files are, on the whole, significantly larger than pom files.

If, instead, maven_install generated an http_file for each required jar or source jar, builds could start a lot sooner and we could ensure that all downloads happen through Bazel's own mechanisms.

There doesn't appear to be a command line flag to make this happen automatically, but once we have resolved all the dependencies, the URLs to download from tend to be a simple transformation of the maven coordinates.
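A rough sketch of what such generated output could look like, assuming Maven Central and a single example artifact (the file name, repository name, and coordinate below are illustrative, not taken from this repo):

```python
# Hypothetical generated file, e.g. maven_deps.bzl. The URL is just the Maven
# coordinate com.google.guava:guava:27.0-jre mapped onto the repository layout.
load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_file")

def maven_dependencies():
    http_file(
        name = "com_google_guava_guava",
        urls = ["https://repo1.maven.org/maven2/com/google/guava/guava/27.0-jre/guava-27.0-jre.jar"],
        # sha256 omitted in this sketch; see the checksum discussion further down.
    )
```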

jin commented May 17, 2019

http_file will create one new external repository for every artifact, which is a little different from the single-repository model. I think what we want is for Coursier to resolve the URLs for us, and to pass that set of URLs into repository_ctx.download. repository_ctx.download fetches the files into the current external repository context, and it works with the repository cache, AFAIUC.
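For illustration, a minimal sketch of that shape, assuming a custom repository rule whose attribute carries the URLs that Coursier resolved (the rule and attribute names are hypothetical):

```python
def _maven_jars_impl(repository_ctx):
    # One download call per resolved artifact; Bazel's own downloader handles
    # the repository cache, --distdir, and retries.
    for jar_name, url in repository_ctx.attr.artifact_urls.items():
        repository_ctx.download(
            url = url,
            output = jar_name + ".jar",
        )
    # Expose the downloaded jars from this single external repository.
    repository_ctx.file("BUILD", """filegroup(
    name = "jars",
    srcs = glob(["*.jar"]),
    visibility = ["//visibility:public"],
)
""")

maven_jars = repository_rule(
    implementation = _maven_jars_impl,
    attrs = {"artifact_urls": attr.string_dict()},
)
```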

@aehlig, can you confirm?

aehlig commented May 17, 2019

@aehlig, can you confirm?

Confirm what?

As far as I can see, there are two issues raised here.

  • Not downloading via an external tool, but instead making Bazel do it, thus using the cache, --distdir, etc.

  • Only downloading some of the jar files when only some targets are built, so that the full transitive dependency set is not needed; this also lets the build start earlier.

Calling ctx.download would be a solution to the first issue, but not the second, as external repositories are fetched fully before any file in them is looked at. The original proposal, of maven_install generating a .bzl file exporting a function add_maven_repositories that is then called from the main WORKSPACE, would solve both. Yes, that would be one external repository per file, but that's the only way to avoid bundling all the downloads. Note that this doesn't stop maven_install from providing a single repository; it could just have, for each downloaded jar needed, a dependency on the respective http_file.
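A sketch of that wiring, under the same assumptions as the earlier example (names are placeholders; checksums elided):

```python
# maven.bzl, as generated by maven_install:
load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_file")

def add_maven_repositories():
    # One external repository per file, so Bazel only fetches a jar when a
    # target that needs it is actually built.
    http_file(
        name = "org_apache_commons_commons_lang3",
        urls = ["https://repo1.maven.org/maven2/org/apache/commons/commons-lang3/3.9/commons-lang3-3.9.jar"],
    )
    # ... one http_file per resolved artifact ...

# And in the main WORKSPACE:
#   load("//:maven.bzl", "add_maven_repositories")
#   add_maven_repositories()
```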

@Globegitter

I have worked with all of the approaches mentioned above, and using a single repository with many ctx.download calls actually works reasonably well: all the deps have to be downloaded only the first time, after which they are cached and only changed deps are downloaded again, unless one runs bazel clean --expunge.

But I do still think downloading deps as independent repositories via http_file leads to the best overall experience: one gets better progress reporting, only the needed deps are ever loaded, and one gets Bazel's native parallelism handling.

And yes, as Klaus said, one could still expose everything in one repository; I think even just an alias should do the trick.
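For instance, the aggregating repository's BUILD file could contain aliases onto the per-artifact http_file repositories (names below are placeholders):

```python
# BUILD file of the single aggregating repository (e.g. @maven). http_file
# exposes its download as the //file target, which the alias points at.
alias(
    name = "org_apache_commons_commons_lang3",
    actual = "@org_apache_commons_commons_lang3//file",
    visibility = ["//visibility:public"],
)
```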

Globegitter commented Jun 4, 2019

@jin @shs96c Just to see how far I could get in an hour at having an external repository per dependency, I hacked together something very quick and dirty. It is far from complete, but it already highlights some issues. I just focused on how we could get the list of dependencies and turn it into a list of http_files. See https://github.com/Globegitter/rules_jvm_external/tree/bazel-fetch

It creates a maven.bzl file with a download_deps function containing a currently incomplete and duplicated set of http_file dependencies.

The first issue is that the resolve command does not support a JSON report, see coursier/coursier#659. So if that does not get implemented, it would probably be best to make use of the API (https://get-coursier.io/docs/api) rather than trying to parse the default output, which is what I am currently doing.

Secondly, Maven only exposes SHA-1 digests and Bazel only accepts SHA-256 (see bazelbuild/bazel#7323), so the only way I can see to get this working right now in pure Starlark is to print the SHA of the downloaded file (which I believe Bazel now does by default anyway) and have users add the list of SHAs manually to the download_deps call. WDYT?
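A hypothetical shape for that, where download_deps takes the manually collected checksums and simply leaves them unset on the first, unverified fetch (artifact and dict keys are illustrative):

```python
load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_file")

def download_deps(shas = {}):
    # With an empty sha256 Bazel still downloads the file and warns about the
    # unverified download, printing the hash it computed; the user can then
    # copy that value into the shas dict.
    http_file(
        name = "org_slf4j_slf4j_api",
        urls = ["https://repo1.maven.org/maven2/org/slf4j/slf4j-api/1.7.26/slf4j-api-1.7.26.jar"],
        sha256 = shas.get("org_slf4j_slf4j_api", ""),
    )
```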

jin commented Jun 4, 2019

I have this PR (#160) that creates a .bzl file with individual local_repository declarations for each artifact. If we can get resolve to return the same JSON output file, it'll be straightforward to make the .bzl contain http_file dependencies. Finally, the aar_import and jvm_import targets in the top-level @maven//:BUILD file can then refer to these http_file dependencies as file targets.
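Roughly, an entry in the generated @maven//:BUILD could then look like this (load of jvm_import omitted, since its location depends on the rules_jvm_external version; names are placeholders):

```python
# The jar now comes from the per-artifact http_file repository's //file target
# instead of a path that Coursier downloaded.
jvm_import(
    name = "com_google_guava_guava",
    jars = ["@com_google_guava_guava//file"],
    visibility = ["//visibility:public"],
)
```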

Re: SHA-256, we can extend core Bazel to output the SHAs as a .bzl file that can be loaded / parsed.

@Globegitter

Ah great, I will have a look in the coming weeks. And if Bazel could do something like that, that would be great. Actually, just thinking about this, I wonder if something like that would already be possible if we were to write our own repository_rule? Anyway, I will investigate when I get around to it.

jin commented Jun 7, 2019

Another complication is authentication. Authentication for repository_ctx.download is in the works. Currently, this is handled by Coursier directly, so we get private Maven repository support for free.

jin commented Jun 17, 2019

Work in progress PR to have Bazel download files, instead of Coursier: #169
