support parallel maps (tmap, pmap, vmap, etc.) #728
Conversation
Note that I would very much understand not wanting the tmap here. It's really to show how it would be done. That's pulled from https://github.com/jw3126/ThreadingTools.jl, which isn't registered.
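For readers unfamiliar with it: a threaded map in the spirit of ThreadingTools.tmap can be sketched in a few lines. This is a hypothetical simplification for illustration only; the name `tmap_sketch` and its body are not the package's actual code.

```julia
# Hypothetical sketch of a threaded map, in the spirit of ThreadingTools.tmap.
# Assumes Julia was started with multiple threads (e.g. JULIA_NUM_THREADS=4).
function tmap_sketch(f, xs::AbstractVector)
    ys = Vector{Any}(undef, length(xs))
    Threads.@threads for i in 1:length(xs)
        @inbounds ys[i] = f(xs[i])
    end
    return ys
end

tmap_sketch(x -> x^2, collect(1:8))  # same result as map, computed across threads
```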
I am not sure adding all the utilities here is such a good idea: it bloats up the package and is arguably the domain of a different package (ideally Base) that we can then support by writing adjoints etc. But I get the point of having this.
yeah, clearly
The issue with the maps is that, in some sense, it should be using multiple dispatch, but it doesn't. So the adjoints for all of these different maps are literally the same piece of code, which makes it easier to do them all in one loop. What I am thinking might be a good quick solution is:
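The snippet that followed is not preserved above. As a rough illustration only (not the PR's code), the "one adjoint body, many map functions" idea could look like the sketch below: only `Distributed.pmap` is shown, with `ThreadingTools.tmap`, `LoopVectorization.vmap`, etc. added to the tuple the same way, and for simplicity the gradient with respect to the mapped function itself is returned as `nothing`.

```julia
using Distributed
using Zygote: @adjoint, pullback

for mapfn in (:(Distributed.pmap),)          # tmap, vmap, ... would be added here too
    @eval @adjoint function $mapfn(f, args...)
        # Forward pass: keep each element's pullback next to its value.
        ys_and_backs = $mapfn((xs...) -> pullback(f, xs...), args...)
        ys = map(first, ys_and_backs)
        return ys, Δ -> begin
            # Reverse pass: apply each stored pullback, then regroup by argument.
            grads = map((yb, δ) -> last(yb)(δ), ys_and_backs, Δ)
            (nothing, ntuple(i -> getindex.(grads, i), length(args))...)
        end
    end
end
```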
Note that LoopVectorization also has `vmap`. But I wonder if, instead of threading the maps and threading the matrix multiplications, we should divide batches up by threads?
Until we figure out the best way forward, let's merge this with just
This reverts commit 2ee2838.
Added some tests now too. I really just wanted to fix the fact that "can Zygote differentiate parallel code?" had a solution that could be done in about 10 minutes from a cellphone screen during a call, so there's really no reason to keep waiting. This'll work on the 90% of codes which happen to be data-parallel, and we can get a
The nicer solution is JuliaLang/julia#36579 (comment), but that doesn't exist yet, so I think it's best to just get something working until data-parallelism in Julia itself gets more organized, and then users will naturally switch to that along with Zygote.
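As a hypothetical usage sketch (not taken from the PR's tests): with a pmap adjoint defined, Zygote differentiates straight through the data-parallel map, and with no worker processes added pmap simply runs serially on the local process.

```julia
using Distributed, Zygote

loss(xs) = sum(pmap(x -> sin(x)^2, xs))
grad = gradient(loss, [0.1, 0.2, 0.3])[1]   # elementwise derivative: 2*sin(x)*cos(x)
```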
bors r+
728: support parallel maps (tmap, pmap, vmap, etc.) r=CarloLucibello a=ChrisRackauckas
This PR is quite simple and does it. It would be a shame if we had to keep waiting to support data parallelism just because we can't figure out where to put the code, so I hope this PR can drive the conversation.
Co-authored-by: Chris Rackauckas <accounts@chrisrackauckas.com>
Build failed:
Hmm. @chriselrod, is the LoopVectorization stack blocking something in LLVM.jl that CUDA.jl needs?
VectorizationBase is currently set to require 1.5, 1.6, 1.7, or 2.0. I'll see if I can set it to
CUDA.jl has compat
Quick fix would be to wrap it in
How would that fix it? Seems like that just wouldn't test it?
CUDA 1.1 has dropped support for Julia 1.3, but only 1.1 is compatible with LLVM 2.0 (which is required explicitly by the build here -- not sure why).
bors try |
try
Build failed:
bors try |
It needs to depend on and
try
Build failed:
I'll generalize the vmap method and release a new LoopVectorization version shortly. EDIT: 0.8.15 has merged. It uses a SIMD version when the arguments are all
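For reference, a small usage sketch of the `vmap` being discussed (illustrative only; exact SIMD eligibility depends on the LoopVectorization version and the argument types):

```julia
using LoopVectorization

xs = rand(Float32, 1024)
ys = vmap(x -> 2f0 * x + 1f0, xs)   # elementwise, like map, using SIMD where possible
ys ≈ map(x -> 2f0 * x + 1f0, xs)    # should agree with plain map
```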
Tests are passing locally with the latest LoopVectorization 🎉
bors try |
🔒 Permission denied. Existing reviewers: click here to make ChrisRackauckas a reviewer
:( |
bors try |
try
Build succeeded:
Bors ran on v1.4 (https://gitlab.com/JuliaGPU/Zygote-jl/-/jobs/632198576#L50) but I forgot to change the name of the job so it said v1.3. So that last commit is just fixing the name but it should be good to go. |
Still not clear what is forcing LLVM to 2.0 on Julia 1.3. I don't think we had this issue before this PR.
Actually the issue is already on master, see #729.
bors r+
Build succeeded:
733: add back julia 1.3 compatibility r=CarloLucibello a=CarloLucibello
Unnecessarily dropped in #728. Having a separate test/Project.toml, as specified in the [Pkg docs](https://julialang.github.io/Pkg.jl/v1/creating-packages/#Test-specific-dependencies-in-Julia-1.2-and-above-1), avoids forcing the Manifest.toml created from the root Project.toml onto each testing environment, which can lead to incompatibilities with the extra packages required in the test phase. In this case, LLVM.jl 2.0 was installed on Julia 1.3, but then no CUDA.jl version could be resolved.
Co-authored-by: CarloLucibello <carlo.lucibello@gmail.com>
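For illustration, the test-specific dependency layout mentioned above looks roughly like the sketch below (not the repository's actual file; package UUIDs replaced by placeholders). The test dependencies then resolve in their own environment instead of inheriting the root Manifest.toml.

```toml
# test/Project.toml (sketch; substitute the packages' real UUIDs)
[deps]
CUDA = "00000000-0000-0000-0000-000000000000"
Test = "00000000-0000-0000-0000-000000000000"

[compat]
CUDA = "1"
```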
842: Remove LoopVectorization.jl r=CarloLucibello a=mcabbott
Zygote depends on LoopVectorization only in order to add one gradient definition, for `vmap`. That doesn't seem like a good idea. Surely the rule can live elsewhere, via ZygoteRules ~~or ChainRulesCore~~ which exist for this purpose. Seems to have been added along with `pmap` in #728. Marked as draft here until such a rule exists elsewhere. (It's possible that, after that, Zygote should keep the dep. & not load it, for a while, just to allow Pkg to bound versions.) The rule might also be simplified, noted here #838 (comment), as I don't think `vmap` can guarantee that the elements will be acted on in order. ~~Since `pmap` also can't guarantee to work in order, it also seems to have no need to reverse the order on the reverse pass.~~
Co-authored-by: Michael Abbott <me@escbook>