Support multiple GPU backends #1566

Closed
wants to merge 11 commits from the jps/many-gpu-backends branch

Conversation

@jpsamaroo (Contributor)

Abstracts out some functionality to support an external back-end package, like https://github.com/jpsamaroo/FluxAMDGPU.jl. This is better than adding AMDGPU as a direct dependency, which would keep bloating Flux's load time and make it harder to iterate quickly on an arguably experimental backend.

Once this is complete, AMD GPU owners should be able to run `using Flux, FluxAMDGPU, AMDGPU` and get access to basically everything they need to do Flux things.

PR Checklist

  • All/most important FluxAMDGPU.jl tests pass
  • Tests are added
  • Entry in NEWS.md
  • Documentation, if applicable
  • API changes require approval from a committer (different from the author, if applicable)

@DhairyaLGandhi (Member)

Thanks @jpsamaroo! Looking forward to this!

@DhairyaLGandhi (Member)

We should think about whether we want users to have to deal with glue overloading like in the OP. It seems like we should inherit the type from the call to default_gpu_converter (or whatever we call it), rather than from https://github.com/jpsamaroo/FluxAMDGPU.jl/blob/d365ff3635e8e4fc3dc1cc9fa3b51b75f080ad4b/src/FluxAMDGPU.jl#L10

I'm thinking we might want to obviate the need for that particular package, but if we need it for other, more complex handling, then that's a separate discussion.
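A minimal sketch of that direction, assuming `default_gpu_converter` is a `Ref` that a back-end package populates at load time (the name comes from the comment above; the rest is hypothetical):

```julia
using Functors: fmap

# Back-end packages set this once when loaded, e.g. to `CUDA.cu` or `AMDGPU.roc`.
const default_gpu_converter = Ref{Function}(identity)

# `gpu` then needs no per-backend glue overloads.
gpu(x) = fmap(default_gpu_converter[], x)
```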

@maleadt (Collaborator) commented Apr 9, 2021

Can't we do something Preferences.jl-based nowadays instead of creating additional packages? That would avoid bloating load time without complicating development, CI, etc.
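For concreteness, a rough sketch of what a Preferences.jl-based switch could look like (structure and names are assumptions, building on the `default_gpu_converter` sketch above):

```julia
using Preferences

# A compile-time preference selecting the back-end; defaults to CUDA.
const gpu_backend = @load_preference("gpu_backend", "CUDA")

@static if gpu_backend == "CUDA"
    using CUDA
    default_gpu_converter[] = CUDA.cu
elseif gpu_backend == "AMDGPU"
    using AMDGPU
    default_gpu_converter[] = AMDGPU.roc
end
```

Users would then switch back-ends with `Preferences.set_preferences!(Flux, "gpu_backend" => "AMDGPU")` and restart.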

@jpsamaroo (Contributor, Author)

That would still require a Project.toml dependency, which can lead to resolver errors for mismatched dependencies (because I don't develop AMDGPU.jl at nearly the pace you develop CUDA.jl 😄).

@maleadt (Collaborator) commented Apr 29, 2021

Took the liberty of continuing some work on this: rebased, and moved all CUDA functionality to https://github.com/FluxML/FluxCUDA.jl. I did add a dep on GPUArrays so that vendor-neutral GPU functionality can be moved back from FluxCUDA.jl using the AbstractGPUArray supertype.
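To illustrate the GPUArrays point: methods written against the `AbstractGPUArray` supertype cover every vendor's array type without a vendor dependency. A tiny sketch (hypothetical helper, not code from this PR):

```julia
using GPUArrays: AbstractGPUArray

# Dispatches correctly for CuArray, ROCArray, oneArray, ... alike.
isgpuarray(::AbstractArray) = false
isgpuarray(::AbstractGPUArray) = true
```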

@maleadt force-pushed the jps/many-gpu-backends branch 2 times, most recently from b132eff to 898470c on April 29, 2021 12:19
@maleadt (Collaborator) commented Apr 29, 2021

The last two commits illustrate the two main approaches to where we should host this code: in a separate repo, or as a subpackage. I'm leaning towards the latter: the CI recipe isn't really more complicated, it's easy enough for aspiring contributors to dev the GPU back-end package (JuliaLang/Pkg.jl#1251), and locally it works fine too:

(@v1.6) pkg> st
      Status `~/Julia/depot/environments/v1.6/Project.toml`
  [587475ba] Flux v0.12.3 `~/Julia/pkg/Flux`
  [899ac4cc] FluxCUDA v0.1.0 `~/Julia/pkg/Flux/lib/FluxCUDA`
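
For reference, dev'ing the subpackage is just a path-based `dev` (assuming the checkout layout shown above):

(@v1.6) pkg> dev ~/Julia/pkg/Flux/lib/FluxCUDA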

@maleadt force-pushed the jps/many-gpu-backends branch 3 times, most recently from 71bd38d to de96b36 on April 29, 2021 14:17
@maleadt force-pushed the jps/many-gpu-backends branch 4 times, most recently from ad1d70c to d9318bb on May 4, 2021 06:48
@maleadt (Collaborator) commented May 11, 2021

Any thoughts here? This seems to work fine using subpackages.

@maleadt marked this pull request as ready for review on May 11, 2021 16:30
@maleadt (Collaborator) commented May 12, 2021

The refactor here should probably also tackle #1597.

@maleadt (Collaborator) commented May 13, 2021

Recap of a discussion with @DhairyaLGandhi, who had two concerns about this: (1) the complexity of development with the GPU-specific code moved into subpackages, and (2) a degraded default experience now that the CUDA functionality isn't available by default again. I think we can help (1) by sufficiently documenting the development workflow, and FWIW it didn't turn out to be difficult to dev a subpackage on 1.6. For (2), we could have the gpu function @warn upon first use about having to load a GPU back-end. Currently, calling gpu without a functional CUDA set-up silently falls back to the CPU, but I think that's risky and it should probably just error.
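Sketched out, that would look something like this (reusing the hypothetical `default_gpu_converter` Ref from earlier; all names illustrative):

```julia
function gpu(x)
    if default_gpu_converter[] === identity
        # No back-end loaded: warn once instead of silently staying on the CPU.
        @warn "No GPU back-end loaded; `gpu` is a no-op. Load one with e.g. `using FluxCUDA`." maxlog=1
        return x
    end
    return fmap(default_gpu_converter[], x)
end
```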

@ToucheSir (Member)

One edge case this might address is users who have unsupported GPUs and encounter CUDA.jl errors because of it. Granted, that only takes an environment-variable tweak to address, but it would be nice to have it work as expected OOTB.

@maleadt (Collaborator) commented May 13, 2021

> One edge case this might address is users who have unsupported GPUs and encounter CUDA.jl errors because of it. Granted, that only takes an environment-variable tweak to address, but it would be nice to have it work as expected OOTB.

Idea: FluxCUDA's __init__ could check if CUDA is functional and only conditionally set the default GPU converter to cu, so those users would only see an error ("your GPU is unsupported") upon importing CUDA's Flux back-end.
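
A sketch of that idea (module layout assumed, reusing the hypothetical `default_gpu_converter` from above):

```julia
module FluxCUDA

using Flux, CUDA

function __init__()
    if CUDA.functional()
        # Only make `gpu` move data to the GPU when CUDA actually works.
        Flux.default_gpu_converter[] = CUDA.cu
    else
        @warn "CUDA.jl is installed but not functional; the CUDA back-end stays disabled."
    end
end

end # module
```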

@ViralBShah (Member)

Now that we have Metal.jl, oneAPI.jl, and AMDGPU.jl (ROCm) support, it would be nice to revisit this.

@ToucheSir (Member)

I think at this point we can push almost all GPU back-end code into NNlib subpackages instead of creating Flux subpackages. That's the easy part, though; there remain two main blockers to making Flux actually useful for non-CUDA backends:

  1. conv* op support: currently, only CUDA.jl has this through bindings to cuDNN. AMDGPU may follow soon thanks to @pxl-th's work, but we're still missing oneDNN and MPS support for oneAPI.jl and Metal.jl respectively.
  2. I haven't seen a good example of how to conditionally load GPU backends outside of the weak-deps mechanism proposed in Support in code loading and precompilation for weak dependencies JuliaLang/julia#47040 (sketched below). https://discourse.julialang.org/t/categoricalarrays-forces-openspecfun-jll-recompilation/87759/18 was suggested as a way to do this with Requires.jl (while avoiding that library's known pitfalls), but later in that thread it seemed like nobody had actually tried this approach in practice.
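
For reference, the weak-deps mechanism proposed there boils down to two Project.toml sections plus an extension module that loads automatically when the weak dependency is present. A rough sketch (the extension name is illustrative):

```toml
[weakdeps]
AMDGPU = "21141c5a-9bdb-4563-92ae-f87d6854732e"  # AMDGPU.jl's registered UUID

[extensions]
FluxAMDGPUExt = "AMDGPU"
```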

@ViralBShah (Member) commented Nov 6, 2022

@KristofferC What's the best way forward for now to conditionally load packages?

@pxl-th (Member) commented Nov 9, 2022

> AMDGPU may follow soon

There's also https://github.com/JuliaNeuralGraphics/NNlibROC.jl with initial support for some things.


While we're at it, I also want to suggest using KernelAbstractions.jl for handwritten kernels at some point.
For example, the various sampling kernels can easily be adapted to it, and new backends would get that functionality "for free" (KA currently supports CUDA, ROC, and oneAPI).
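
For flavour, a minimal KernelAbstractions kernel (written against the KA 0.9-era API; purely illustrative, not an NNlib kernel):

```julia
using KernelAbstractions

# One kernel definition, runnable on CPU, CUDA, ROC, oneAPI, ...
@kernel function axpy_kernel!(y, a, x)
    i = @index(Global)
    @inbounds y[i] += a * x[i]
end

function scaled_add!(y, a, x)
    backend = get_backend(x)  # inferred from the array type
    axpy_kernel!(backend)(y, a, x; ndrange = length(x))
    KernelAbstractions.synchronize(backend)
    return y
end
```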

@ViralBShah (Member)

I have opened a KA issue for Metal support.

JuliaGPU/KernelAbstractions.jl#326

@ToucheSir (Member)

It would be great to be able to use KA in NNlib. The biggest blocker right now is that it only supports Julia 1.7+, whereas FluxML supports the 1.6 LTS. Not sure how to square this circle.

@avik-pal (Member) commented Nov 9, 2022

I "kind of" worked around that by allowing KA 0.7 (which supports 1.6) and manually adding compat functions. See https://github.com/avik-pal/Lux.jl/blob/main/lib/LuxLib/src/utils.jl#L4-L16

@ToucheSir (Member) commented Nov 9, 2022

KA 0.7 doesn't work for the existing NNlibCUDA kernels, AIUI, because many of them rely on atomics, which were only added in v0.8. With big changes on the horizon for 0.9, I'd also be wary of targeting such an outdated KA version. If it were possible to make KA 0.9/1.0 work on 1.6 again, just with degraded performance and/or a few non-vital QoL features disabled, that might be a path forward.

@jpsamaroo (Contributor, Author)

Why not put 1.6 support on a separate minor-version branch, and do development on head with Julia 1.7? Julia 1.6 also has problems with using multiple GPU backends at once due to limited method-overlay support, making it a bad version to support for GPU packages.

@ViralBShah (Member)

Great point!

@ToucheSir (Member) commented Nov 14, 2022

Interesting, is there an example of other packages which do this? With this approach, is there a way to avoid maintaining two copies of each kernel (one legacy copy for 1.6, one KA copy for 1.7+) for any kernels we want to move to KA? My main worry is that we're barely able to maintain a single copy of stuff like the conv_* routines in NNlib as-is (e.g. see how long it took to fix FluxML/NNlib.jl#441), but maybe switching to KA would offset this by making contributing significantly more accessible?

@jpsamaroo (Contributor, Author)

I believe CUDA.jl did this at some point, but I'm not aware of any packages actively doing it now (I think most packages that need advanced atomics just gave up on 1.6).

My thought is that while 1.6 is a supported upstream version, that doesn't mean NNlib needs to add new features for 1.6 users; only bug fixes are expected for the 1.6 LTS anyway. So NNlib can do the same and provide only bugfixes on the 1.6-supporting branch.

And my personal opinion: holding back NNlib's feature development and not enhancing maintainability just because some people won't (or can't) update to a newer stable Julia version hurts more people than it helps.

@ToucheSir (Member) commented Nov 15, 2022

That sounds reasonable enough to me, thanks.

> And my personal opinion: holding back NNlib's feature development and not enhancing maintainability just because some people won't (or can't) update to a newer stable Julia version hurts more people than it helps.

For better or worse, the main thing holding back NNlib feature development (IMO) is an acute lack of people with the know-how and interest to work on optimized low-level routines. So if there's a real possibility this could help with that, then I'm on board. cc @FluxML/committers though for additional thoughts on this.

@Moelf (Contributor) commented Feb 13, 2023

Bump, now that oneAPI has hit 1.0.

@CarloLucibello (Member)

Closing this, as we now have AMDGPU support as an extension; CUDA and Metal extensions are on their way as well (#2268 and #2270).
