pin cuda-nvcc as a temporary workaround #389
Conversation
/condalock
Alternatively, one could use this fix:

```python
import os
os.environ["XLA_FLAGS"] = "--xla_gpu_force_compilation_parallelism=1"
```

Closing!
/condalock
/condalock
Brings in pangeo-data/pangeo-docker-images#389. Based on pangeo-data/pangeo-docker-images#390, start making T4 the default. Folks can still use K80 if they want. This makes it easier to use CUDA-based, GPU-accelerated code. Follow-up to 2i2c-org#1766.
@ngam @dhruvbalwada Is this only to support K80s? We could just say we no longer support K80s and switch to T4s only. In 2i2c-org/infrastructure#1772 I make T4s the default.
This PR will provide support for T4s only. The problem with K80s was not resolved and requires turning multi-threading off, so maybe we should just switch over to T4s entirely.
@dhruvbalwada Yeah, 2i2c-org/infrastructure#1772 makes T4s the default; K80s are still available there. We only have quota for 4 T4s; should we get more?
I don't know the answer to that one yet. I personally probably won't be using more than that (probably only 1 GPU) for the next few months; if we need more, we can increase the quota at a later point. Also, based on the logs, do you see any other users reaching the limit often? If so, maybe we can ask them personally whether they would like more?

Edit: I actually didn't realize that more than one GPU can be used. How do we access them?
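For context, a minimal sketch (assuming a JAX-based workflow, since that is what the XLA flag above applies to) of how a session can see and target multiple GPUs, if the pod actually exposes more than one:

```python
# Minimal sketch, assuming JAX with GPU support is installed in the image.
# Lists the GPUs visible to this session and, if there is more than one,
# places an array on the second device explicitly.
import jax
import jax.numpy as jnp

gpus = jax.devices("gpu")      # raises RuntimeError if no GPU backend is found
print(gpus)

x = jnp.ones((1024, 1024))
if len(gpus) > 1:
    y = jax.device_put(x, gpus[1])   # pin the array to the second GPU
```

Whether more than one GPU is attached to a user's pod is an infrastructure setting; the code above can only use what the pod exposes.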
@yuvipanda, I am obviously not associated with LEAP, M2LINES, etc., so I have no idea what's actually going on behind the scenes, and I don't use the public Pangeo hubs either. However, I am happy to help if needed.

Following the conversations in multiple threads, I think an interesting proposition for this org to consider is optimizing the software for specific hardware; that is essentially the problem here. If one could confirm and pin the hardware specs (CUDA driver version, highest AVX level, etc.), one could either compile pieces of software targeting exactly that hardware, or at least work with conda-forge to ensure support is handled correctly.

To reiterate, though, this is not a "problem" in the grand scheme of things; it is just a minor issue in compute-graph compilation (parallel or not). An even easier, under-the-hood solution is to set an environment variable in the images that suppresses the unnecessary error entirely.

@dhruvbalwada, in your work or in your examples, try running with and without the parallelism env var and see how things turn out. I suspect there will be no difference at all in terms of speed, but that's just a guess.
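A rough sketch of that comparison, assuming JAX with GPU support; the flag must be set before jax is imported, and the matrix size is arbitrary:

```python
# Run this script twice: once with the XLA_FLAGS line enabled and once with it
# commented out, then compare the timings. The first call includes compilation,
# which is where the parallelism flag matters; the second call is steady state.
import os
os.environ["XLA_FLAGS"] = "--xla_gpu_force_compilation_parallelism=1"

import time
import jax
import jax.numpy as jnp

@jax.jit
def step(x):
    return jnp.tanh(x @ x.T).sum()

x = jnp.ones((2048, 2048))

t0 = time.perf_counter()
step(x).block_until_ready()    # compile + run
print("first call:", time.perf_counter() - t0)

t0 = time.perf_counter()
step(x).block_until_ready()    # run only (cached)
print("second call:", time.perf_counter() - t0)
```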
These quotas are for the whole project, meaning only 4 people can be using T4s at once on the entire M2LInES hub. We don't support multiple GPUs per user, but could potentially try that if people thought it would be useful.

In terms of general usage, we have a bit of a chicken-and-egg problem here: we have not done much work to explain to users how to take advantage of these resources. We don't have demos, examples, training, etc., so it is not surprising that usage is low.

In the short term, it's fine to drop K80s and use only T4s. I doubt we would have more than 4 simultaneous GPU users in the project.
We could definitely try multiple GPUs on the same pod if folks think that would be useful. I've made T4s the default now, but I think the suggestion is that we drop K80s completely. https://cloud.google.com/compute/docs/gpus#nvidia_gpus_for_compute_workloads lists the GPUs available; let me know if we should make any other ones available. It is fairly trivial to do so!
I think it is quite difficult to justify using multiple GPUs simultaneously (cost, compute, etc., i.e. with NVLink or similar). Unless someone showcases a very specific example where it is beneficial, I wouldn't do it. Obviously, someone may make the argument of running completely parallel (independent) jobs, e.g. two different training runs at once, in which case my point doesn't apply.

Obviously A100s are super efficient and powerful, but again, a user must justify the expense. It will likely be cost-effective only if a user can show that they will actually use an A100 effectively. (Note that this is the purpose of my optimizations with XLA, etc.: to make use of all that the A100 offers.) In other words, running the same workload on a T4 versus an A100 can actually be cheaper on the A100 if all optimizations are fully utilized. Without naming names, I know someone among your target users who was using many A100s on a cluster I have access to; utilization was 20%, and only for a fraction of the allocation time. That is obviously a total waste, although since those A100s sit on a shared cluster, the cost there is mostly longer waiting times for others (and wasted electricity). It takes effort and time to make code efficient, so I would put the onus on the user to prove the cost efficiency before resources are wasted. In scientific computing we are pretty bad at efficient utilization of resources, in my experience, so just my 2c on this topic for now. Save your money by making T4s available; if someone can show good use of an A100, then make it available for them; it's a total game changer.
fix #387 (potentially, pending more testing)
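For the "pending more testing" part, one quick sanity check is to confirm which cuda-nvcc a built image actually resolved to. A sketch, assuming nvcc ends up on PATH inside the container (that depends on how cuda-nvcc is packaged):

```python
# Print the nvcc version visible inside the image to confirm what the pin
# resolved to; compare against the driver/runtime the target GPU supports.
import subprocess

result = subprocess.run(["nvcc", "--version"], capture_output=True, text=True)
print(result.stdout)
```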