Does DJL support CUDA 12.3.1? #2902

I installed CUDA with cuda_12.3.1_545.23.08_linux.run on CentOS 9, and PtEngine.getInstance().getDevices(1) returns cpu rather than gpu. Does DJL support CUDA 12.3.1? Why can't I get the GPU? Thanks.
Which DJL version are you using? Please try DJL 0.26.0-SNAPSHOT; it should work with PyTorch 2.1.1 and CUDA 12.

PyTorch 2.0.1 or PyTorch 1.13.1 should also work with CUDA 12.3, but you need to explicitly set …

Does 0.26 have cu11?
I can train YOLOv8 with CUDA 12.3.1 + PyTorch 2.1.2 in Python, but the code below still gets cpu rather than gpu with DJL 0.26:

```java
Engine engine = Engine.getEngine("PyTorch");
```

Is this enough to use CUDA?
You should be able to add the following to use the offline native package:
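The snippet itself was not preserved in this thread. A sketch of what the offline native dependencies might look like in Gradle, with artifact coordinates and versions inferred from the jar name mentioned below (treat them as assumptions):

```kotlin
dependencies {
    implementation("ai.djl.pytorch:pytorch-engine:0.26.0-SNAPSHOT")
    // Offline CUDA 12.1 native library for Linux; version inferred from this thread
    runtimeOnly("ai.djl.pytorch:pytorch-native-cu121:2.1.1:linux-x86_64")
    // JNI bindings matching the native library and the DJL version (assumed)
    runtimeOnly("ai.djl.pytorch:pytorch-jni:2.1.1-0.26.0-SNAPSHOT")
}
```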
oss.sonatype.org has no pytorch-native-cu121-2.1.1-linux-x86_64.jar, but repo.maven.apache.org does have it. Still, the code can't get the GPU.
I notice from your log that the detected OS/arch/CUDA version doesn't match the bundled library. Can you run the following in your environment to see which platform DJL detected:
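The exact command was not preserved here; a minimal Java sketch that prints roughly the same information, assuming DJL's public Engine and CudaUtils APIs:

```java
import ai.djl.Device;
import ai.djl.engine.Engine;
import ai.djl.util.cuda.CudaUtils;

public class PlatformCheck {
    public static void main(String[] args) {
        // How many GPUs DJL's CUDA runtime probe can see
        System.out.println("GPU count: " + CudaUtils.getGpuCount());

        // Which device the PyTorch engine will use by default
        Engine engine = Engine.getEngine("PyTorch");
        System.out.println("Default device: " + engine.defaultDevice());
        for (Device device : engine.getDevices()) {
            System.out.println("Available device: " + device);
        }
    }
}
```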
```
ai.djl.util.Platform : The bundled library: cu121-linux-x86_64:2.1.1-20231129 doesn't match system: cpu-linux-x86_64:2.1.1
```

What does this mean?
I created a PR to address the JDK 21 issue: #2903. In the meantime, please use JDK 17.
The above message means DJL failed to detect the CUDA version. You should see logs related to CudaUtils if you enable debug logging.
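How you enable debug logging depends on your logging backend; with slf4j-simple, for example, a system property is enough (the jar name below is a placeholder):

```sh
java -Dorg.slf4j.simpleLogger.defaultLogLevel=debug -jar your-app.jar
```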
The following is just a warning; it tells you that you can tune performance if you need to:

```
graph executor optimizer may have negative performance impact for some models.
```
Is there a simple way to do this?
Can you check your debug log? You should see logs related to CudaUtils:
```
ai.djl.util.cuda.CudaUtils : cudart library not found.
```

I can see these logs when running, but you asked me to compile to see which OS DJL detected? There are no logs related to CudaUtils at compile time, right? There are only the following logs:
`cudart library not found` means DJL could not find libcudart.so on your system.
You can use the following command to check whether libcudart.so can be found:
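The original command is not shown in this thread; a common way to check on Linux is:

```sh
# List the libraries known to the dynamic linker and look for the CUDA runtime
ldconfig -p | grep libcudart
```

If this prints nothing, add the CUDA library directory (for example /usr/local/cuda/lib64) to LD_LIBRARY_PATH or to the ldconfig configuration.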
Can you try a CUDA Docker image?
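For example (the image tag is an assumption; pick one matching your CUDA version):

```sh
# Requires the NVIDIA Container Toolkit; verifies that the GPU is visible inside the container
docker run --gpus all --rm nvidia/cuda:12.3.1-runtime-ubuntu22.04 nvidia-smi
```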
```
ai.djl.translate.TranslateException: ai.djl.engine.EngineException: The following operation failed in the TorchScript interpreter. Traceback of TorchScript, original code (most recent call last):
Caused by: ai.djl.engine.EngineException: The following operation failed in the TorchScript interpreter. Traceback of TorchScript, original code (most recent call last):
```

After reinstalling CUDA, DJL 0.26 can use the GPU now. However, the above exception is thrown. What's the problem?
Can you try setting "mapLocation" to true: https://docs.djl.ai/master/docs/demos/jupyter/load_pytorch_model.html#step-3-load-your-model
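A minimal sketch of where that option goes; the model path is a placeholder, and the NDList-to-NDList types are chosen only to avoid needing a custom translator:

```java
import java.nio.file.Paths;

import ai.djl.ndarray.NDList;
import ai.djl.repository.zoo.Criteria;
import ai.djl.repository.zoo.ZooModel;

public class LoadWithMapLocation {
    public static void main(String[] args) throws Exception {
        Criteria<NDList, NDList> criteria = Criteria.builder()
                .setTypes(NDList.class, NDList.class)
                .optModelPath(Paths.get("build/models/yolov8")) // hypothetical model location
                .optOption("mapLocation", "true") // remap TorchScript weights onto the current device
                .optEngine("PyTorch")
                .build();
        try (ZooModel<NDList, NDList> model = criteria.loadModel()) {
            System.out.println("Model loaded on: " + model.getNDManager().getDevice());
        }
    }
}
```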
After adding .optOption("mapLocation", "true"), the device is gpu now. TorchScript GPU inference on 26 pictures took 320 seconds, but OnnxRuntime on CPU took only 85 seconds. It seems the GPU is not really being used for TorchScript.
You can try OnnxRuntime on DJL if you want to (OnnxRuntime doesn't support CUDA 12 yet). Did you compare the performance on PyTorch (GPU vs CPU)? The performance difference might be related to image pre-processing.
PyTorch GPU took 320 seconds and PyTorch CPU took 86 seconds. Is this because of issue #2899?

When will a build with #2899 merged be available?
You can try our nightly snapshot release: https://docs.djl.ai/docs/get.html#nightly-snapshots

The 0.26.0 release will be available around the end of January.
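With Gradle, using the nightly snapshots amounts to adding the snapshot repository and a -SNAPSHOT version (a sketch; the repository URL is the one cited below):

```kotlin
repositories {
    mavenCentral()
    // Sonatype snapshot repository for DJL nightly builds
    maven("https://oss.sonatype.org/content/repositories/snapshots/")
}
```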
This repo (https://oss.sonatype.org/content/repositories/snapshots/) doesn't have the update for #2899 yet.