Does DJL support cuda 12.3.1? #2902

Closed

SidneyLann opened this issue Dec 23, 2023 · 30 comments

Labels: enhancement (New feature or request)

@SidneyLann
Contributor

SidneyLann commented Dec 23, 2023

I installed CUDA with cuda_12.3.1_545.23.08_linux.run on CentOS 9, but PtEngine.getInstance().getDevices(1) returns cpu, not gpu. Does DJL support CUDA 12.3.1? Why can't it get the GPU? Thanks.

SidneyLann added the enhancement label on Dec 23, 2023
@frankfliu
Contributor

frankfliu commented Dec 24, 2023

Which DJL version are you using?

Please try DJL 0.26.0-SNAPSHOT; it should work with PyTorch 2.1.1 and CUDA 12.
See: https://docs.djl.ai/master/engines/pytorch/pytorch-engine/index.html#supported-pytorch-versions

@frankfliu
Contributor

PyTorch 2.0.1 or PyTorch 1.13.1 should also work with CUDA 12.3, but you need to explicitly set PYTORCH_FLAVOR=cu118.
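
A minimal sketch of setting it from Java (this assumes the flavor is also honored as a JVM system property, not only as an environment variable — verify for your DJL version); it must happen before the engine loads its native library:

public final class FlavorDemo {
    public static void main(String[] args) {
        // Must be set before Engine.getEngine("PyTorch") triggers native loading.
        System.setProperty("PYTORCH_FLAVOR", "cu118");
        ai.djl.engine.Engine engine = ai.djl.engine.Engine.getEngine("PyTorch");
        System.out.println(engine.getDevices(1)[0].getDeviceType());
    }
}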

@SidneyLann
Contributor Author

SidneyLann commented Dec 24, 2023

[screenshot]

There is no pytorch-native-cu121 in https://oss.sonatype.org/content/repositories/snapshots/ for 0.26.0-SNAPSHOT.

[screenshot]

Does 0.26 have cu11?

@SidneyLann
Contributor Author

SidneyLann commented Dec 24, 2023

I can train YOLOv8 with CUDA 12.3.1 + PyTorch 2.1.2 in Python, but the code below still gets cpu, not gpu, with DJL 0.26:

Engine engine = Engine.getEngine("PyTorch");
Device[] devices = engine.getDevices(1);
LOG.debug("devices[0]={}", devices[0].getDeviceType());

@SidneyLann
Contributor Author

<properties>
	<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
	<maven.compiler.encoding>UTF-8</maven.compiler.encoding>
	<maven.compiler.source>21</maven.compiler.source>
	<maven.compiler.target>21</maven.compiler.target>
	<djl.version>0.26.0-SNAPSHOT</djl.version>
</properties>

<repositories>
	<repository>
		<id>djl.ai</id>
		<url>https://oss.sonatype.org/content/repositories/snapshots/</url>
	</repository>
</repositories>

<dependencyManagement>
	<dependencies>
		<dependency>
			<groupId>ai.djl</groupId>
			<artifactId>bom</artifactId>
			<version>${djl.version}</version>
			<type>pom</type>
			<scope>import</scope>
		</dependency>
	</dependencies>
</dependencyManagement>

<dependencies>
	<dependency>
		<groupId>commons-cli</groupId>
		<artifactId>commons-cli</artifactId>
		<version>1.6.0</version>
	</dependency>
	<dependency>
		<groupId>ai.djl</groupId>
		<artifactId>api</artifactId>
	</dependency>
	<dependency>
		<groupId>ai.djl</groupId>
		<artifactId>model-zoo</artifactId>
	</dependency>
	<dependency>
		<groupId>ai.djl.pytorch</groupId>
		<artifactId>pytorch-model-zoo</artifactId>
	</dependency>
	<dependency>
		<groupId>ai.djl.pytorch</groupId>
		<artifactId>pytorch-engine</artifactId>
		<scope>runtime</scope>
	</dependency>
</dependencies>

Is this enough to use CUDA?

@frankfliu
Contributor

frankfliu commented Dec 24, 2023

Your pom.xml looks good. It should automatically download the PyTorch 2.1.1 native library for CUDA 12.1.
See: https://github.com/deepjavalibrary/djl/blob/master/examples/pom.xml

You should be able to add the following to use the offline native package:

<dependency>
    <groupId>ai.djl.pytorch</groupId>
    <artifactId>pytorch-native-cu121</artifactId>
    <classifier>linux-x86_64</classifier>
    <scope>runtime</scope>
</dependency>
Then run:

cd examples
mvn clean package -DskipTests
mvn exec:java -Dai.djl.default_engine=PyTorch

You should see the native package being resolved:

Downloading from djl.ai: https://oss.sonatype.org/content/repositories/snapshots/ai/djl/pytorch/pytorch-native-cu121/2.1.1/pytorch-native-cu121-2.1.1.pom
Downloading from central: https://repo.maven.apache.org/maven2/ai/djl/pytorch/pytorch-native-cu121/2.1.1/pytorch-native-cu121-2.1.1.pom
Downloaded from central: https://repo.maven.apache.org/maven2/ai/djl/pytorch/pytorch-native-cu121/2.1.1/pytorch-native-cu121-2.1.1.pom (1.3 kB at 8.0 kB/s)
Downloading from djl.ai: https://oss.sonatype.org/content/repositories/snapshots/ai/djl/pytorch/pytorch-native-cu121/2.1.1/pytorch-native-cu121-2.1.1-linux-x86_64.jar
Downloading from central: https://repo.maven.apache.org/maven2/ai/djl/pytorch/pytorch-native-cu121/2.1.1/pytorch-native-cu121-2.1.1-linux-x86_64.jar

@SidneyLann
Contributor Author

oss.sonatype.org has no pytorch-native-cu121-2.1.1-linux-x86_64.jar, but repo.maven.apache.org does have it. However, the code below still can't get the GPU:

String modelFile = AppConfig.HOME_ART + "/model/yolo8x.torchscript";
YoloV8TranslatorFactory yoloV8TranslatorFactory = new YoloV8TranslatorFactory();

Engine engine = Engine.getEngine("PyTorch");
Device[] devices = engine.getDevices(1);
LOG.debug("devices[0]={}", devices[0].getDeviceType());

// Translator arguments for YOLOv8 pre/post-processing
Map<String, Object> arguments = new HashMap<>();
arguments.put("width", 640);
arguments.put("height", 640);
arguments.put("resize", "true");
arguments.put("toTensor", true);
arguments.put("applyRatio", true);
arguments.put("threshold", threshold);

Translator<Image, DetectedObjects> translator =
        yoloV8TranslatorFactory.newInstance(Image.class, DetectedObjects.class, null, arguments);

Criteria<Image, DetectedObjects> criteria = Criteria.builder()
        .setTypes(Image.class, DetectedObjects.class)
        .optModelPath(Paths.get(modelFile))
        .optEngine("PyTorch")
        .optTranslator(translator)
        .optProgress(new ProgressBar())
        .build();

ZooModel<Image, DetectedObjects> model = criteria.loadModel();
Predictor<Image, DetectedObjects> predictor = model.newPredictor();

[screenshot]

@frankfliu
Contributor

I notice you have the following in the log:

Ignore mismatching platform from: jar:file...

This means the detected os/arch/cuda version doesn't match cu121-linux-x86_64.

Can you run the following commands in your environment and see which OS DJL detects:

git clone https://github.com/deepjavalibrary/djl.git
cd djl
./gradlew debugEnv -Dai.djl.default_engine=PyTorch

@SidneyLann
Contributor Author

SidneyLann commented Dec 26, 2023

[screenshot]

Can't build on CentOS 9 + JDK 21.

@SidneyLann
Contributor Author

ai.djl.util.Platform : The bundled library: cu121-linux-x86_64:2.1.1-20231129 doesn't match system: cpu-linux-x86_64:2.1.1
ai.djl.util.Platform : Ignore mismatching platform from: jar:file:/home/sidney/prg/tomcat/webapps/pcng-of-srv-resource/WEB-INF/lib/pytorch-native-cu121-2.1.1-linux-x86_64.jar!/native/lib/pytorch.properties
ai.djl.pytorch.engine.PtEngine : PyTorch graph executor optimizer is enabled, this may impact your inference latency and throughput. See: https://docs.djl.ai/docs/development/inference_performance_optimization.html#graph-executor-optimization

What do these messages mean?

@frankfliu
Contributor

I created a PR to address the JDK 21 issue: #2903. In the meantime, please use JDK 17.

ai.djl.util.Platform : The bundled library: cu121-linux-x86_64:2.1.1-20231129 doesn't match system: cpu-linux-x86_64:2.1.1

The above message means DJL failed to detect the CUDA version. You should see some logs related to CudaUtils if you enable debug logging:

[DEBUG] - cudart library not found.

The following is just a warning that tells you how to tune performance if you need to; the graph executor optimizer may have a negative performance impact for some models.

PyTorch graph executor optimizer is enabled
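
If it does hurt your model, a one-line sketch to disable it before the engine initializes (this assumes the system property name described in the linked doc; verify it against your DJL version):

// Assumed property name, from the inference performance optimization doc.
System.setProperty("ai.djl.pytorch.graph_optimizer", "false");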

@SidneyLann
Contributor Author

[screenshot]

With JDK 17 it has been blocked here for 2 hours now. I have run it several times and it always blocks here, with no CPU time being used. Is that normal?

@SidneyLann
Contributor Author

Can you run the following command in your environment, and see which OS DJL detected:

Is there a simpler way to do this?

@frankfliu
Contributor

Can you check your debug log?

You should see logs related to:

[DEBUG] ai.djl.util.CudaUtils - ...
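
As a simpler check than building DJL, you can query the detection directly from Java. A minimal sketch, assuming ai.djl.util.cuda.CudaUtils exposes getGpuCount() and getCudaVersionString() (verify against your DJL version):

import ai.djl.util.cuda.CudaUtils;

public final class CudaCheck {
    public static void main(String[] args) {
        // If libcudart cannot be loaded, the count is 0 and DJL falls back to CPU.
        int gpuCount = CudaUtils.getGpuCount();
        System.out.println("GPU count: " + gpuCount);
        if (gpuCount > 0) {
            System.out.println("CUDA version: " + CudaUtils.getCudaVersionString());
        }
    }
}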

@SidneyLann
Contributor Author

ai.djl.util.cuda.CudaUtils : cudart library not found.
ai.djl.util.Platform : The bundled library: cu121-linux-x86_64:2.1.1-20231129 doesn't match system: cpu-linux-x86_64:2.1.1
ai.djl.util.Platform : Ignore mismatching platform from: jar:file:/home/sidney/prg/tomcat/webapps/pcng-of-srv-resource/WEB-INF/lib/pytorch-native-cu121-2.1.1-linux-x86_64.jar!/native/lib/pytorch.properties
ai.djl.pytorch.jni.LibUtils : Using cache dir: /home/sidney/.djl.ai/pytorch/2.1.1-cpu-linux-x86_64
ai.djl.pytorch.jni.LibUtils : Loading native library: /home/sidney/.djl.ai/pytorch/2.1.1-cpu-linux-x86_64/libc10.so
ai.djl.pytorch.jni.LibUtils : Loading native library: /home/sidney/.djl.ai/pytorch/2.1.1-cpu-linux-x86_64/libgomp-52f2fd74.so.1
ai.djl.pytorch.jni.LibUtils : Loading native library: /home/sidney/.djl.ai/pytorch/2.1.1-cpu-linux-x86_64/libtorch_cpu.so
ai.djl.pytorch.jni.LibUtils : Loading native library: /home/sidney/.djl.ai/pytorch/2.1.1-cpu-linux-x86_64/libtorch.so
ai.djl.pytorch.jni.LibUtils : Loading native library: /home/sidney/.djl.ai/pytorch/2.1.1-cpu-linux-x86_64/0.26.0-libdjl_torch.so

I can see these logs when running, but you asked me to build the project to see which OS DJL detected? There are no logs related to CudaUtils at compile time, right? There are only the logs below:

Task :engines:mxnet:jnarator:classes UP-TO-DATE
Task :engines:mxnet:jnarator:jar UP-TO-DATE
Task :engines:mxnet:mxnet-engine:jnarator UP-TO-DATE
Task :engines:mxnet:mxnet-engine:compileJava UP-TO-DATE
Task :engines:mxnet:mxnet-engine:processResources UP-TO-DATE
Task :engines:mxnet:mxnet-engine:classes UP-TO-DATE
Task :engines:mxnet:mxnet-engine:jar UP-TO-DATE
Task :engines:mxnet:mxnet-model-zoo:compileJava UP-TO-DATE
Task :engines:mxnet:mxnet-model-zoo:processResources UP-TO-DATE
Task :engines:mxnet:mxnet-model-zoo:classes UP-TO-DATE
Task :engines:mxnet:mxnet-model-zoo:jar UP-TO-DATE
Task :engines:onnxruntime:onnxruntime-engine:compileJava

@frankfliu
Contributor

ai.djl.util.cuda.CudaUtils : cudart library not found.

This means libcudart.so is not in LD_LIBRARY_PATH.

@frankfliu
Contributor

You can use the following command to check if libcudart.so can be found:

ldconfig -p | grep libcudart.so
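
If ldconfig finds it but DJL still reports cudart not found, a tiny diagnostic (a hypothetical helper class, not part of DJL) can show what the JVM process itself sees:

public final class LibPathCheck {
    public static void main(String[] args) {
        // The directory containing libcudart.so must be visible to this process.
        System.out.println("LD_LIBRARY_PATH = " + System.getenv("LD_LIBRARY_PATH"));
        System.out.println("java.library.path = " + System.getProperty("java.library.path"));
    }
}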

@SidneyLann
Contributor Author

[screenshot]
[screenshot]

I don't know why cuda_12.3.1_545.23.08_linux.run did not install the libs, or why Python can still use CUDA. Maybe pip installed the CUDA libraries.

@SidneyLann
Contributor Author

[screenshot]
[screenshot]

I amended .bash_profile and rebooted the OS; still:
CudaUtils : cudart library not found.

@SidneyLann
Contributor Author

SidneyLann commented Dec 27, 2023

[screenshot]

Why does ldconfig show libcudart.so in this folder when the folder does not contain it?!

[screenshot]

And libcudart.so still can't be found.

@frankfliu
Contributor

Can you try a CUDA docker image?

@SidneyLann
Contributor Author

ai.djl.translate.TranslateException: ai.djl.engine.EngineException: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript, serialized code (most recent call last):
File "code/torch/ultralytics/nn/tasks.py", line 74, in forward
_35 = (_18).forward(act, _34, )
_36 = (_20).forward((_19).forward(act, _35, ), _29, )
_37 = (_22).forward(_33, _35, (_21).forward(act, _36, ), )
~~~~~~~~~~~~ <--- HERE
return _37
File "code/torch/ultralytics/nn/modules/head.py", line 46, in forward
anchor_points = torch.unsqueeze(CONSTANTS.c0, 0)
lt, rb, = torch.chunk(_14, 2, 1)
x1y1 = torch.sub(anchor_points, lt)
~~~~~~~~~ <--- HERE
x2y2 = torch.add(anchor_points, rb)
c_xy = torch.div(torch.add(x1y1, x2y2), CONSTANTS.c1)

Traceback of TorchScript, original code (most recent call last):
/usr/prg/python/3102/lib/python3.10/site-packages/ultralytics/utils/tal.py(267): dist2bbox
/usr/prg/python/3102/lib/python3.10/site-packages/ultralytics/nn/modules/head.py(59): forward
/usr/prg/python/3102/lib/python3.10/site-packages/torch/nn/modules/module.py(1508): _slow_forward
/usr/prg/python/3102/lib/python3.10/site-packages/torch/nn/modules/module.py(1527): _call_impl
/usr/prg/python/3102/lib/python3.10/site-packages/torch/nn/modules/module.py(1518): _wrapped_call_impl
/usr/prg/python/3102/lib/python3.10/site-packages/ultralytics/nn/tasks.py(81): _predict_once
/usr/prg/python/3102/lib/python3.10/site-packages/ultralytics/nn/tasks.py(60): predict
/usr/prg/python/3102/lib/python3.10/site-packages/ultralytics/nn/tasks.py(42): forward
/usr/prg/python/3102/lib/python3.10/site-packages/torch/nn/modules/module.py(1508): _slow_forward
/usr/prg/python/3102/lib/python3.10/site-packages/torch/nn/modules/module.py(1527): _call_impl
/usr/prg/python/3102/lib/python3.10/site-packages/torch/nn/modules/module.py(1518): _wrapped_call_impl
/usr/prg/python/3102/lib/python3.10/site-packages/torch/jit/_trace.py(1065): trace_module
/usr/prg/python/3102/lib/python3.10/site-packages/torch/jit/_trace.py(798): trace
/usr/prg/python/3102/lib/python3.10/site-packages/ultralytics/engine/exporter.py(302): export_torchscript
/usr/prg/python/3102/lib/python3.10/site-packages/ultralytics/engine/exporter.py(117): outer_func
/usr/prg/python/3102/lib/python3.10/site-packages/ultralytics/engine/exporter.py(252): __call__
/usr/prg/python/3102/lib/python3.10/site-packages/torch/utils/_contextlib.py(115): decorate_context
/usr/prg/python/3102/lib/python3.10/site-packages/ultralytics/engine/model.py(328): export
/usr/prg/python/3102/lib/python3.10/site-packages/ultralytics/cfg/__init__.py(448): entrypoint
/home/sidney/.local/bin/yolo(8):
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!

at ai.djl.inference.Predictor.batchPredict(Predictor.java:192) ~[api-0.26.0-SNAPSHOT.jar:na]
at ai.djl.inference.Predictor.predict(Predictor.java:129) ~[api-0.26.0-SNAPSHOT.jar:na]
at com.pcng.resource.service.ArtDetectionService.detect(ArtDetectionService.java:111) ~[classes/:na]
at com.pcng.resource.service.PianoDetectionService.detect(PianoDetectionService.java:31) ~[classes/:na]
at com.pcng.resource.controller.ArtDetectionController.detect(ArtDetectionController.java:45) ~[classes/:na]
at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103) ~[na:na]
at java.base/java.lang.reflect.Method.invoke(Method.java:580) ~[na:na]
at org.springframework.web.method.support.InvocableHandlerMethod.doInvoke(InvocableHandlerMethod.java:205) ~[spring-web-6.0.12.jar:6.0.12]
at org.springframework.web.method.support.InvocableHandlerMethod.invokeForRequest(InvocableHandlerMethod.java:150) ~[spring-web-6.0.12.jar:6.0.12]
at org.springframework.web.servlet.mvc.method.annotation.ServletInvocableHandlerMethod.invokeAndHandle(ServletInvocableHandlerMethod.java:118) ~[spring-webmvc-6.0.12.jar:6.0.12]
at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.invokeHandlerMethod(RequestMappingHandlerAdapter.java:884) ~[spring-webmvc-6.0.12.jar:6.0.12]
at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.handleInternal(RequestMappingHandlerAdapter.java:797) ~[spring-webmvc-6.0.12.jar:6.0.12]
at org.springframework.web.servlet.mvc.method.AbstractHandlerMethodAdapter.handle(AbstractHandlerMethodAdapter.java:87) ~[spring-webmvc-6.0.12.jar:6.0.12]
at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:1081) ~[spring-webmvc-6.0.12.jar:6.0.12]
at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:974) ~[spring-webmvc-6.0.12.jar:6.0.12]
at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:1011) ~[spring-webmvc-6.0.12.jar:6.0.12]
at org.springframework.web.servlet.FrameworkServlet.doPost(FrameworkServlet.java:914) ~[spring-webmvc-6.0.12.jar:6.0.12]
at jakarta.servlet.http.HttpServlet.service(HttpServlet.java:590) ~[servlet-api.jar:6.0]
at org.springframework.web.servlet.FrameworkServlet.service(FrameworkServlet.java:885) ~[spring-webmvc-6.0.12.jar:6.0.12]
at jakarta.servlet.http.HttpServlet.service(HttpServlet.java:658) ~[servlet-api.jar:6.0]
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:205) ~[catalina.jar:10.1.13]
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:149) ~[catalina.jar:10.1.13]
at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:51) ~[tomcat-websocket.jar:10.1.13]
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:174) ~[catalina.jar:10.1.13]
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:149) ~[catalina.jar:10.1.13]
at org.springframework.web.filter.RequestContextFilter.doFilterInternal(RequestContextFilter.java:100) ~[spring-web-6.0.12.jar:6.0.12]
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:116) ~[spring-web-6.0.12.jar:6.0.12]
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:174) ~[catalina.jar:10.1.13]
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:149) ~[catalina.jar:10.1.13]
at org.springframework.web.filter.FormContentFilter.doFilterInternal(FormContentFilter.java:93) ~[spring-web-6.0.12.jar:6.0.12]
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:116) ~[spring-web-6.0.12.jar:6.0.12]
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:174) ~[catalina.jar:10.1.13]
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:149) ~[catalina.jar:10.1.13]
at org.springframework.web.filter.ServerHttpObservationFilter.doFilterInternal(ServerHttpObservationFilter.java:109) ~[spring-web-6.0.12.jar:6.0.12]
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:116) ~[spring-web-6.0.12.jar:6.0.12]
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:174) ~[catalina.jar:10.1.13]
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:149) ~[catalina.jar:10.1.13]
at org.springframework.boot.web.servlet.support.ErrorPageFilter.doFilter(ErrorPageFilter.java:124) ~[spring-boot-3.1.4.jar:3.1.4]
at org.springframework.boot.web.servlet.support.ErrorPageFilter$1.doFilterInternal(ErrorPageFilter.java:99) ~[spring-boot-3.1.4.jar:3.1.4]
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:116) ~[spring-web-6.0.12.jar:6.0.12]
at org.springframework.boot.web.servlet.support.ErrorPageFilter.doFilter(ErrorPageFilter.java:117) ~[spring-boot-3.1.4.jar:3.1.4]
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:174) ~[catalina.jar:10.1.13]
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:149) ~[catalina.jar:10.1.13]
at org.springframework.web.filter.CharacterEncodingFilter.doFilterInternal(CharacterEncodingFilter.java:201) ~[spring-web-6.0.12.jar:6.0.12]
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:116) ~[spring-web-6.0.12.jar:6.0.12]
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:174) ~[catalina.jar:10.1.13]
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:149) ~[catalina.jar:10.1.13]
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:167) ~[catalina.jar:10.1.13]
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:90) ~[catalina.jar:10.1.13]
at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:482) ~[catalina.jar:10.1.13]
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:115) ~[catalina.jar:10.1.13]
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:93) ~[catalina.jar:10.1.13]
at org.apache.catalina.valves.AbstractAccessLogValve.invoke(AbstractAccessLogValve.java:673) ~[catalina.jar:10.1.13]
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:74) ~[catalina.jar:10.1.13]
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:341) ~[catalina.jar:10.1.13]
at org.apache.coyote.http11.Http11Processor.service(Http11Processor.java:391) ~[tomcat-coyote.jar:10.1.13]
at org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:63) ~[tomcat-coyote.jar:10.1.13]
at org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:894) ~[tomcat-coyote.jar:10.1.13]
at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1740) ~[tomcat-coyote.jar:10.1.13]
at org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:52) ~[tomcat-coyote.jar:10.1.13]
at org.apache.tomcat.util.threads.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1191) ~[tomcat-util.jar:10.1.13]
at org.apache.tomcat.util.threads.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:659) ~[tomcat-util.jar:10.1.13]
at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61) ~[tomcat-util.jar:10.1.13]
at java.base/java.lang.Thread.run(Thread.java:1583) ~[na:na]

Caused by: ai.djl.engine.EngineException: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript, serialized code (most recent call last):
File "code/torch/ultralytics/nn/tasks.py", line 74, in forward
_35 = (_18).forward(act, _34, )
_36 = (_20).forward((_19).forward(act, _35, ), _29, )
_37 = (_22).forward(_33, _35, (_21).forward(act, _36, ), )
~~~~~~~~~~~~ <--- HERE
return _37
File "code/torch/ultralytics/nn/modules/head.py", line 46, in forward
anchor_points = torch.unsqueeze(CONSTANTS.c0, 0)
lt, rb, = torch.chunk(_14, 2, 1)
x1y1 = torch.sub(anchor_points, lt)
~~~~~~~~~ <--- HERE
x2y2 = torch.add(anchor_points, rb)
c_xy = torch.div(torch.add(x1y1, x2y2), CONSTANTS.c1)

Traceback of TorchScript, original code (most recent call last):
/usr/prg/python/3102/lib/python3.10/site-packages/ultralytics/utils/tal.py(267): dist2bbox
/usr/prg/python/3102/lib/python3.10/site-packages/ultralytics/nn/modules/head.py(59): forward
/usr/prg/python/3102/lib/python3.10/site-packages/torch/nn/modules/module.py(1508): _slow_forward
/usr/prg/python/3102/lib/python3.10/site-packages/torch/nn/modules/module.py(1527): _call_impl
/usr/prg/python/3102/lib/python3.10/site-packages/torch/nn/modules/module.py(1518): _wrapped_call_impl
/usr/prg/python/3102/lib/python3.10/site-packages/ultralytics/nn/tasks.py(81): _predict_once
/usr/prg/python/3102/lib/python3.10/site-packages/ultralytics/nn/tasks.py(60): predict
/usr/prg/python/3102/lib/python3.10/site-packages/ultralytics/nn/tasks.py(42): forward
/usr/prg/python/3102/lib/python3.10/site-packages/torch/nn/modules/module.py(1508): _slow_forward
/usr/prg/python/3102/lib/python3.10/site-packages/torch/nn/modules/module.py(1527): _call_impl
/usr/prg/python/3102/lib/python3.10/site-packages/torch/nn/modules/module.py(1518): _wrapped_call_impl
/usr/prg/python/3102/lib/python3.10/site-packages/torch/jit/_trace.py(1065): trace_module
/usr/prg/python/3102/lib/python3.10/site-packages/torch/jit/_trace.py(798): trace
/usr/prg/python/3102/lib/python3.10/site-packages/ultralytics/engine/exporter.py(302): export_torchscript
/usr/prg/python/3102/lib/python3.10/site-packages/ultralytics/engine/exporter.py(117): outer_func
/usr/prg/python/3102/lib/python3.10/site-packages/ultralytics/engine/exporter.py(252): __call__
/usr/prg/python/3102/lib/python3.10/site-packages/torch/utils/_contextlib.py(115): decorate_context
/usr/prg/python/3102/lib/python3.10/site-packages/ultralytics/engine/model.py(328): export
/usr/prg/python/3102/lib/python3.10/site-packages/ultralytics/cfg/__init__.py(448): entrypoint
/home/sidney/.local/bin/yolo(8):
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!

at ai.djl.pytorch.jni.PyTorchLibrary.moduleRunMethod(Native Method) ~[pytorch-engine-0.26.0-SNAPSHOT.jar:na]
at ai.djl.pytorch.jni.IValueUtils.forward(IValueUtils.java:57) ~[pytorch-engine-0.26.0-SNAPSHOT.jar:na]
at ai.djl.pytorch.engine.PtSymbolBlock.forwardInternal(PtSymbolBlock.java:145) ~[pytorch-engine-0.26.0-SNAPSHOT.jar:na]
at ai.djl.nn.AbstractBaseBlock.forward(AbstractBaseBlock.java:79) ~[api-0.26.0-SNAPSHOT.jar:na]
at ai.djl.nn.Block.forward(Block.java:127) ~[api-0.26.0-SNAPSHOT.jar:na]
at ai.djl.inference.Predictor.predictInternal(Predictor.java:143) ~[api-0.26.0-SNAPSHOT.jar:na]
at ai.djl.inference.Predictor.batchPredict(Predictor.java:183) ~[api-0.26.0-SNAPSHOT.jar:na]
... 63 common frames omitted

After reinstalling CUDA, DJL 0.26 can use the GPU now. However, the above exception is thrown. What's the problem?

@SidneyLann
Contributor Author

SidneyLann commented Dec 27, 2023

[screenshot]

Why does cu121 need the cpu .so libraries?

@frankfliu
Contributor

frankfliu commented Dec 27, 2023

@SidneyLann
Contributor Author

After adding .optOption("mapLocation", "true"), the device is gpu now. TorchScript GPU inference on 26 pics took 320 seconds, but ONNX Runtime on CPU took only 85 seconds. It seems the GPU is not really being used for TorchScript.
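
For reference, a minimal sketch of where that option fits into the Criteria from the earlier snippet (modelFile and translator as defined there; pinning the device with optDevice is an added assumption, not something confirmed above):

Criteria<Image, DetectedObjects> criteria = Criteria.builder()
    .setTypes(Image.class, DetectedObjects.class)
    .optModelPath(Paths.get(modelFile))
    .optEngine("PyTorch")
    // Remap the TorchScript weights onto the device the model is loaded on,
    // instead of the device it was exported from (the cuda:0 vs cpu mismatch above).
    .optOption("mapLocation", "true")
    .optDevice(Device.gpu()) // assumption: explicitly request the GPU
    .optTranslator(translator)
    .optProgress(new ProgressBar())
    .build();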

@frankfliu
Contributor

You can try ONNX Runtime on DJL if you want to (OnnxRuntime doesn't support CUDA 12 yet).

Did you compare the performance on PyTorch (GPU vs CPU)? The performance difference might be related to image pre-processing.

@SidneyLann
Contributor Author

SidneyLann commented Dec 28, 2023

PyTorch GPU took 320 seconds and PyTorch CPU took 86 seconds. Is this because of issue #2899?

@SidneyLann
Contributor Author

When will a build with #2899 merged be available?

@frankfliu
Contributor

You can try our nightly snapshot release: https://docs.djl.ai/docs/get.html#nightly-snapshots

The 0.26.0 release will be available around the end of January.

@SidneyLann
Contributor Author

https://oss.sonatype.org/content/repositories/snapshots/

This repo does not have the update for #2899 yet.
