java.io.IOException: Cannot run program "/home/venv/bin/python" when running pytorch/torchserve image #2504

Open
@lz-chen

🐛 Describe the bug

I'm following this tutorial, but the container launched with docker run does not start properly: every attempt to spawn the Python process fails.

Error logs

WARNING: The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested
This command is not for general use and should only be run as the result of a call to
ProcessBuilder.start() or Runtime.exec() in a java application
WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
2023-07-28T12:40:14,354 [INFO ] main org.pytorch.serve.servingsdk.impl.PluginsManager - Initializing plugins manager...
2023-07-28T12:40:14,658 [INFO ] main org.pytorch.serve.metrics.configuration.MetricConfiguration - Successfully loaded metrics configuration from /home/venv/lib/python3.9/site-packages/ts/configs/metrics.yaml
2023-07-28T12:40:15,063 [INFO ] main org.pytorch.serve.ModelServer -
Torchserve version: 0.8.1
TS Home: /home/venv/lib/python3.9/site-packages
Current directory: /home/model-server
Temp directory: /home/model-server/tmp
Metrics config path: /home/venv/lib/python3.9/site-packages/ts/configs/metrics.yaml
Number of GPUs: 0
Number of CPUs: 4
Max heap size: 982 M
Python executable: /home/venv/bin/python
Config file: /home/model-server/config.properties
Inference address: http://0.0.0.0:8080
Management address: http://0.0.0.0:8081
Metrics address: http://0.0.0.0:8082
Model Store: /home/model-server/model-store
Initial Models: N/A
Log dir: /home/model-server/logs
Metrics dir: /home/model-server/logs
Netty threads: 32
Netty client threads: 0
Default workers per model: 4
Blacklist Regex: N/A
Maximum Response Size: 6553500
Maximum Request Size: 6553500
Limit Maximum Image Pixels: true
Prefer direct buffer: false
Allowed Urls: [file://.|http(s)?://.]
Custom python dependency for model allowed: false
Enable metrics API: true
Metrics mode: log
Disable system metrics: false
Workflow Store: /home/model-server/model-store
Model config: N/A
2023-07-28T12:40:15,097 [INFO ] main org.pytorch.serve.servingsdk.impl.PluginsManager - Loading snapshot serializer plugin...
2023-07-28T12:40:15,181 [INFO ] main org.pytorch.serve.ModelServer - Initialize Inference server with: EpollServerSocketChannel.
2023-07-28T12:40:15,471 [INFO ] main org.pytorch.serve.ModelServer - Inference API bind to: http://0.0.0.0:8080
2023-07-28T12:40:15,472 [INFO ] main org.pytorch.serve.ModelServer - Initialize Management server with: EpollServerSocketChannel.
2023-07-28T12:40:15,477 [INFO ] main org.pytorch.serve.ModelServer - Management API bind to: http://0.0.0.0:8081
2023-07-28T12:40:15,480 [INFO ] main org.pytorch.serve.ModelServer - Initialize Metrics server with: EpollServerSocketChannel.
2023-07-28T12:40:15,482 [INFO ] main org.pytorch.serve.ModelServer - Metrics API bind to: http://0.0.0.0:8082
Model server started.
This command is not for general use and should only be run as the result of a call to
ProcessBuilder.start() or Runtime.exec() in a java application
2023-07-28T12:40:16,282 [ERROR] pool-3-thread-1 org.pytorch.serve.metrics.MetricCollector -
java.io.IOException: Cannot run program "/home/venv/bin/python" (in directory "/home/venv/lib/python3.9/site-packages"): error=0, Failed to exec spawn helper: pid: 47, exit value: 1
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1143) ~[?:?]
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1073) ~[?:?]
at java.lang.Runtime.exec(Runtime.java:594) ~[?:?]
at org.pytorch.serve.metrics.MetricCollector.run(MetricCollector.java:44) [model-server.jar:?]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) [?:?]
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305) [?:?]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305) [?:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
at java.lang.Thread.run(Thread.java:833) [?:?]
Caused by: java.io.IOException: error=0, Failed to exec spawn helper: pid: 47, exit value: 1
at java.lang.ProcessImpl.forkAndExec(Native Method) ~[?:?]
at java.lang.ProcessImpl.<init>(ProcessImpl.java:314) ~[?:?]
at java.lang.ProcessImpl.start(ProcessImpl.java:244) ~[?:?]
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1110) ~[?:?]
... 9 more
2023-07-28T12:40:33,226 [DEBUG] epollEventLoopGroup-3-1 org.pytorch.serve.wlm.ModelVersionedRefs - Adding new version 1.0 for model mnist
2023-07-28T12:40:33,229 [DEBUG] epollEventLoopGroup-3-1 org.pytorch.serve.wlm.ModelVersionedRefs - Setting default version to 1.0 for model mnist
2023-07-28T12:40:33,230 [INFO ] epollEventLoopGroup-3-1 org.pytorch.serve.wlm.ModelManager - Model mnist loaded.
2023-07-28T12:40:33,239 [DEBUG] epollEventLoopGroup-3-1 org.pytorch.serve.wlm.ModelManager - updateModel: mnist, count: 4
2023-07-28T12:40:33,262 [DEBUG] W-9000-mnist_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - Worker cmdline: [/home/venv/bin/python, /home/venv/lib/python3.9/site-packages/ts/model_service_worker.py, --sock-type, unix, --sock-name, /home/model-server/tmp/.ts.sock.9000, --metrics-config, /home/venv/lib/python3.9/site-packages/ts/configs/metrics.yaml]
2023-07-28T12:40:33,272 [DEBUG] W-9001-mnist_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - Worker cmdline: [/home/venv/bin/python, /home/venv/lib/python3.9/site-packages/ts/model_service_worker.py, --sock-type, unix, --sock-name, /home/model-server/tmp/.ts.sock.9001, --metrics-config, /home/venv/lib/python3.9/site-packages/ts/configs/metrics.yaml]
This command is not for general use and should only be run as the result of a call to
ProcessBuilder.start() or Runtime.exec() in a java application
2023-07-28T12:40:33,285 [DEBUG] W-9003-mnist_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - Worker cmdline: [/home/venv/bin/python, /home/venv/lib/python3.9/site-packages/ts/model_service_worker.py, --sock-type, unix, --sock-name, /home/model-server/tmp/.ts.sock.9003, --metrics-config, /home/venv/lib/python3.9/site-packages/ts/configs/metrics.yaml]
This command is not for general use and should only be run as the result of a call to
ProcessBuilder.start() or Runtime.exec() in a java application
2023-07-28T12:40:33,300 [DEBUG] W-9002-mnist_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - Worker cmdline: [/home/venv/bin/python, /home/venv/lib/python3.9/site-packages/ts/model_service_worker.py, --sock-type, unix, --sock-name, /home/model-server/tmp/.ts.sock.9002, --metrics-config, /home/venv/lib/python3.9/site-packages/ts/configs/metrics.yaml]
2023-07-28T12:40:33,296 [ERROR] W-9001-mnist_1.0 org.pytorch.serve.wlm.WorkerThread - Backend worker error
org.pytorch.serve.wlm.WorkerInitializationException: Failed start worker process
at org.pytorch.serve.wlm.WorkerLifeCycle.startWorker(WorkerLifeCycle.java:179) ~[model-server.jar:?]
at org.pytorch.serve.wlm.WorkerThread.connect(WorkerThread.java:339) ~[model-server.jar:?]
at org.pytorch.serve.wlm.WorkerThread.run(WorkerThread.java:183) [model-server.jar:?]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) [?:?]
at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
at java.lang.Thread.run(Thread.java:833) [?:?]
Caused by: java.io.IOException: Cannot run program "/home/venv/bin/python" (in directory "/home/model-server/tmp/models/df29a96231534b05945fa892495ecf6c"): error=0, Failed to exec spawn helper: pid: 59, exit value: 1
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1143) ~[?:?]
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1073) ~[?:?]
at java.lang.Runtime.exec(Runtime.java:594) ~[?:?]
at org.pytorch.serve.wlm.WorkerLifeCycle.startWorker(WorkerLifeCycle.java:161) ~[model-server.jar:?]
... 7 more
Caused by: java.io.IOException: error=0, Failed to exec spawn helper: pid: 59, exit value: 1
at java.lang.ProcessImpl.forkAndExec(Native Method) ~[?:?]
at java.lang.ProcessImpl.<init>(ProcessImpl.java:314) ~[?:?]
at java.lang.ProcessImpl.start(ProcessImpl.java:244) ~[?:?]
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1110) ~[?:?]
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1073) ~[?:?]
at java.lang.Runtime.exec(Runtime.java:594) ~[?:?]
at org.pytorch.serve.wlm.WorkerLifeCycle.startWorker(WorkerLifeCycle.java:161) ~[model-server.jar:?]
... 7 more
This command is not for general use and should only be run as the result of a call to
ProcessBuilder.start() or Runtime.exec() in a java application
2023-07-28T12:40:33,314 [ERROR] W-9003-mnist_1.0 org.pytorch.serve.wlm.WorkerThread - Backend worker error
org.pytorch.serve.wlm.WorkerInitializationException: Failed start worker process
at org.pytorch.serve.wlm.WorkerLifeCycle.startWorker(WorkerLifeCycle.java:179) ~[model-server.jar:?]
at org.pytorch.serve.wlm.WorkerThread.connect(WorkerThread.java:339) ~[model-server.jar:?]
at org.pytorch.serve.wlm.WorkerThread.run(WorkerThread.java:183) [model-server.jar:?]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) [?:?]
at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
at java.lang.Thread.run(Thread.java:833) [?:?]
Caused by: java.io.IOException: Cannot run program "/home/venv/bin/python" (in directory "/home/model-server/tmp/models/df29a96231534b05945fa892495ecf6c"): error=0, Failed to exec spawn helper: pid: 62, exit value: 1
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1143) ~[?:?]
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1073) ~[?:?]
at java.lang.Runtime.exec(Runtime.java:594) ~[?:?]
at org.pytorch.serve.wlm.WorkerLifeCycle.startWorker(WorkerLifeCycle.java:161) ~[model-server.jar:?]
... 7 more
Caused by: java.io.IOException: error=0, Failed to exec spawn helper: pid: 62, exit value: 1
at java.lang.ProcessImpl.forkAndExec(Native Method) ~[?:?]
at java.lang.ProcessImpl.<init>(ProcessImpl.java:314) ~[?:?]
at java.lang.ProcessImpl.start(ProcessImpl.java:244) ~[?:?]
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1110) ~[?:?]
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1073) ~[?:?]
at java.lang.Runtime.exec(Runtime.java:594) ~[?:?]
at org.pytorch.serve.wlm.WorkerLifeCycle.startWorker(WorkerLifeCycle.java:161) ~[model-server.jar:?]
... 7 more
2023-07-28T12:40:33,318 [DEBUG] W-9003-mnist_1.0 org.pytorch.serve.wlm.WorkerThread - W-9003-mnist_1.0 State change null -> WORKER_STOPPED
2023-07-28T12:40:33,319 [INFO ] W-9003-mnist_1.0 org.pytorch.serve.wlm.WorkerThread - Auto recovery start timestamp: 1690548033319
2023-07-28T12:40:33,322 [DEBUG] W-9001-mnist_1.0 org.pytorch.serve.wlm.WorkerThread - W-9001-mnist_1.0 State change null -> WORKER_STOPPED
2023-07-28T12:40:33,286 [ERROR] W-9000-mnist_1.0 org.pytorch.serve.wlm.WorkerThread - Backend worker error
org.pytorch.serve.wlm.WorkerInitializationException: Failed start worker process
at org.pytorch.serve.wlm.WorkerLifeCycle.startWorker(WorkerLifeCycle.java:179) ~[model-server.jar:?]
at org.pytorch.serve.wlm.WorkerThread.connect(WorkerThread.java:339) ~[model-server.jar:?]
at org.pytorch.serve.wlm.WorkerThread.run(WorkerThread.java:183) [model-server.jar:?]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) [?:?]
at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
at java.lang.Thread.run(Thread.java:833) [?:?]
Caused by: java.io.IOException: Cannot run program "/home/venv/bin/python" (in directory "/home/model-server/tmp/models/df29a96231534b05945fa892495ecf6c"): error=0, Failed to exec spawn helper: pid: 54, exit value: 1
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1143) ~[?:?]
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1073) ~[?:?]
at java.lang.Runtime.exec(Runtime.java:594) ~[?:?]
at org.pytorch.serve.wlm.WorkerLifeCycle.startWorker(WorkerLifeCycle.java:161) ~[model-server.jar:?]
... 7 more
Caused by: java.io.IOException: error=0, Failed to exec spawn helper: pid: 54, exit value: 1
at java.lang.ProcessImpl.forkAndExec(Native Method) ~[?:?]
at java.lang.ProcessImpl.<init>(ProcessImpl.java:314) ~[?:?]
at java.lang.ProcessImpl.start(ProcessImpl.java:244) ~[?:?]
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1110) ~[?:?]
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1073) ~[?:?]
at java.lang.Runtime.exec(Runtime.java:594) ~[?:?]
at org.pytorch.serve.wlm.WorkerLifeCycle.startWorker(WorkerLifeCycle.java:161) ~[model-server.jar:?]
... 7 more
2023-07-28T12:40:33,324 [INFO ] W-9003-mnist_1.0 org.pytorch.serve.wlm.WorkerThread - Retry worker: 9003 in 1 seconds.
2023-07-28T12:40:33,325 [INFO ] W-9001-mnist_1.0 org.pytorch.serve.wlm.WorkerThread - Auto recovery start timestamp: 1690548033325
2023-07-28T12:40:33,326 [INFO ] W-9001-mnist_1.0 org.pytorch.serve.wlm.WorkerThread - Retry worker: 9001 in 1 seconds.
2023-07-28T12:40:33,328 [DEBUG] W-9000-mnist_1.0 org.pytorch.serve.wlm.WorkerThread - W-9000-mnist_1.0 State change null -> WORKER_STOPPED
2023-07-28T12:40:33,334 [INFO ] W-9000-mnist_1.0 org.pytorch.serve.wlm.WorkerThread - Auto recovery start timestamp: 1690548033334
2023-07-28T12:40:33,335 [INFO ] W-9000-mnist_1.0 org.pytorch.serve.wlm.WorkerThread - Retry worker: 9000 in 1 seconds.
This command is not for general use and should only be run as the result of a call to
ProcessBuilder.start() or Runtime.exec() in a java application
2023-07-28T12:40:33,338 [ERROR] W-9002-mnist_1.0 org.pytorch.serve.wlm.WorkerThread - Backend worker error
org.pytorch.serve.wlm.WorkerInitializationException: Failed start worker process
at org.pytorch.serve.wlm.WorkerLifeCycle.startWorker(WorkerLifeCycle.java:179) ~[model-server.jar:?]
at org.pytorch.serve.wlm.WorkerThread.connect(WorkerThread.java:339) ~[model-server.jar:?]
at org.pytorch.serve.wlm.WorkerThread.run(WorkerThread.java:183) [model-server.jar:?]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) [?:?]
at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
at java.lang.Thread.run(Thread.java:833) [?:?]
Caused by: java.io.IOException: Cannot run program "/home/venv/bin/python" (in directory "/home/model-server/tmp/models/df29a96231534b05945fa892495ecf6c"): error=0, Failed to exec spawn helper: pid: 65, exit value: 1
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1143) ~[?:?]
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1073) ~[?:?]
at java.lang.Runtime.exec(Runtime.java:594) ~[?:?]
at org.pytorch.serve.wlm.WorkerLifeCycle.startWorker(WorkerLifeCycle.java:161) ~[model-server.jar:?]
... 7 more
Caused by: java.io.IOException: error=0, Failed to exec spawn helper: pid: 65, exit value: 1
at java.lang.ProcessImpl.forkAndExec(Native Method) ~[?:?]
at java.lang.ProcessImpl.<init>(ProcessImpl.java:314) ~[?:?]
at java.lang.ProcessImpl.start(ProcessImpl.java:244) ~[?:?]
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1110) ~[?:?]
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1073) ~[?:?]
at java.lang.Runtime.exec(Runtime.java:594) ~[?:?]
at org.pytorch.serve.wlm.WorkerLifeCycle.startWorker(WorkerLifeCycle.java:161) ~[model-server.jar:?]
... 7 more
2023-07-28T12:40:33,344 [DEBUG] W-9002-mnist_1.0 org.pytorch.serve.wlm.WorkerThread - W-9002-mnist_1.0 State change null -> WORKER_STOPPED
2023-07-28T12:40:33,345 [INFO ] W-9002-mnist_1.0 org.pytorch.serve.wlm.WorkerThread - Auto recovery start timestamp: 1690548033344
2023-07-28T12:40:33,345 [INFO ] W-9002-mnist_1.0 org.pytorch.serve.wlm.WorkerThread - Retry worker: 9002 in 1 seconds.
2023-07-28T12:40:33,345 [DEBUG] epollEventLoopGroup-3-1 org.pytorch.serve.wlm.ModelVersionedRefs - Removed model: mnist version: 1.0
2023-07-28T12:40:33,347 [DEBUG] epollEventLoopGroup-3-1 org.pytorch.serve.wlm.WorkerThread - W-9003-mnist_1.0 State change WORKER_STOPPED -> WORKER_SCALED_DOWN
2023-07-28T12:40:33,348 [DEBUG] epollEventLoopGroup-3-1 org.pytorch.serve.wlm.WorkerThread - W-9002-mnist_1.0 State change WORKER_STOPPED -> WORKER_SCALED_DOWN
2023-07-28T12:40:33,349 [DEBUG] epollEventLoopGroup-3-1 org.pytorch.serve.wlm.WorkerThread - W-9001-mnist_1.0 State change WORKER_STOPPED -> WORKER_SCALED_DOWN
2023-07-28T12:40:33,349 [DEBUG] epollEventLoopGroup-3-1 org.pytorch.serve.wlm.WorkerThread - W-9000-mnist_1.0 State change WORKER_STOPPED -> WORKER_SCALED_DOWN
2023-07-28T12:40:33,356 [INFO ] epollEventLoopGroup-3-1 org.pytorch.serve.wlm.ModelManager - Model mnist unregistered.
2023-07-28T12:40:33,371 [INFO ] epollEventLoopGroup-3-1 ACCESS_LOG - /172.17.0.1:38924 "POST /models?model_name=mnist&url=mnist.mar&initial_workers=4 HTTP/1.1" 500 763
2023-07-28T12:40:33,374 [INFO ] epollEventLoopGroup-3-1 TS_METRICS - Requests5XX.Count:1.0|#Level:Host|#hostname:b68f79c2508d,timestamp:1690548033
2023-07-28T12:40:34,367 [DEBUG] W-9001-mnist_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - Worker cmdline: [/home/venv/bin/python, /home/venv/lib/python3.9/site-packages/ts/model_service_worker.py, --sock-type, unix, --sock-name, /home/model-server/tmp/.ts.sock.9001, --metrics-config, /home/venv/lib/python3.9/site-packages/ts/configs/metrics.yaml]
2023-07-28T12:40:34,366 [DEBUG] W-9002-mnist_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - Worker cmdline: [/home/venv/bin/python, /home/venv/lib/python3.9/site-packages/ts/model_service_worker.py, --sock-type, unix, --sock-name, /home/model-server/tmp/.ts.sock.9002, --metrics-config, /home/venv/lib/python3.9/site-packages/ts/configs/metrics.yaml]
2023-07-28T12:40:34,366 [DEBUG] W-9000-mnist_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - Worker cmdline: [/home/venv/bin/python, /home/venv/lib/python3.9/site-packages/ts/model_service_worker.py, --sock-type, unix, --sock-name, /home/model-server/tmp/.ts.sock.9000, --metrics-config, /home/venv/lib/python3.9/site-packages/ts/configs/metrics.yaml]
2023-07-28T12:40:34,366 [DEBUG] W-9003-mnist_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - Worker cmdline: [/home/venv/bin/python, /home/venv/lib/python3.9/site-packages/ts/model_service_worker.py, --sock-type, unix, --sock-name, /home/model-server/tmp/.ts.sock.9003, --metrics-config, /home/venv/lib/python3.9/site-packages/ts/configs/metrics.yaml]
This command is not for general use and should only be run as the result of a call to
ProcessBuilder.start() or Runtime.exec() in a java application
This command is not for general use and should only be run as the result of a call to
ProcessBuilder.start() or Runtime.exec() in a java application
This command is not for general use and should only be run as the result of a call to
ProcessBuilder.start() or Runtime.exec() in a java application
This command is not for general use and should only be run as the result of a call to
ProcessBuilder.start() or Runtime.exec() in a java application
2023-07-28T12:40:34,422 [ERROR] W-9001-mnist_1.0 org.pytorch.serve.wlm.WorkerThread - Backend worker error
org.pytorch.serve.wlm.WorkerInitializationException: Failed start worker process
at org.pytorch.serve.wlm.WorkerLifeCycle.startWorker(WorkerLifeCycle.java:179) ~[model-server.jar:?]
at org.pytorch.serve.wlm.WorkerThread.connect(WorkerThread.java:339) ~[model-server.jar:?]
at org.pytorch.serve.wlm.WorkerThread.run(WorkerThread.java:183) [model-server.jar:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
at java.lang.Thread.run(Thread.java:833) [?:?]
Caused by: java.io.IOException: Cannot run program "/home/venv/bin/python" (in directory "/home/model-server/tmp/models/df29a96231534b05945fa892495ecf6c"): error=0, Failed to exec spawn helper: pid: 75, exit value: 1
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1143) ~[?:?]
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1073) ~[?:?]
at java.lang.Runtime.exec(Runtime.java:594) ~[?:?]
at org.pytorch.serve.wlm.WorkerLifeCycle.startWorker(WorkerLifeCycle.java:161) ~[model-server.jar:?]
... 5 more
Caused by: java.io.IOException: error=0, Failed to exec spawn helper: pid: 75, exit value: 1
at java.lang.ProcessImpl.forkAndExec(Native Method) ~[?:?]
at java.lang.ProcessImpl.<init>(ProcessImpl.java:314) ~[?:?]
at java.lang.ProcessImpl.start(ProcessImpl.java:244) ~[?:?]
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1110) ~[?:?]
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1073) ~[?:?]
at java.lang.Runtime.exec(Runtime.java:594) ~[?:?]
at org.pytorch.serve.wlm.WorkerLifeCycle.startWorker(WorkerLifeCycle.java:161) ~[model-server.jar:?]
... 5 more
2023-07-28T12:40:34,422 [ERROR] W-9002-mnist_1.0 org.pytorch.serve.wlm.WorkerThread - Backend worker error
org.pytorch.serve.wlm.WorkerInitializationException: Failed start worker process
at org.pytorch.serve.wlm.WorkerLifeCycle.startWorker(WorkerLifeCycle.java:179) ~[model-server.jar:?]
at org.pytorch.serve.wlm.WorkerThread.connect(WorkerThread.java:339) ~[model-server.jar:?]
at org.pytorch.serve.wlm.WorkerThread.run(WorkerThread.java:183) [model-server.jar:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
at java.lang.Thread.run(Thread.java:833) [?:?]
Caused by: java.io.IOException: Cannot run program "/home/venv/bin/python" (in directory "/home/model-server/tmp/models/df29a96231534b05945fa892495ecf6c"): error=0, Failed to exec spawn helper: pid: 72, exit value: 1
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1143) ~[?:?]
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1073) ~[?:?]
at java.lang.Runtime.exec(Runtime.java:594) ~[?:?]
at org.pytorch.serve.wlm.WorkerLifeCycle.startWorker(WorkerLifeCycle.java:161) ~[model-server.jar:?]
... 5 more
Caused by: java.io.IOException: error=0, Failed to exec spawn helper: pid: 72, exit value: 1
at java.lang.ProcessImpl.forkAndExec(Native Method) ~[?:?]
at java.lang.ProcessImpl.<init>(ProcessImpl.java:314) ~[?:?]
at java.lang.ProcessImpl.start(ProcessImpl.java:244) ~[?:?]
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1110) ~[?:?]
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1073) ~[?:?]
at java.lang.Runtime.exec(Runtime.java:594) ~[?:?]
at org.pytorch.serve.wlm.WorkerLifeCycle.startWorker(WorkerLifeCycle.java:161) ~[model-server.jar:?]
... 5 more
2023-07-28T12:40:34,422 [ERROR] W-9003-mnist_1.0 org.pytorch.serve.wlm.WorkerThread - Backend worker error
org.pytorch.serve.wlm.WorkerInitializationException: Failed start worker process
at org.pytorch.serve.wlm.WorkerLifeCycle.startWorker(WorkerLifeCycle.java:179) ~[model-server.jar:?]
at org.pytorch.serve.wlm.WorkerThread.connect(WorkerThread.java:339) ~[model-server.jar:?]
at org.pytorch.serve.wlm.WorkerThread.run(WorkerThread.java:183) [model-server.jar:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
at java.lang.Thread.run(Thread.java:833) [?:?]
Caused by: java.io.IOException: Cannot run program "/home/venv/bin/python" (in directory "/home/model-server/tmp/models/df29a96231534b05945fa892495ecf6c"): error=0, Failed to exec spawn helper: pid: 70, exit value: 1
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1143) ~[?:?]
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1073) ~[?:?]
at java.lang.Runtime.exec(Runtime.java:594) ~[?:?]
at org.pytorch.serve.wlm.WorkerLifeCycle.startWorker(WorkerLifeCycle.java:161) ~[model-server.jar:?]
... 5 more
Caused by: java.io.IOException: error=0, Failed to exec spawn helper: pid: 70, exit value: 1
at java.lang.ProcessImpl.forkAndExec(Native Method) ~[?:?]
at java.lang.ProcessImpl.<init>(ProcessImpl.java:314) ~[?:?]
at java.lang.ProcessImpl.start(ProcessImpl.java:244) ~[?:?]
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1110) ~[?:?]
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1073) ~[?:?]
at java.lang.Runtime.exec(Runtime.java:594) ~[?:?]
at org.pytorch.serve.wlm.WorkerLifeCycle.startWorker(WorkerLifeCycle.java:161) ~[model-server.jar:?]
... 5 more
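
The stack traces above repeat one failure ("Failed to exec spawn helper") for the metrics collector and for each worker. A minimal sketch for confirming that from a saved copy of the log follows. It assumes the log was written to a file; the file name `ts.log` and the function name `summarize_spawn_failures` are hypothetical and not part of TorchServe.

```shell
#!/bin/sh
# Count "Cannot run program" failures per working directory in a saved log.
# Assumption: the container log was saved to a file whose path is passed as $1.
summarize_spawn_failures() {
    # Keep the lines that name the program and its working directory,
    # reduce each one to the directory, then count identical directories.
    grep 'Cannot run program' "$1" \
        | sed -n 's/.*(in directory "\([^"]*\)").*/\1/p' \
        | sort \
        | uniq -c
}

# Usage (hypothetical file name): summarize_spawn_failures ts.log
```

One directory with many hits would point to a single shared cause rather than a per-worker problem.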

Installation instructions

I have torchserve 0.8.1 installed, but this error occurs when I run the latest Docker image, pytorch/torchserve:0.8.1-cpu.
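
The first line of the error log warns that the image platform (linux/amd64) differs from the host platform (linux/arm64/v8). Whether that mismatch causes the spawn failure is not established in this report. The sketch below only makes the comparison explicit. The function names `docker_arch` and `platform_mode` are hypothetical helpers, not Docker or TorchServe commands.

```shell
#!/bin/sh
# Map a kernel machine name (as printed by `uname -m`) to Docker's architecture name.
docker_arch() {
    case "$1" in
        x86_64|amd64)  echo amd64 ;;
        aarch64|arm64) echo arm64 ;;
        *)             echo "$1" ;;
    esac
}

# Print "native" if the image platform matches the host machine, else "emulated".
platform_mode() {
    image_platform=$1                 # e.g. linux/amd64 or linux/arm64/v8
    host_machine=$2                   # e.g. the output of `uname -m`
    image_arch=${image_platform#*/}   # drop the leading "linux/"
    image_arch=${image_arch%%/*}      # drop a variant suffix such as "/v8"
    if [ "$image_arch" = "$(docker_arch "$host_machine")" ]; then
        echo native
    else
        echo emulated
    fi
}

# Values taken from this report: amd64 image on an arm64 host.
platform_mode linux/amd64 arm64    # prints "emulated"
```

On a live system the two inputs can be read with `docker image inspect --format '{{.Os}}/{{.Architecture}}' pytorch/torchserve:0.8.1-cpu` and `uname -m`.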

Model Packaging

torch-model-archiver --model-name mnist --version 1.0 --model-file examples/image_classifier/mnist/mnist.py --serialized-file examples/image_classifier/mnist/mnist_cnn.pt --handler examples/image_classifier/mnist/mnist_handler.py

config.properties

I didn't pass a custom config.properties.

Versions


Environment headers

Torchserve branch:

torchserve==0.8.1
torch-model-archiver==0.6.0

Python version: 3.8 (64-bit runtime)
Python executable: /Users/lzchen/miniforge3/envs/tjc-main/bin/python

Versions of relevant python libraries:
captum==0.6.0
numpy==1.20.3
psutil==5.9.4
requests==2.28.1
sentence-transformers==2.2.2
sentencepiece==0.1.95
torch==1.10.2
torch-model-archiver==0.6.0
torchserve==0.8.1
torchvision==0.9.0a0
transformers==4.24.0
wheel==0.37.1
torch==1.10.2
**Warning: torchtext not present ..
torchvision==0.9.0a0
**Warning: torchaudio not present ..

Java Version:

OS: Mac OSX 13.4.1 (arm64)
GCC version: N/A
Clang version: 14.0.3 (clang-1403.0.22.14.1)
CMake version: N/A

Versions of npm installed packages:
**Warning: newman, newman-reporter-html markdown-link-check not installed...

Repro instructions

torch-model-archiver --model-name mnist --version 1.0 --model-file examples/image_classifier/mnist/mnist.py --serialized-file examples/image_classifier/mnist/mnist_cnn.pt --handler examples/image_classifier/mnist/mnist_handler.py
mkdir model_store
mv mnist.mar model_store/
docker run --rm -it -p 8080:8080 -p 8081:8081 -p 8082:8082 -v $(pwd)/model_store:/home/model-server/model-store pytorch/torchserve:latest-cpu
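
The commands above start the container but do not show how the model was registered. The access log records `POST /models?model_name=mnist&url=mnist.mar&initial_workers=4` returning 500, so the registration was presumably sent to the management port (8081). The sketch below is reconstructed from that log line and is not the author's exact command. The helper name `register_url`, the host `localhost`, and the `SEND_REQUEST` switch are assumptions.

```shell
#!/bin/sh
# Build the management-API registration URL that appears in the access log.
register_url() {
    host=$1; model=$2; mar=$3; workers=$4
    printf 'http://%s:8081/models?model_name=%s&url=%s&initial_workers=%s\n' \
        "$host" "$model" "$mar" "$workers"
}

# Send the request only when explicitly asked, so that sourcing this file
# has no side effects. Set SEND_REQUEST=1 with the container running.
if [ "${SEND_REQUEST:-0}" = "1" ]; then
    curl -X POST "$(register_url localhost mnist mnist.mar 4)"
fi
```

With the container from the previous step running, `SEND_REQUEST=1 sh register.sh` (hypothetical file name) would issue the request that produced the 500 response in the log.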

Possible Solution

No response


Labels: documentation, m1, triaged
