Description
🐛 Describe the bug
I am passing JSON data to python-requests. For simplicity, you can assume the following input:
import requests

url = "http://localhost:8080/predictions/hardnews"  # inference endpoint (adjust host / model name as needed)
dic1 = {"main": "this is a main", "categories": "this is a categories"}
count = 2
input_data = [dic1 for i in range(count)]
response = requests.post(url, json=input_data)
Issue:
When count <= 8, it works well.
As soon as count > 8, the request hangs and never returns.
As you can see, the input is just a simple Python dictionary, so even with input_data = [dic1 for i in range(10)] the total payload size is very small.
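For reference, this is roughly how I reproduce the threshold. It is a minimal sketch that sweeps count and adds a client-side timeout so the hang shows up as an exception instead of blocking forever; the endpoint and model name are assumed from the config below:

import requests

URL = "http://localhost:8080/predictions/hardnews"  # assumed endpoint
dic1 = {"main": "this is a main", "categories": "this is a categories"}

for count in range(1, 13):
    payload = [dic1 for _ in range(count)]
    try:
        resp = requests.post(URL, json=payload, timeout=60)
        print(f"count={count}: HTTP {resp.status_code}")
    except requests.exceptions.ReadTimeout:
        print(f"count={count}: timed out (request never returned)")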
I am using:
- a custom handler
- an ML model trained with simpletransformers
- Ubuntu 22.04 and 20.04
- GPU (local: RTX 3060, Kubernetes: T4)
- I have tested on local machines and on Kubernetes; the issue is the same in both.
When the issue shows itself:
TorchServe on the GPU: the behaviour is critically dependent on the input data size.
When it works well:
TorchServe on the CPU: it works well regardless of the input size.
PyTorch without TorchServe: I have tested the model directly in PyTorch and it works well even when I pass input_data = [dic1 for i in range(1000)].
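For the last point, a minimal sketch of the direct test, assuming a simpletransformers classification model loaded from the same model_folder (the model type and the field used for prediction are illustrative; my actual preprocessing may differ):

from simpletransformers.classification import ClassificationModel

dic1 = {"main": "this is a main", "categories": "this is a categories"}
input_data = [dic1 for _ in range(1000)]

# Assumed: the "main" text is what gets fed to the model
texts = [item["main"] for item in input_data]

model = ClassificationModel("roberta", "model_folder", use_cuda=True)  # model type assumed
predictions, raw_outputs = model.predict(texts)
print(len(predictions))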
Error logs
2022-08-23 16:34:14,800 [DEBUG] W-9000-hardnews_1.0 org.pytorch.serve.wlm.WorkerThread - W-9000-hardnews_1.0 State change null -> WORKER_STARTED
2022-08-23 16:34:14,804 [INFO ] W-9000-hardnews_1.0 org.pytorch.serve.wlm.WorkerThread - Connecting to: /tmp/.ts.sock.9000
2022-08-23 16:34:22,821 [INFO ] W-9000-hardnews_1.0 org.pytorch.serve.wlm.WorkerThread - Backend response time: 7931
2022-08-23 16:34:22,821 [DEBUG] W-9000-hardnews_1.0 org.pytorch.serve.wlm.WorkerThread - W-9000-hardnews_1.0 State change WORKER_STARTED -> WORKER_MODEL_LOADED
2022-08-23 16:34:48,850 [INFO ] W-9000-hardnews_1.0 org.pytorch.serve.wlm.WorkerThread - Backend response time: 1093
2022-08-23 16:34:48,852 [DEBUG] W-9000-hardnews_1.0 org.pytorch.serve.job.Job - Waiting time ns: 239159, Backend time ns: 1094847821
2022-08-23 16:34:53,317 [INFO ] W-9000-hardnews_1.0 org.pytorch.serve.wlm.WorkerThread - Backend response time: 156
2022-08-23 16:34:53,318 [DEBUG] W-9000-hardnews_1.0 org.pytorch.serve.job.Job - Waiting time ns: 144077, Backend time ns: 157878185
2022-08-23 16:35:01,126 [INFO ] W-9000-hardnews_1.0 org.pytorch.serve.wlm.WorkerThread - Backend response time: 224
2022-08-23 16:35:01,127 [DEBUG] W-9000-hardnews_1.0 org.pytorch.serve.job.Job - Waiting time ns: 140180, Backend time ns: 225271057
2022-08-23 16:35:38,326 [INFO ] W-9000-hardnews_1.0 org.pytorch.serve.wlm.WorkerThread - Backend response time: 30000
2022-08-23 16:35:38,326 [ERROR] W-9000-hardnews_1.0 org.pytorch.serve.wlm.WorkerThread - Number or consecutive unsuccessful inference 1
2022-08-23 16:35:38,327 [ERROR] W-9000-hardnews_1.0 org.pytorch.serve.wlm.WorkerThread - Backend worker error
org.pytorch.serve.wlm.WorkerInitializationException: Backend worker did not respond in given time
at org.pytorch.serve.wlm.WorkerThread.run(WorkerThread.java:198)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at java.base/java.lang.Thread.run(Thread.java:833)
2022-08-23 16:35:38,328 [INFO ] epollEventLoopGroup-5-1 org.pytorch.serve.wlm.WorkerThread - 9000 Worker disconnected. WORKER_MODEL_LOADED
2022-08-23 16:35:38,335 [DEBUG] W-9000-hardnews_1.0 org.pytorch.serve.job.Job - Waiting time ns: 64717, Inference time ns: 30009467849
2022-08-23 16:35:38,335 [DEBUG] W-9000-hardnews_1.0 org.pytorch.serve.wlm.WorkerThread - W-9000-hardnews_1.0 State change WORKER_MODEL_LOADED -> WORKER_STOPPED
2022-08-23 16:35:38,335 [WARN ] W-9000-hardnews_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - terminateIOStreams() threadName=W-9000-hardnews_1.0-stderr
2022-08-23 16:35:38,335 [WARN ] W-9000-hardnews_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - terminateIOStreams() threadName=W-9000-hardnews_1.0-stdout
2022-08-23 16:35:38,336 [INFO ] W-9000-hardnews_1.0 org.pytorch.serve.wlm.WorkerThread - Retry worker: 9000 in 1 seconds.
Installation instructions
I don't use Docker for installation.
Model Packaging
# Running archiver
torch-model-archiver -f --model-name model \
--version 1.0 \
--serialized-file model_folder/pytorch_model.bin \
--export-path model-store \
--requirements-file requirements.txt \
--extra-files "model_folder/config.json,model_folder/merges.txt,model_folder/model_args.json,model_folder/special_tokens_map.json,model_folder/tokenizer.json,model_folder/tokenizer_config.json,model_folder/training_args.bin,model_folder/vocab.json" \
--handler handler.py
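For context, the custom handler referenced above follows the standard TorchServe BaseHandler structure. The sketch below only shows the general shape of such a handler with a simpletransformers model; the class name, model type, and field names are illustrative, not my exact handler.py:

from simpletransformers.classification import ClassificationModel
from ts.torch_handler.base_handler import BaseHandler

class NewsHandler(BaseHandler):  # hypothetical name; the real class is in handler.py
    def initialize(self, context):
        # Load the simpletransformers model from the unpacked .mar directory
        model_dir = context.system_properties.get("model_dir")
        self.model = ClassificationModel("roberta", model_dir, use_cuda=True)  # model type assumed
        self.initialized = True

    def preprocess(self, data):
        # TorchServe delivers each request with the parsed JSON under "body" (or "data")
        records = []
        for row in data:
            body = row.get("body") or row.get("data")
            records.extend(body if isinstance(body, list) else [body])
        return records

    def inference(self, data, *args, **kwargs):
        texts = [item["main"] for item in data]  # field chosen for illustration
        predictions, _ = self.model.predict(texts)
        return predictions

    def postprocess(self, data):
        # One response element per request in the batch (batch_size=1 here)
        return [data.tolist() if hasattr(data, "tolist") else list(data)]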
config.properties
inference_address=http://0.0.0.0:8080
management_address=http://0.0.0.0:8081
metrics_address=http://0.0.0.0:8082
install_py_dep_per_model=true
NUM_WORKERS=1
number_of_gpu=1
number_of_netty_threads=4
netty_client_threads=1
MKL_NUM_THREADS=1
batch_size=1
max_batch_delay=10
job_queue_size=1000
model_store=/home/model-server/shared/model-store
model_snapshot={"name": "startup.cfg","modelCount": 1,"models": {"news": {"1.0": {"defaultVersion": true,"marName": "news.mar","minWorkers": 1,"maxWorkers": 1,"batchSize": 1,"maxBatchDelay": 10,"responseTimeout": 120}}}}
- I have tried different values for the threading settings, but it did not help.
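To double-check which settings are actually in effect during the repro, the registered model can be inspected through the TorchServe management API (port 8081 per the config above; the model name is assumed):

import requests

info = requests.get("http://localhost:8081/models/hardnews").json()
print(info)  # shows minWorkers, maxWorkers, batchSize, maxBatchDelay, responseTimeout, etc.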
Versions
Environment headers
------------------------------------------------------------------------------------------
Torchserve branch:
torchserve==0.4.0b20210521
torch-model-archiver==0.4.0b20210521
Python version: 3.8 (64-bit runtime)
Python executable: /home/ted/anaconda3/envs/myland/bin/python3
Versions of relevant python libraries:
captum==0.5.0
future==0.18.2
numpy==1.23.1
psutil==5.9.1
pytest==4.6.11
pytest-forked==1.4.0
pytest-timeout==1.4.2
pytest-xdist==1.34.0
requests==2.28.1
requests-mock==1.9.3
requests-oauthlib==1.3.1
sentencepiece==0.1.95
simpletransformers==0.62.0
torch==1.12.1
torch-model-archiver==0.4.0b20210521
torch-workflow-archiver==0.1.0b20210521
torchaudio==0.12.1
torchserve==0.4.0b20210521
torchvision==0.13.1
transformers==4.20.1
wheel==0.37.1
torch==1.12.1
**Warning: torchtext not present ..
torchvision==0.13.1
torchaudio==0.12.1
Java Version:
OS: Ubuntu 22.04 LTS
GCC version: (Ubuntu 11.2.0-19ubuntu1) 11.2.0
Clang version: N/A
CMake version: N/A
Is CUDA available: Yes
CUDA runtime version: N/A
GPU models and configuration:
GPU 0: NVIDIA GeForce RTX 3060 Laptop GPU
Nvidia driver version: 515.65.01
cuDNN version: None
Repro instructions
# Running archiver
torch-model-archiver -f --model-name model \
--version 1.0 \
--serialized-file model_folder/pytorch_model.bin \
--export-path model-store \
--requirements-file requirements.txt \
--extra-files "model_folder/config.json,model_folder/merges.txt,model_folder/model_args.json,model_folder/special_tokens_map.json,model_folder/tokenizer.json,model_folder/tokenizer_config.json,model_folder/training_args.bin,model_folder/vocab.json" \
--handler handler.py
torchserve --start --model-store model-store --models model=hardnews --ncs --ts-config config.properties
Then run the prediction request (a minimal version is shown below).
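For completeness, the prediction call, as a minimal sketch assuming the model is served as hardnews on the default inference port:

import requests

dic1 = {"main": "this is a main", "categories": "this is a categories"}
payload = [dic1 for _ in range(9)]  # count > 8 triggers the hang on the GPU
resp = requests.post("http://localhost:8080/predictions/hardnews", json=payload, timeout=60)
print(resp.status_code, resp.text)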
Possible Solution
No response