TorchServe no prediction when input data gets bigger (Backend worker did not respond in given time) #1812

@tednaseri

Description

🐛 Describe the bug

I am passing JSON data via python-requests. For simplicity, you can assume the following input (the endpoint URL below is illustrative):

import requests

# TorchServe prediction endpoint; the model name "hardnews" is taken from the worker logs.
url = "http://127.0.0.1:8080/predictions/hardnews"
dic1 = {"main": "this is a main", "categories": "this is a categories"}
count = 2
input_data = [dic1 for i in range(count)]
response = requests.post(url, json=input_data)

Issue:
When count <= 8, it works well.
As soon as count > 8, the request gets stuck and never returns.

As you can see, the input is just a simple Python dictionary, and even with input_data = [dic1 for i in range(10)] the total size of the payload is very small.

I am using:

  • a custom handler
  • a trained ML model based on simpletransformers
  • Ubuntu 22.04 and 20.04
  • GPU (local: RTX 3060; Kubernetes: T4)
  • tested on local machines and on Kubernetes; the issue is the same

When the issue shows itself:
TorchServe on the GPU: whether a request succeeds depends critically on the input data size.

When it works well:
TorchServe on the CPU: it works regardless of the input size.
PyTorch without TorchServe: direct inference works fine even when I pass input_data = [dic1 for i in range(1000)].
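
For reference, the direct (no TorchServe) test is roughly the sketch below. The model type, model directory, and the way the two text fields are joined are assumptions for illustration, not taken from the actual code.

from simpletransformers.classification import ClassificationModel

# Sketch of the direct-inference test; "roberta" and "model_folder" are assumptions
# based on the tokenizer files (vocab.json, merges.txt) listed in the archive.
model = ClassificationModel("roberta", "model_folder", use_cuda=True)

dic1 = {"main": "this is a main", "categories": "this is a categories"}
input_data = [dic1 for i in range(1000)]

# simpletransformers expects a list of strings; joining the two fields is an assumption.
texts = [d["main"] + " " + d["categories"] for d in input_data]
predictions, raw_outputs = model.predict(texts)
print(predictions[:5])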

Error logs

2022-08-23 16:34:14,800 [DEBUG] W-9000-hardnews_1.0 org.pytorch.serve.wlm.WorkerThread - W-9000-hardnews_1.0 State change null -> WORKER_STARTED
2022-08-23 16:34:14,804 [INFO ] W-9000-hardnews_1.0 org.pytorch.serve.wlm.WorkerThread - Connecting to: /tmp/.ts.sock.9000
2022-08-23 16:34:22,821 [INFO ] W-9000-hardnews_1.0 org.pytorch.serve.wlm.WorkerThread - Backend response time: 7931
2022-08-23 16:34:22,821 [DEBUG] W-9000-hardnews_1.0 org.pytorch.serve.wlm.WorkerThread - W-9000-hardnews_1.0 State change WORKER_STARTED -> WORKER_MODEL_LOADED
2022-08-23 16:34:48,850 [INFO ] W-9000-hardnews_1.0 org.pytorch.serve.wlm.WorkerThread - Backend response time: 1093
2022-08-23 16:34:48,852 [DEBUG] W-9000-hardnews_1.0 org.pytorch.serve.job.Job - Waiting time ns: 239159, Backend time ns: 1094847821
2022-08-23 16:34:53,317 [INFO ] W-9000-hardnews_1.0 org.pytorch.serve.wlm.WorkerThread - Backend response time: 156
2022-08-23 16:34:53,318 [DEBUG] W-9000-hardnews_1.0 org.pytorch.serve.job.Job - Waiting time ns: 144077, Backend time ns: 157878185
2022-08-23 16:35:01,126 [INFO ] W-9000-hardnews_1.0 org.pytorch.serve.wlm.WorkerThread - Backend response time: 224
2022-08-23 16:35:01,127 [DEBUG] W-9000-hardnews_1.0 org.pytorch.serve.job.Job - Waiting time ns: 140180, Backend time ns: 225271057
2022-08-23 16:35:38,326 [INFO ] W-9000-hardnews_1.0 org.pytorch.serve.wlm.WorkerThread - Backend response time: 30000
2022-08-23 16:35:38,326 [ERROR] W-9000-hardnews_1.0 org.pytorch.serve.wlm.WorkerThread - Number or consecutive unsuccessful inference 1
2022-08-23 16:35:38,327 [ERROR] W-9000-hardnews_1.0 org.pytorch.serve.wlm.WorkerThread - Backend worker error
org.pytorch.serve.wlm.WorkerInitializationException: Backend worker did not respond in given time
	at org.pytorch.serve.wlm.WorkerThread.run(WorkerThread.java:198)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
	at java.base/java.lang.Thread.run(Thread.java:833)
2022-08-23 16:35:38,328 [INFO ] epollEventLoopGroup-5-1 org.pytorch.serve.wlm.WorkerThread - 9000 Worker disconnected. WORKER_MODEL_LOADED
2022-08-23 16:35:38,335 [DEBUG] W-9000-hardnews_1.0 org.pytorch.serve.job.Job - Waiting time ns: 64717, Inference time ns: 30009467849
2022-08-23 16:35:38,335 [DEBUG] W-9000-hardnews_1.0 org.pytorch.serve.wlm.WorkerThread - W-9000-hardnews_1.0 State change WORKER_MODEL_LOADED -> WORKER_STOPPED
2022-08-23 16:35:38,335 [WARN ] W-9000-hardnews_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - terminateIOStreams() threadName=W-9000-hardnews_1.0-stderr
2022-08-23 16:35:38,335 [WARN ] W-9000-hardnews_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - terminateIOStreams() threadName=W-9000-hardnews_1.0-stdout
2022-08-23 16:35:38,336 [INFO ] W-9000-hardnews_1.0 org.pytorch.serve.wlm.WorkerThread - Retry worker: 9000 in 1 seconds.
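
The log shows the worker being stopped after roughly 30 seconds (Backend response time: 30000). One thing worth trying, though not a confirmed fix, is re-registering the model with a larger response_timeout through the management API. The sketch below assumes the management API on the default port 8081, that the model is registered as hardnews, and that the archive is hardnews.mar; adjust to the actual names.

import requests

MANAGEMENT = "http://127.0.0.1:8081"

# Unregister, then re-register with a larger response timeout (values are illustrative).
requests.delete(f"{MANAGEMENT}/models/hardnews")
resp = requests.post(
    f"{MANAGEMENT}/models",
    params={
        "url": "hardnews.mar",       # adjust to the actual .mar file in the model store
        "initial_workers": 1,
        "batch_size": 1,
        "max_batch_delay": 10,
        "response_timeout": 600,     # seconds; the default is 120
        "synchronous": "true",
    },
)
print(resp.status_code, resp.text)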

Installation instructions

I don't use Docker for installation.

Model Packaging

# Running archiver
torch-model-archiver -f --model-name model \
--version 1.0 \
--serialized-file model_folder/pytorch_model.bin \
--export-path model-store \
--requirements-file requirements.txt \
--extra-files "model_folder/config.json,model_folder/merges.txt,model_folder/model_args.json,model_folder/special_tokens_map.json,model_folder/tokenizer.json,model_folder/tokenizer_config.json,model_folder/training_args.bin,model_folder/vocab.json" \
--handler handler.py
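
For context, handler.py is roughly of the following shape. This is a minimal sketch assuming a simpletransformers classification model and the two-field JSON input shown above; it is not the actual handler used in this issue.

# handler.py -- minimal sketch, not the reporter's actual custom handler.
from ts.torch_handler.base_handler import BaseHandler
from simpletransformers.classification import ClassificationModel


class NewsHandler(BaseHandler):
    def initialize(self, context):
        properties = context.system_properties
        model_dir = properties.get("model_dir")
        use_cuda = properties.get("gpu_id") is not None
        # "roberta" is an assumption based on the tokenizer files in the archive.
        self.model = ClassificationModel("roberta", model_dir, use_cuda=use_cuda)
        self.initialized = True

    def preprocess(self, data):
        # Each element of data is one request; its body is assumed to be the
        # already-decoded JSON list of {"main": ..., "categories": ...} dicts.
        texts = []
        for row in data:
            body = row.get("body") or row.get("data")
            for item in body:
                texts.append(item["main"] + " " + item["categories"])
        return texts

    def inference(self, texts):
        predictions, _ = self.model.predict(texts)
        return predictions

    def postprocess(self, predictions):
        # One response element per request in the batch (batch_size=1 here);
        # integer class ids are an assumption about the model's labels.
        return [[int(p) for p in predictions]]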

config.properties

inference_address=http://0.0.0.0:8080
management_address=http://0.0.0.0:8081
metrics_address=http://0.0.0.0:8082
install_py_dep_per_model=true
NUM_WORKERS=1
number_of_gpu=1
number_of_netty_threads=4
netty_client_threads=1
MKL_NUM_THREADS=1
batch_size=1
max_batch_delay=10
job_queue_size=1000
model_store=/home/model-server/shared/model-store
model_snapshot={"name": "startup.cfg","modelCount": 1,"models": {"news": {"1.0": {"defaultVersion": true,"marName": "news.mar","minWorkers": 1,"maxWorkers": 1,"batchSize": 1,"maxBatchDelay": 10,"responseTimeout": 120}}}}


- I have tried different values for the threading settings, but it did not help; see the check below for verifying which per-model settings the server actually applied.
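
To confirm which per-model settings (workers, batchSize, maxBatchDelay) the running server actually applied, the management API's describe endpoint can be queried. A minimal check, assuming the model is registered as hardnews on the default management port:

import requests

# Describe the registered model to see the effective worker and batching settings.
resp = requests.get("http://127.0.0.1:8081/models/hardnews")
print(resp.json())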

Versions

Environment headers
------------------------------------------------------------------------------------------
Torchserve branch: 

torchserve==0.4.0b20210521
torch-model-archiver==0.4.0b20210521

Python version: 3.8 (64-bit runtime)
Python executable: /home/ted/anaconda3/envs/myland/bin/python3

Versions of relevant python libraries:
captum==0.5.0
future==0.18.2
numpy==1.23.1
psutil==5.9.1
pytest==4.6.11
pytest-forked==1.4.0
pytest-timeout==1.4.2
pytest-xdist==1.34.0
requests==2.28.1
requests-mock==1.9.3
requests-oauthlib==1.3.1
sentencepiece==0.1.95
simpletransformers==0.62.0
torch==1.12.1
torch-model-archiver==0.4.0b20210521
torch-workflow-archiver==0.1.0b20210521
torchaudio==0.12.1
torchserve==0.4.0b20210521
torchvision==0.13.1
transformers==4.20.1
wheel==0.37.1
Warning: torchtext not present.

Java Version:


OS: Ubuntu 22.04 LTS
GCC version: (Ubuntu 11.2.0-19ubuntu1) 11.2.0
Clang version: N/A
CMake version: N/A

Is CUDA available: Yes
CUDA runtime version: N/A
GPU models and configuration: 
GPU 0: NVIDIA GeForce RTX 3060 Laptop GPU
Nvidia driver version: 515.65.01
cuDNN version: None

Repro instructions

# Run the same archiver command shown under Model Packaging above, then start TorchServe:

torchserve --start --model-store model-store --models model=hardnews --ncs --ts-config config.properties

Running prediction: a request like the sketch below (count > 8) reproduces the hang on GPU.
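
The endpoint name is assumed from the worker logs (W-9000-hardnews_1.0); adjust it if the model is registered under a different name.

import requests

dic1 = {"main": "this is a main", "categories": "this is a categories"}
input_data = [dic1 for i in range(10)]  # count > 8 triggers the hang on GPU

response = requests.post(
    "http://127.0.0.1:8080/predictions/hardnews",
    json=input_data,
    timeout=120,  # raises requests.exceptions.Timeout instead of hanging forever
)
print(response.status_code, response.text)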

Possible Solution

No response

Metadata

Labels: gpu, triaged_wait