Inference is slow even on a GPU machine #36

@RemyNtshaykolo

Description

Hello, and thank you for the repo, it is very complete. I was able to launch a model on a p2.xlarge EC2 instance on AWS.

I'm having performance problems. I have the impression that the GPU is not being used, because I get inference times similar to those I get when running the model on my Mac, which has no GPU.
The image encoding command

```
curl http://127.0.0.1:8080/predictions/sam_vit_h_encode -T slick_example.png
```

takes around 2 minutes to run, as you can see in the following log:

```
2023-11-08T08:31:23,510 [INFO ] W-9000-sam_vit_h_encode_1.0.0-stdout MODEL_LOG - XXXXX  Inference time:  114.42793655395508
```

I was expecting millisecond-level performance.
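
To isolate whether this is a TorchServe issue or a device-placement issue, a minimal sketch like the one below can run the ViT-H encoder directly, outside of TorchServe, and confirm the model is actually on the GPU. This assumes the standard `segment_anything` package; the checkpoint filename and image path are placeholders for whatever the handler actually loads:

```python
# Minimal sketch to check whether the SAM ViT-H encoder actually runs on the GPU.
# Assumes the standard segment_anything package is installed; the checkpoint
# path and image path below are placeholders.
import time

import cv2
import torch
from segment_anything import SamPredictor, sam_model_registry

device = "cuda" if torch.cuda.is_available() else "cpu"
print("torch sees CUDA:", torch.cuda.is_available())

sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
sam.to(device)
print("model device:", next(sam.parameters()).device)  # should print cuda:0

predictor = SamPredictor(sam)
image = cv2.cvtColor(cv2.imread("slick_example.png"), cv2.COLOR_BGR2RGB)

# Warm-up pass so CUDA initialization is not counted in the timing.
predictor.set_image(image)

if device == "cuda":
    torch.cuda.synchronize()  # wait for all queued GPU work before timing
start = time.time()
predictor.set_image(image)   # runs the ViT-H image encoder
if device == "cuda":
    torch.cuda.synchronize()
print("encoder time: %.2f s" % (time.time() - start))
```

If the standalone timing is fast but the TorchServe endpoint is slow, the problem is likely in the handler's device placement rather than the hardware.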

Also, when investigating the logs, I see

```
pool-3-thread-1 TS_METRICS - GPUUtilization.Percent:100.0|#Level:Host,DeviceId:0|#hostname:ac47803e69a1,timestamp:1699394016
```

so it looks like the GPU is being used.
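
The `TS_METRICS` reading can be cross-checked independently by polling `nvidia-smi` while the curl request is in flight. Below is a quick, hypothetical helper for that (it is not part of the repo; the poll count and interval are arbitrary):

```python
# Poll GPU utilization and memory via nvidia-smi while the inference
# request runs, to cross-check the TorchServe TS_METRICS reading.
import subprocess
import time

for _ in range(30):
    out = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=utilization.gpu,memory.used",
         "--format=csv,noheader"],
        capture_output=True, text=True,
    ).stdout.strip()
    print(out)
    time.sleep(2)
```

Nonzero memory use with sustained utilization during the request would confirm the encoder is really running on the GPU.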
