Changes to support TorchServe on cpu & gpu #15

agunapal · 2024-01-25T22:11:21Z

*What is the PR about

This PR is for integrating TorchServe with this solution

Supports CPU & GPU
Tested with ./test.sh run bmk

From UX POV, User needs to change model_server=torchserve in config.properties. Rest of the flow is the same.
Currently, this is supported for CPU only

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

CPU Logs

kubectl logs bert-base-workshop-0-7kmgk -n mpi
/app/tests /home/model-server
Configuring number of model servers from config.properties ...
Number of model servers (1) configured from environment ...
Namespace(url='http://bert-base-multilingual-cased-cpu-[INSTANCE_IDX].mpi.svc.cluster.local:8080/predictions/model[MODEL_IDX]', num_thread=2, latency_window_size=1000, throughput_time=180, throughput_interval=10, is_multi_instance=True, n_instance=1, is_multi_model_per_instance=True, n_model_per_instance=1, post=True, verbose=True, cache_dns=True, model_server='torchserve')
caching dns
http://bert-base-multilingual-cased-cpu-0.mpi.svc.cluster.local:8080/predictions/model0
http://10.100.115.203:8080/predictions/model0
<Response [200]>
{'pid': 6, 'throughput': 0.0, 'p50': '0.000', 'p90': '0.000', 'p95': '0.000', 'errors': '0'}
{}
{}

{'pid': 6, 'throughput': 5.7, 'p50': '0.358', 'p90': '0.402', 'p95': '0.414', 'errors': '0'}
{'p90_0_0': '0.402'}
{'num_0_0': 57}

{'pid': 6, 'throughput': 5.9, 'p50': '0.318', 'p90': '0.392', 'p95': '0.412', 'errors': '0'}
{'p90_0_0': '0.392'}
{'num_0_0': 116}

{'pid': 6, 'throughput': 6.4, 'p50': '0.311', 'p90': '0.387', 'p95': '0.409', 'errors': '0'}
{'p90_0_0': '0.387'}
{'num_0_0': 180}

GPU logs

kubectl logs bert-base-workshop-0-7l2rg -n mpi                        
/app/tests /home/model-server
Configuring number of model servers from config.properties ...
Number of model servers (1) configured from environment ...
Namespace(url='http://bert-base-multilingual-cased-gpu-[INSTANCE_IDX].mpi.svc.cluster.local:8080/predictions/model[MODEL_IDX]', num_thread=2, latency_window_size=1000, throughput_time=180, throughput_interval=10, is_multi_instance=True, n_instance=1, is_multi_model_per_instance=True, n_model_per_instance=1, post=True, verbose=True, cache_dns=True, model_server='torchserve')
caching dns
http://bert-base-multilingual-cased-gpu-0.mpi.svc.cluster.local:8080/predictions/model0
http://10.100.120.85:8080/predictions/model0
<Response [200]>
{'pid': 6, 'throughput': 0.0, 'p50': '0.000', 'p90': '0.000', 'p95': '0.000', 'errors': '0'}
{}
{}

{'pid': 6, 'throughput': 92.5, 'p50': '0.021', 'p90': '0.025', 'p95': '0.027', 'errors': '0'}
{'p90_0_0': '0.025'}
{'num_0_0': 925}

{'pid': 6, 'throughput': 98.3, 'p50': '0.020', 'p90': '0.022', 'p95': '0.025', 'errors': '0'}
{'p90_0_0': '0.024'}
{'num_0_0': 1908}

{'pid': 6, 'throughput': 100.1, 'p50': '0.020', 'p90': '0.021', 'p95': '0.021', 'errors': '0'}
{'p90_0_0': '0.023'}
{'num_0_0': 2909}

dzilbermanvmw · 2024-02-12T22:21:27Z

So far we have successfully verified the PR is for CPU only, both fastapi and torchserve settings for the new parameter in the config.properties:

# model_server = fastapi|torchserve
model_server=torchserve

We are planning to release an update for the related AWS guidance shortly containing other important changes without this PR included yet, then will focus on merging this PR upon additional testing on other architectures (AWS Graviton, GPU etc). Thanks

dzilbermanvmw · 2024-04-17T21:12:41Z

Update 4/17/24: tested this PR using images built for "torchserve" API server on AWS Graviton and Inferentia 2 based nodes. In both cases there were run-time container errors like:
containers: main: Container ID: containerd://42bfe08ada826553ffe57ba56dd93627d71ad75cbc0ee3c19d6e0ad6b953cbc7 Image: public.ecr.aws/a2u7h5w3/bert-base-workshop:v11-torchserve-inf2 Image ID: public.ecr.aws/a2u7h5w3/bert-base-workshop@sha256:5841e70fa95efe1d62f38a51854c187b4af751c59b18fda59d59a2df8a2103e3 Port: 8080/TCP Host Port: 0/TCP State: Waiting Reason: CrashLoopBackOff Last State: Terminated Reason: StartError Message: failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: exec: "/usr/local/bin/dockerd-entrypoint.sh": stat /usr/local/bin/dockerd-entrypoint.sh: no such file or directory: unknown Exit Code: 128
------
If the PR merge criteria is to have torchserve based API server work on X86_64 (CPU), Graviton and Inf2 architectures then the above issue must be resolved

dzilbermanvmw · 2024-04-18T22:06:09Z

Also, the /3-pack/Dockerfile.torchserve file needs to have this command before the:

...
LABEL description="Model $MODEL_NAME packed in a TorchServe container to run on $PROCESSOR"
#add this command below
RUN mkdir -p /home/model-server/model-store   
RUN wget https://torchserve.pytorch.org/mar_files/bert_seqc_without_torchscript.mar -O /home/model-server/model-store/BERTSC.mar

in order for ./pack.sh command to work.

dzilbermanvmw

This Dockerfile.torchserve fails to build ML inference images on inf2 and graviton based EC2 instances. However, appears to work fine w/o changes - build corresponding images that can run on matching processor architecture EKS nodes - on X86_64 and GPU based instances.
In order for the ./pack.sh command to work, user model-server needs to be explicitly created:

ARG BASE_IMAGE
FROM $BASE_IMAGE

ARG MODEL_NAME
ARG MODEL_FILE_NAME
ARG PROCESSOR
LABEL description="Model $MODEL_NAME packed in a TorchServe container to run on $PROCESSOR"

#DZ: added line to create a user that is later used as an owner of /home/model-server folder
RUN useradd -m model-server
WORKDIR /home/model-server
COPY 3-pack/torchserve torchserve
WORKDIR /home/model-server/torchserve
USER root
COPY 3-pack/torchserve/dockerd-entrypoint.sh /usr/local/bin/dockerd-entrypoint.sh
RUN chmod +x /usr/local/bin/dockerd-entrypoint.sh \
    && chown -R model-server /home/model-server

So the last command that was failing before chown -R model-server /home/model-server now works fine, images are built and available from public ECR here: public.ecr.aws/a2u7h5w3/bert-base-workshop:v12-inf2-torchserve
However, their execution fails at runtime with nondeterministic error:
Normal Pulling 12m (x5 over 13m) kubelet Pulling image "public.ecr.aws/a2u7h5w3/bert-base-workshop:v12-inf2-torchserve" Normal Pulled 12m kubelet Successfully pulled image "public.ecr.aws/a2u7h5w3/bert-base-workshop:v12-inf2-torchserve" in 157ms (157ms including waiting) Warning BackOff 3m33s (x48 over 13m) kubelet Back-off restarting failed container main in pod bert-base-multilingual-cased-inf2-0-6d66b9c798-2v7zz_mpi(2bebae59-24ee-4a1f-a98d-c1f0facf5604)

agunapal · 2024-04-24T01:00:52Z

@dzilbermanvmw Thanks for checking. I havent tested them on both inf2 and graviton . Will look into these next week

dzilbermanvmw · 2024-04-24T04:14:12Z

@dzilbermanvmw Thanks for checking. I havent tested them on both inf2 and graviton . Will look into these next week
No problem @agunapal - that's what we're here for.
FYI on CPU instances the pack.sh command also works fine w/o modifying the 3-pack\Dockerfile.torchserve and generates a deployable image.
So CPU and GPU based instances are OK so far, inf2 and Graviton are not yet..

sridevi1209 · 2024-04-24T23:02:32Z

Adding further details - for CPU, the built and deployment was successful. For GPU, built was successful however for deployment to be successful, we have commented out the below in limits section in 4-deploy/app-bert-base-multilingual-cased-gpu-g4dn.xlarge/bert-base-multilingual-cased-gpu-0.yaml file.

resources:
limits:
#nvidia.com/gpu:1

agunapal added 3 commits January 25, 2024 14:08

Changes to support TorchServe on cpu

e8e62b4

Changes to support TorchServe on cpu

b55acc1

changed from framework to model_server

22ab345

Merge branch 'main' into torchserve_integration

c7571c0

agunapal changed the title ~~Changes to support TorchServe on cpu~~ Changes to support TorchServe on cpu & gpu Apr 9, 2024

agunapal added 2 commits April 8, 2024 18:07

Changes to support GPU added

db4197d

Changes to build MAR file dynamically

2379e4f

dzilbermanvmw reviewed Apr 24, 2024

View reviewed changes

agunapal mentioned this pull request Jun 27, 2024

Docker swarm with TorchServe workflow pytorch/serve#3206

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Changes to support TorchServe on cpu & gpu #15

Changes to support TorchServe on cpu & gpu #15

agunapal commented Jan 25, 2024 •

edited

Loading

dzilbermanvmw commented Feb 12, 2024

dzilbermanvmw commented Apr 17, 2024 •

edited

Loading

dzilbermanvmw commented Apr 18, 2024 •

edited

Loading

dzilbermanvmw left a comment

agunapal commented Apr 24, 2024

dzilbermanvmw commented Apr 24, 2024

sridevi1209 commented Apr 24, 2024 •

edited

Loading

Changes to support TorchServe on cpu & gpu #15

Are you sure you want to change the base?

Changes to support TorchServe on cpu & gpu #15

Conversation

agunapal commented Jan 25, 2024 • edited Loading

dzilbermanvmw commented Feb 12, 2024

dzilbermanvmw commented Apr 17, 2024 • edited Loading

dzilbermanvmw commented Apr 18, 2024 • edited Loading

dzilbermanvmw left a comment

Choose a reason for hiding this comment

agunapal commented Apr 24, 2024

dzilbermanvmw commented Apr 24, 2024

sridevi1209 commented Apr 24, 2024 • edited Loading

agunapal commented Jan 25, 2024 •

edited

Loading

dzilbermanvmw commented Apr 17, 2024 •

edited

Loading

dzilbermanvmw commented Apr 18, 2024 •

edited

Loading

sridevi1209 commented Apr 24, 2024 •

edited

Loading