Changes to support TorchServe on cpu & gpu #15

Open · wants to merge 6 commits into main
Conversation

@agunapal commented Jan 25, 2024

What is the PR about?

This PR integrates TorchServe with this solution.

  • Supports CPU & GPU
  • Tested with ./test.sh run bmk

From a UX point of view, the user only needs to set model_server=torchserve in config.properties; the rest of the flow is the same.
Currently, this is supported for CPU only.
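For reference, this is the full extent of the configuration change; the commented line documents the allowed values (fastapi is presumably the pre-existing default):

# config.properties
# model_server = fastapi|torchserve
model_server=torchserve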

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

CPU Logs

kubectl logs bert-base-workshop-0-7kmgk -n mpi
/app/tests /home/model-server
Configuring number of model servers from config.properties ...
Number of model servers (1) configured from environment ...
Namespace(url='http://bert-base-multilingual-cased-cpu-[INSTANCE_IDX].mpi.svc.cluster.local:8080/predictions/model[MODEL_IDX]', num_thread=2, latency_window_size=1000, throughput_time=180, throughput_interval=10, is_multi_instance=True, n_instance=1, is_multi_model_per_instance=True, n_model_per_instance=1, post=True, verbose=True, cache_dns=True, model_server='torchserve')
caching dns
http://bert-base-multilingual-cased-cpu-0.mpi.svc.cluster.local:8080/predictions/model0
http://10.100.115.203:8080/predictions/model0
<Response [200]>
{'pid': 6, 'throughput': 0.0, 'p50': '0.000', 'p90': '0.000', 'p95': '0.000', 'errors': '0'}
{}
{}

{'pid': 6, 'throughput': 5.7, 'p50': '0.358', 'p90': '0.402', 'p95': '0.414', 'errors': '0'}
{'p90_0_0': '0.402'}
{'num_0_0': 57}

{'pid': 6, 'throughput': 5.9, 'p50': '0.318', 'p90': '0.392', 'p95': '0.412', 'errors': '0'}
{'p90_0_0': '0.392'}
{'num_0_0': 116}

{'pid': 6, 'throughput': 6.4, 'p50': '0.311', 'p90': '0.387', 'p95': '0.409', 'errors': '0'}
{'p90_0_0': '0.387'}
{'num_0_0': 180}

GPU Logs

kubectl logs bert-base-workshop-0-7l2rg -n mpi                        
/app/tests /home/model-server
Configuring number of model servers from config.properties ...
Number of model servers (1) configured from environment ...
Namespace(url='http://bert-base-multilingual-cased-gpu-[INSTANCE_IDX].mpi.svc.cluster.local:8080/predictions/model[MODEL_IDX]', num_thread=2, latency_window_size=1000, throughput_time=180, throughput_interval=10, is_multi_instance=True, n_instance=1, is_multi_model_per_instance=True, n_model_per_instance=1, post=True, verbose=True, cache_dns=True, model_server='torchserve')
caching dns
http://bert-base-multilingual-cased-gpu-0.mpi.svc.cluster.local:8080/predictions/model0
http://10.100.120.85:8080/predictions/model0
<Response [200]>
{'pid': 6, 'throughput': 0.0, 'p50': '0.000', 'p90': '0.000', 'p95': '0.000', 'errors': '0'}
{}
{}

{'pid': 6, 'throughput': 92.5, 'p50': '0.021', 'p90': '0.025', 'p95': '0.027', 'errors': '0'}
{'p90_0_0': '0.025'}
{'num_0_0': 925}

{'pid': 6, 'throughput': 98.3, 'p50': '0.020', 'p90': '0.022', 'p95': '0.025', 'errors': '0'}
{'p90_0_0': '0.024'}
{'num_0_0': 1908}

{'pid': 6, 'throughput': 100.1, 'p50': '0.020', 'p90': '0.021', 'p95': '0.021', 'errors': '0'}
{'p90_0_0': '0.023'}
{'num_0_0': 2909}
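
For reference, a single inference request like the ones driven by this benchmark can be issued against the TorchServe predictions endpoint directly; the JSON payload below is only an assumed placeholder, since the benchmark's actual input is not shown in these logs:

# POST one request to the endpoint from the log above (payload is a placeholder)
curl -X POST \
  http://bert-base-multilingual-cased-gpu-0.mpi.svc.cluster.local:8080/predictions/model0 \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello world"}'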

@dzilbermanvmw (Collaborator) commented:

So far we have successfully verified the PR on CPU only, with both the fastapi and torchserve settings for the new parameter in config.properties:

# model_server = fastapi|torchserve
model_server=torchserve

We are planning to release an update to the related AWS guidance shortly containing other important changes, without this PR included yet; we will then focus on merging this PR after additional testing on other architectures (AWS Graviton, GPU, etc.). Thanks

@agunapal changed the title from "Changes to support TorchServe on cpu" to "Changes to support TorchServe on cpu & gpu" on Apr 9, 2024
@dzilbermanvmw (Collaborator) commented Apr 17, 2024

Update 4/17/24: tested this PR using images built for the "torchserve" API server on AWS Graviton and Inferentia2-based nodes. In both cases there were run-time container errors like:
containers:
  main:
    Container ID:  containerd://42bfe08ada826553ffe57ba56dd93627d71ad75cbc0ee3c19d6e0ad6b953cbc7
    Image:         public.ecr.aws/a2u7h5w3/bert-base-workshop:v11-torchserve-inf2
    Image ID:      public.ecr.aws/a2u7h5w3/bert-base-workshop@sha256:5841e70fa95efe1d62f38a51854c187b4af751c59b18fda59d59a2df8a2103e3
    Port:          8080/TCP
    Host Port:     0/TCP
    State:         Waiting
      Reason:      CrashLoopBackOff
    Last State:    Terminated
      Reason:      StartError
      Message:     failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: exec: "/usr/local/bin/dockerd-entrypoint.sh": stat /usr/local/bin/dockerd-entrypoint.sh: no such file or directory: unknown
      Exit Code:   128
------
If the PR merge criterion is that the torchserve-based API server works on x86_64 (CPU), Graviton, and Inf2 architectures, then the above issue must be resolved.
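
One way to confirm whether the entrypoint script actually made it into a built image, assuming local Docker access to the same tag, is to inspect it directly:

# check that the entrypoint script exists and is executable inside the image
docker run --rm --entrypoint /bin/sh \
  public.ecr.aws/a2u7h5w3/bert-base-workshop:v11-torchserve-inf2 \
  -c 'ls -l /usr/local/bin/dockerd-entrypoint.sh'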

@dzilbermanvmw (Collaborator) commented Apr 18, 2024

Also, the /3-pack/Dockerfile.torchserve file needs the following commands added right after the LABEL line:

...
LABEL description="Model $MODEL_NAME packed in a TorchServe container to run on $PROCESSOR"
# add these two commands right after the LABEL line
RUN mkdir -p /home/model-server/model-store
RUN wget https://torchserve.pytorch.org/mar_files/bert_seqc_without_torchscript.mar -O /home/model-server/model-store/BERTSC.mar

in order for the ./pack.sh command to work.
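
With those lines in place, one way to verify that the .mar archive actually landed in the model store is to list it from the built image (the image tag here is hypothetical):

# list the model store inside the built image (tag is a placeholder)
docker run --rm --entrypoint ls \
  bert-base-workshop:local-torchserve \
  -l /home/model-server/model-store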

@dzilbermanvmw (Collaborator) left a review comment:


This Dockerfile.torchserve fails to build ML inference images on Inf2 and Graviton based EC2 instances. However, it appears to work fine without changes on x86_64 and GPU based instances, building images that run on EKS nodes with the matching processor architecture.
In order for the ./pack.sh command to work, the model-server user needs to be explicitly created:

ARG BASE_IMAGE
FROM $BASE_IMAGE

ARG MODEL_NAME
ARG MODEL_FILE_NAME
ARG PROCESSOR
LABEL description="Model $MODEL_NAME packed in a TorchServe container to run on $PROCESSOR"

#DZ: added line to create a user that is later used as an owner of /home/model-server folder
RUN useradd -m model-server
WORKDIR /home/model-server
COPY 3-pack/torchserve torchserve
WORKDIR /home/model-server/torchserve
USER root
COPY 3-pack/torchserve/dockerd-entrypoint.sh /usr/local/bin/dockerd-entrypoint.sh
RUN chmod +x /usr/local/bin/dockerd-entrypoint.sh \
    && chown -R model-server /home/model-server  
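
For context, a hypothetical direct docker build equivalent to what ./pack.sh presumably runs, showing how the ARGs above are supplied (all values below are illustrative, not taken from the repo):

# illustrative build invocation; BASE_IMAGE and the output tag are placeholders
docker build -f 3-pack/Dockerfile.torchserve \
  --build-arg BASE_IMAGE=pytorch/torchserve:latest-cpu \
  --build-arg MODEL_NAME=bert-base-multilingual-cased \
  --build-arg MODEL_FILE_NAME=BERTSC.mar \
  --build-arg PROCESSOR=cpu \
  -t bert-base-workshop:local-torchserve \
  .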

So the command that was failing before, chown -R model-server /home/model-server, now works fine; images are built and available from public ECR here: public.ecr.aws/a2u7h5w3/bert-base-workshop:v12-inf2-torchserve
However, their execution fails at runtime with a nondeterministic error:
Normal   Pulling  12m (x5 over 13m)     kubelet  Pulling image "public.ecr.aws/a2u7h5w3/bert-base-workshop:v12-inf2-torchserve"
Normal   Pulled   12m                   kubelet  Successfully pulled image "public.ecr.aws/a2u7h5w3/bert-base-workshop:v12-inf2-torchserve" in 157ms (157ms including waiting)
Warning  BackOff  3m33s (x48 over 13m)  kubelet  Back-off restarting failed container main in pod bert-base-multilingual-cased-inf2-0-6d66b9c798-2v7zz_mpi(2bebae59-24ee-4a1f-a98d-c1f0facf5604)
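
To get past the generic BackOff event, the crashed container's own output is usually more informative; standard kubectl diagnostics, using the pod name from the event above:

# inspect pod state and pull logs from the previously crashed container
kubectl describe pod bert-base-multilingual-cased-inf2-0-6d66b9c798-2v7zz -n mpi
kubectl logs bert-base-multilingual-cased-inf2-0-6d66b9c798-2v7zz -n mpi --previous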

@agunapal (Author) commented:

@dzilbermanvmw Thanks for checking. I haven't tested on inf2 and Graviton yet. Will look into these next week.

@dzilbermanvmw (Collaborator) commented:

> @dzilbermanvmw Thanks for checking. I haven't tested on inf2 and Graviton yet. Will look into these next week.

No problem @agunapal, that's what we're here for.
FYI, on CPU instances the pack.sh command also works fine without modifying 3-pack/Dockerfile.torchserve and generates a deployable image.
So CPU and GPU based instances are OK so far; inf2 and Graviton are not yet.

@sridevi1209 (Contributor) commented Apr 24, 2024

Adding further details: for CPU, the build and deployment were successful. For GPU, the build was successful; however, for the deployment to succeed, we commented out the line below in the limits section of the 4-deploy/app-bert-base-multilingual-cased-gpu-g4dn.xlarge/bert-base-multilingual-cased-gpu-0.yaml file.

resources:
  limits:
    # nvidia.com/gpu: 1
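
Note that an nvidia.com/gpu limit can only be satisfied when the NVIDIA device plugin is running and the node advertises allocatable GPUs; a quick check before resorting to commenting the limit out (the daemonset name varies by installation method):

# confirm the node advertises allocatable GPUs
kubectl describe nodes | grep -A 2 "nvidia.com/gpu"
# check for the device plugin daemonset (name varies by install)
kubectl get daemonset -A | grep -i nvidia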
