
Description
Your current environment
I consistently hit an error when serving a model from an S3 bucket using the vllm serve command with the --load-format runai_streamer option. I have access to the S3 bucket and all required files are present, but the process fails with a "File access error". Details are below.
Command Used:
vllm serve s3://hip-general/benchmark-model-loading/ --load-format runai_streamer
Error Message:
Exception: Could not send runai_request to libstreamer due to: b'File access error'
Environment Details:
vLLM version: 0.6.6
Python version: 3.12
RunAI Model Streamer version: 0.11.2
S3 Region: us-west-2
Files in S3 Bucket:
config.json
generation_config.json
model-00001-of-00004.safetensors
model-00002-of-00004.safetensors
model-00003-of-00004.safetensors
model-00004-of-00004.safetensors
model.safetensors.index.json
special_tokens_map.json
tokenizer.json
tokenizer_config.json
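To double-check that every weight shard is present, a small sketch like the following can compare a bucket listing against the model-XXXXX-of-YYYYY.safetensors naming pattern. The helper name missing_shards is hypothetical, and the listing is copied from above rather than fetched from S3.

```python
import re

# Matches sharded weight files such as model-00001-of-00004.safetensors.
SHARD_RE = re.compile(r"^model-(\d{5})-of-(\d{5})\.safetensors$")


def missing_shards(filenames):
    """Return shard names implied by the naming scheme but absent from the listing."""
    found = set()
    totals = set()
    for name in filenames:
        match = SHARD_RE.match(name)
        if match:
            found.add(int(match.group(1)))
            totals.add(int(match.group(2)))
    if not totals:
        return []  # no sharded weights in the listing, nothing to check
    total = max(totals)
    return [
        f"model-{i:05d}-of-{total:05d}.safetensors"
        for i in range(1, total + 1)
        if i not in found
    ]


# Listing copied from this report (hard-coded for illustration).
bucket_listing = [
    "config.json",
    "generation_config.json",
    "model-00001-of-00004.safetensors",
    "model-00002-of-00004.safetensors",
    "model-00003-of-00004.safetensors",
    "model-00004-of-00004.safetensors",
    "model.safetensors.index.json",
    "special_tokens_map.json",
    "tokenizer.json",
    "tokenizer_config.json",
]

print(missing_shards(bucket_listing))  # prints []
```

An empty list means the shard set is complete by name only; it says nothing about whether the objects are readable with the credentials available in the pod.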
My deployment file is:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: benchmark-model-8b
  namespace: workload
spec:
  replicas: 1
  selector:
    matchLabels:
      app: benchmark-model-8b
  strategy:
    type: Recreate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: benchmark-model-8b
    spec:
      containers:
      - command:
        - sh
        - -c
        - exec tail -f /dev/null
        env:
        - name: HF_HOME
          value: /huggingface
        - name: HUGGINGFACE_HUB_CACHE
          value: /huggingface/hub
        - name: HF_HUB_ENABLE_HF_TRANSFER
          value: "False"
        - name: HUGGING_FACE_HUB_TOKEN
          value: ""
        image: vllm/vllm-openai:v0.6.6
        imagePullPolicy: IfNotPresent
        name: benchmark-model-8b
        ports:
        - containerPort: 8888
          name: http
          protocol: TCP
        resources:
          limits:
            nvidia.com/gpu: "1"
          requests:
            cpu: "5"
            memory: 128Gi
        securityContext:
          capabilities:
            add:
            - SYS_ADMIN
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /huggingface
          name: hf-volume
        - mountPath: /dev/shm
          name: dshm
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
      volumes:
      - name: hf-volume
        persistentVolumeClaim:
          claimName: benchmark-model-pvc
      - emptyDir:
          medium: Memory
          sizeLimit: 90Gi
        name: dshm
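The manifest above sets no AWS-related environment variables on the container. Assuming the streamer resolves credentials and region through the standard AWS SDK environment variables (an assumption, not something confirmed here; credentials may also come from an instance profile or a service account role), a sketch like this, run inside the pod, reports which of them are visible without printing their values. The helper name aws_env_report is hypothetical.

```python
import os

# Standard AWS SDK environment variables.
# Whether the streamer reads each of them is an assumption.
AWS_VARS = (
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_SESSION_TOKEN",
    "AWS_REGION",
    "AWS_DEFAULT_REGION",
    "AWS_ENDPOINT_URL",
    "AWS_ROLE_ARN",
    "AWS_WEB_IDENTITY_TOKEN_FILE",
)


def aws_env_report(environ):
    """Map each variable name to True if it is set to a non-empty value."""
    return {name: bool(environ.get(name)) for name in AWS_VARS}


# Report only presence, never the values, so secrets do not end up in logs.
for name, is_set in aws_env_report(os.environ).items():
    print(f"{name}: {'set' if is_set else 'not set'}")
```

If none of these are set and no role is attached to the pod, a missing credential or region setting would be one possible explanation for the error, though this check alone cannot confirm it.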