Hello!
I use the free version of ClearML (the one without the configuration vault) together with the clearml-serving module.

When I spun up docker-compose and tried to pull a model from our S3 bucket, I got an error in the tritonserver container:
```
2024-03-13 11:26:56,913 - clearml.storage - WARNING - Failed getting object size: ClientError('An error occurred (403) when calling the HeadObject operation: Forbidden')
2024-03-13 11:26:57,042 - clearml.storage - ERROR - Could not download s3://<BUCKET>/<FOLDER>/<PROJECT>/<TASK_NAME>.75654091e56141199c9d9594305d6872/models/model_package.zip , err: An error occurred (403) when calling the HeadObject operation: Forbidden
```
But I had set the environment variables in example.env (the AWS_ ones too), and I could see them inside the tritonserver container via:

```
$ env | grep CLEARML
$ env | grep AWS
```
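As an extra check, the failing HeadObject call can be reproduced directly with boto3 from inside the container. This is a hypothetical sketch; the bucket and key are placeholders for the real path from the error above:

```python
# Run inside the tritonserver container. boto3 picks up the AWS_* environment
# variables on its own, so if this succeeds the credentials themselves are fine
# and the 403 is specific to how clearml.storage builds its S3 client.
import boto3

s3 = boto3.client("s3")
s3.head_object(
    Bucket="<BUCKET>",
    Key="<FOLDER>/<PROJECT>/<TASK_NAME>.75654091e56141199c9d9594305d6872/models/model_package.zip",
)
```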
version: "3" services: zookeeper: image: bitnami/zookeeper:3.7.0 container_name: clearml-serving-zookeeper # ports: # - "2181:2181" environment: - ALLOW_ANONYMOUS_LOGIN=yes networks: - clearml-serving-backend kafka: image: bitnami/kafka:3.1.1 container_name: clearml-serving-kafka # ports: # - "9092:9092" environment: - KAFKA_BROKER_ID=1 - KAFKA_CFG_LISTENERS=PLAINTEXT://clearml-serving-kafka:9092 - KAFKA_CFG_ADVERTISED_LISTENERS=PLAINTEXT://clearml-serving-kafka:9092 - KAFKA_CFG_ZOOKEEPER_CONNECT=clearml-serving-zookeeper:2181 - ALLOW_PLAINTEXT_LISTENER=yes - KAFKA_CREATE_TOPICS="topic_test:1:1" depends_on: - zookeeper networks: - clearml-serving-backend prometheus: image: prom/prometheus:v2.34.0 container_name: clearml-serving-prometheus volumes: - ./prometheus.yml:/prometheus.yml command: - '--config.file=/prometheus.yml' - '--storage.tsdb.path=/prometheus' - '--web.console.libraries=/etc/prometheus/console_libraries' - '--web.console.templates=/etc/prometheus/consoles' - '--storage.tsdb.retention.time=200h' - '--web.enable-lifecycle' restart: unless-stopped # ports: # - "9090:9090" depends_on: - clearml-serving-statistics networks: - clearml-serving-backend alertmanager: image: prom/alertmanager:v0.23.0 container_name: clearml-serving-alertmanager restart: unless-stopped # ports: # - "9093:9093" depends_on: - prometheus - grafana networks: - clearml-serving-backend grafana: image: grafana/grafana:8.4.4-ubuntu container_name: clearml-serving-grafana volumes: - './datasource.yml:/etc/grafana/provisioning/datasources/datasource.yaml' restart: unless-stopped ports: - "3001:3000" depends_on: - prometheus networks: - clearml-serving-backend clearml-serving-inference: image: allegroai/clearml-serving-inference:1.3.1-vllm build: context: ../ dockerfile: clearml_serving/serving/Dockerfile container_name: clearml-serving-inference restart: unless-stopped # optimize perforamnce security_opt: - seccomp:unconfined ports: - "8080:8080" environment: CLEARML_WEB_HOST: ${CLEARML_WEB_HOST:-https://app.clear.ml} CLEARML_API_HOST: ${CLEARML_API_HOST:-https://api.clear.ml} CLEARML_FILES_HOST: ${CLEARML_FILES_HOST:-https://files.clear.ml} CLEARML_API_ACCESS_KEY: ${CLEARML_API_ACCESS_KEY} CLEARML_API_SECRET_KEY: ${CLEARML_API_SECRET_KEY} CLEARML_SERVING_TASK_ID: ${CLEARML_SERVING_TASK_ID:-} CLEARML_SERVING_PORT: ${CLEARML_SERVING_PORT:-8080} CLEARML_SERVING_POLL_FREQ: ${CLEARML_SERVING_POLL_FREQ:-1.0} CLEARML_DEFAULT_BASE_SERVE_URL: ${CLEARML_DEFAULT_BASE_SERVE_URL:-http://127.0.0.1:8080/serve} CLEARML_DEFAULT_KAFKA_SERVE_URL: ${CLEARML_DEFAULT_KAFKA_SERVE_URL:-clearml-serving-kafka:9092} CLEARML_DEFAULT_TRITON_GRPC_ADDR: ${CLEARML_DEFAULT_TRITON_GRPC_ADDR:-clearml-serving-triton:8001} CLEARML_USE_GUNICORN: ${CLEARML_USE_GUNICORN:-} CLEARML_SERVING_NUM_PROCESS: ${CLEARML_SERVING_NUM_PROCESS:-} CLEARML_EXTRA_PYTHON_PACKAGES: ${CLEARML_EXTRA_PYTHON_PACKAGES:-} AWS_ACCESS_KEY_ID: ${AWS_ACCESS_KEY_ID:-} AWS_SECRET_ACCESS_KEY: ${AWS_SECRET_ACCESS_KEY:-} AWS_DEFAULT_REGION: ${AWS_DEFAULT_REGION:-} GOOGLE_APPLICATION_CREDENTIALS: ${GOOGLE_APPLICATION_CREDENTIALS:-} AZURE_STORAGE_ACCOUNT: ${AZURE_STORAGE_ACCOUNT:-} AZURE_STORAGE_KEY: ${AZURE_STORAGE_KEY:-} depends_on: - kafka - clearml-serving-triton networks: - clearml-serving-backend clearml-serving-triton: image: allegroai/clearml-serving-triton:1.3.1-vllm build: context: ../ dockerfile: clearml_serving/engines/triton/Dockerfile.vllm container_name: clearml-serving-triton restart: unless-stopped # optimize perforamnce security_opt: - seccomp:unconfined # ports: # 
- "8001:8001" environment: CLEARML_WEB_HOST: ${CLEARML_WEB_HOST:-https://app.clear.ml} CLEARML_API_HOST: ${CLEARML_API_HOST:-https://api.clear.ml} CLEARML_FILES_HOST: ${CLEARML_FILES_HOST:-https://files.clear.ml} CLEARML_API_ACCESS_KEY: ${CLEARML_API_ACCESS_KEY} CLEARML_API_SECRET_KEY: ${CLEARML_API_SECRET_KEY} CLEARML_SERVING_TASK_ID: ${CLEARML_SERVING_TASK_ID:-} CLEARML_TRITON_POLL_FREQ: ${CLEARML_TRITON_POLL_FREQ:-1.0} CLEARML_TRITON_METRIC_FREQ: ${CLEARML_TRITON_METRIC_FREQ:-1.0} CLEARML_EXTRA_PYTHON_PACKAGES: ${CLEARML_EXTRA_PYTHON_PACKAGES:-} AWS_ACCESS_KEY_ID: ${AWS_ACCESS_KEY_ID:-} AWS_SECRET_ACCESS_KEY: ${AWS_SECRET_ACCESS_KEY:-} AWS_DEFAULT_REGION: ${AWS_DEFAULT_REGION:-} GOOGLE_APPLICATION_CREDENTIALS: ${GOOGLE_APPLICATION_CREDENTIALS:-} AZURE_STORAGE_ACCOUNT: ${AZURE_STORAGE_ACCOUNT:-} AZURE_STORAGE_KEY: ${AZURE_STORAGE_KEY:-} depends_on: - kafka networks: - clearml-serving-backend deploy: resources: reservations: devices: - driver: nvidia device_ids: ['1'] capabilities: [gpu] clearml-serving-statistics: image: allegroai/clearml-serving-statistics:latest container_name: clearml-serving-statistics restart: unless-stopped # optimize perforamnce security_opt: - seccomp:unconfined # ports: # - "9999:9999" environment: CLEARML_WEB_HOST: ${CLEARML_WEB_HOST:-https://app.clear.ml} CLEARML_API_HOST: ${CLEARML_API_HOST:-https://api.clear.ml} CLEARML_FILES_HOST: ${CLEARML_FILES_HOST:-https://files.clear.ml} CLEARML_API_ACCESS_KEY: ${CLEARML_API_ACCESS_KEY} CLEARML_API_SECRET_KEY: ${CLEARML_API_SECRET_KEY} CLEARML_SERVING_TASK_ID: ${CLEARML_SERVING_TASK_ID:-} CLEARML_DEFAULT_KAFKA_SERVE_URL: ${CLEARML_DEFAULT_KAFKA_SERVE_URL:-clearml-serving-kafka:9092} CLEARML_SERVING_POLL_FREQ: ${CLEARML_SERVING_POLL_FREQ:-1.0} depends_on: - kafka networks: - clearml-serving-backend networks: clearml-serving-backend: driver: bridge
example.env:

```
CLEARML_WEB_HOST="[REDACTED]"
CLEARML_API_HOST="[REDACTED]"
CLEARML_FILES_HOST="s3://[REDACTED]"
CLEARML_API_ACCESS_KEY="<access_key_here>"
CLEARML_API_SECRET_KEY="<secret_key_here>"
CLEARML_SERVING_TASK_ID="<serving_service_id_here>"
CLEARML_EXTRA_PYTHON_PACKAGES="boto3"
AWS_ACCESS_KEY_ID="[REDACTED]"
AWS_SECRET_ACCESS_KEY="[REDACTED]"
AWS_DEFAULT_REGION="[REDACTED]"
```
Dockerfile.vllm:

```dockerfile
FROM nvcr.io/nvidia/tritonserver:24.02-vllm-python-py3

ENV LC_ALL=C.UTF-8

COPY clearml_serving /root/clearml/clearml_serving
COPY requirements.txt /root/clearml/requirements.txt
COPY README.md /root/clearml/README.md
COPY setup.py /root/clearml/setup.py

RUN python3 -m pip install --no-cache-dir -r /root/clearml/clearml_serving/engines/triton/requirements.txt
RUN python3 -m pip install --no-cache-dir -U pip -e /root/clearml/

# default serving port
EXPOSE 8001

# environment variables to load the Task from: CLEARML_SERVING_TASK_ID, CLEARML_SERVING_PORT
WORKDIR /root/clearml/
ENTRYPOINT ["clearml_serving/engines/triton/entrypoint.sh"]
```
I think this is because of https://github.com/allegroai/clearml-serving/blob/main/clearml_serving/engines/triton/triton_helper.py#L140: it can't download the model from S3 because clearml.storage.helper.StorageHelper can't configure _Boto3Driver from environment variables alone.
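For reference, this is roughly the kind of aws.s3 section I mean in clearml.conf. A minimal sketch with placeholder values, following the key names from the ClearML documentation:

```
sdk {
    aws {
        s3 {
            # credentials used by StorageHelper / _Boto3Driver
            key: "<AWS_ACCESS_KEY_ID>"
            secret: "<AWS_SECRET_ACCESS_KEY>"
            region: "<AWS_DEFAULT_REGION>"
        }
    }
}
```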
I added a clearml.conf file with the aws.s3 credentials to the root of the git repository and fixed my Dockerfile.vllm:
```dockerfile
FROM nvcr.io/nvidia/tritonserver:24.02-vllm-python-py3

ENV LC_ALL=C.UTF-8

COPY clearml_serving /root/clearml/clearml_serving
COPY requirements.txt /root/clearml/requirements.txt
COPY clearml.conf /root/clearml.conf
COPY README.md /root/clearml/README.md
COPY setup.py /root/clearml/setup.py

RUN python3 -m pip install --no-cache-dir -r /root/clearml/clearml_serving/engines/triton/requirements.txt
RUN python3 -m pip install --no-cache-dir -U pip -e /root/clearml/

# default serving port
EXPOSE 8001

# environment variables to load the Task from: CLEARML_SERVING_TASK_ID, CLEARML_SERVING_PORT
WORKDIR /root/clearml/
ENTRYPOINT ["clearml_serving/engines/triton/entrypoint.sh"]
```
and then I fixed entrypoint.sh:
```bash
#!/bin/bash

# print configuration
echo CLEARML_SERVING_TASK_ID="$CLEARML_SERVING_TASK_ID"
echo CLEARML_TRITON_POLL_FREQ="$CLEARML_TRITON_POLL_FREQ"
echo CLEARML_TRITON_METRIC_FREQ="$CLEARML_TRITON_METRIC_FREQ"
echo CLEARML_TRITON_HELPER_ARGS="$CLEARML_TRITON_HELPER_ARGS"
echo CLEARML_EXTRA_PYTHON_PACKAGES="$CLEARML_EXTRA_PYTHON_PACKAGES"

# we should also have clearml-server configurations
if [ ! -z "$CLEARML_EXTRA_PYTHON_PACKAGES" ]
then
    python3 -m pip install $CLEARML_EXTRA_PYTHON_PACKAGES
fi

# start service
clearml-init --file /root/clearml.conf && PYTHONPATH=$(pwd) python3 clearml_serving/engines/triton/triton_helper.py $CLEARML_TRITON_HELPER_ARGS $@
```
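With these changes in place, a quick way to verify from inside the container that ClearML itself can now reach S3 is something like this (hypothetical snippet, placeholder URL):

```python
# If the clearml.conf credentials are picked up, this returns a local file path;
# on failure it logs the same clearml.storage error and returns None.
from clearml import StorageManager

local_path = StorageManager.get_local_copy(
    remote_url="s3://<BUCKET>/<FOLDER>/<PROJECT>/<TASK_NAME>.75654091e56141199c9d9594305d6872/models/model_package.zip"
)
print(local_path)
```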
Actually, I don't know why I faced this issue; I think I did something wrong. In the enterprise version we didn't face it, because of the configuration vault.