Give TensorRT-LLM a proper CI/CD 😍 #2886

Merged on Jan 21, 2025 (134 commits)

Commits
f729f2c
test(ctest) enable address sanitizer
mfuntowicz Nov 18, 2024
0baa017
feat(trtllm): expose finish reason to Rust
mfuntowicz Dec 10, 2024
cb8fdde
feat(trtllm): fix logits retrieval
mfuntowicz Dec 10, 2024
0ab1dd8
misc(ci): enabe building tensorrt-llm
mfuntowicz Dec 12, 2024
119a40c
misc(ci): update Rust action toolchain
mfuntowicz Dec 12, 2024
7db90f1
misc(ci): let's try to build the Dockerfile for trtllm
mfuntowicz Dec 12, 2024
3f8dc96
misc(ci): provide mecanism to cache inside container
mfuntowicz Dec 12, 2024
0aa49a1
misc(ci): export aws creds as output of step
mfuntowicz Dec 12, 2024
ea7cf3a
misc(ci): let's try this way
mfuntowicz Dec 12, 2024
bdab3bb
misc(ci): again
mfuntowicz Dec 12, 2024
dc34f5a
misc(ci): again
mfuntowicz Dec 12, 2024
f939500
misc(ci): add debug profile
mfuntowicz Dec 12, 2024
2737416
misc(ci): add debug profile
mfuntowicz Dec 12, 2024
b43fe7e
misc(ci): lets actually use sccache ...
mfuntowicz Dec 12, 2024
55c92d0
misc(ci): do not build with ssl enabled
mfuntowicz Dec 12, 2024
88884f9
misc(ci): WAT
mfuntowicz Dec 12, 2024
5fbab27
misc(ci): WAT
mfuntowicz Dec 12, 2024
425f0bf
misc(ci): WAT
mfuntowicz Dec 12, 2024
ba738e2
misc(ci): WAT
mfuntowicz Dec 13, 2024
253116e
misc(ci): WAT
mfuntowicz Dec 13, 2024
5d5524d
misc(backend): test with TGI S3 conf
mfuntowicz Dec 16, 2024
f1986c0
misc(backend): test with TGI S3 conf
mfuntowicz Dec 16, 2024
783a057
misc(backend): once more?
mfuntowicz Dec 16, 2024
71311be
misc(backend): let's try with GHA
mfuntowicz Dec 17, 2024
fd039b6
misc(backend): missing env directive
mfuntowicz Dec 17, 2024
7f9b223
misc(backend): make sure to correctly set IS_GHA_BUILD=true in wf
mfuntowicz Dec 17, 2024
b8d755e
misc(backend): ok let's debug smtg
mfuntowicz Dec 17, 2024
d0108b4
misc(backend): WWWWWWWWWWWWWAAAAAAAA
mfuntowicz Dec 17, 2024
6d4ac29
misc(backend): kthxbye retry s3
mfuntowicz Dec 17, 2024
7337d83
misc(backend): use session token
mfuntowicz Dec 17, 2024
4394a23
misc(backend): add more info
mfuntowicz Dec 17, 2024
b5c62c4
misc(backend): lets try 1h30
mfuntowicz Dec 17, 2024
76239f2
misc(backend): lets try 1h30
mfuntowicz Dec 17, 2024
84ea221
misc(backend): increase to 2h
mfuntowicz Dec 18, 2024
656dc23
misc(backend): lets try...
mfuntowicz Dec 18, 2024
da4bd56
misc(backend): lets try...
mfuntowicz Dec 18, 2024
7a1785f
misc(backend): let's build for ci-runtime
mfuntowicz Dec 19, 2024
aa6a143
misc(backend): let's add some more tooling
mfuntowicz Dec 19, 2024
06fb820
misc(backend): add some tags
mfuntowicz Dec 19, 2024
4aae931
misc(backend): disable Werror for now
mfuntowicz Dec 19, 2024
724e0c1
misc(backend): added automatic gha detection
mfuntowicz Dec 19, 2024
7f6b1f1
misc(backend): remove leak sanitizer which is included in asan
mfuntowicz Dec 20, 2024
d1a9318
misc(backend): forward env
mfuntowicz Dec 20, 2024
ebb3e51
misc(backend): forward env
mfuntowicz Dec 20, 2024
8609f0d
misc(backend): let's try
mfuntowicz Dec 20, 2024
ffbab2c
misc(backend): let's try
mfuntowicz Dec 20, 2024
f226b53
misc(backend): again
mfuntowicz Dec 20, 2024
8bcfe5a
misc(backend): again
mfuntowicz Dec 20, 2024
90bc544
misc(backend): again
mfuntowicz Dec 20, 2024
0383617
misc(backend): again
mfuntowicz Dec 20, 2024
30ed776
misc(backend): again
mfuntowicz Dec 20, 2024
cf7069d
misc(backend): fix sscache -> sccache
mfuntowicz Dec 20, 2024
bb77ae9
misc(backend): fix sscache -> sccache
mfuntowicz Dec 20, 2024
3cd97a3
misc(backend): fix sscache -> sccache
mfuntowicz Dec 20, 2024
2556626
misc(backend): let's actually cache things now
mfuntowicz Dec 20, 2024
62d5ade
misc(backend): let's actually cache things now
mfuntowicz Dec 20, 2024
aa3b2d5
misc(backend): attempt to run the testS?
mfuntowicz Dec 21, 2024
e128922
misc(backend): attempt to run the tests?
mfuntowicz Dec 21, 2024
a0a9534
misc(backend): attempt to run the tests?
mfuntowicz Dec 21, 2024
2d7cd0e
change runner size
glegendre01 Dec 27, 2024
356eff9
fix: Correctly tag docker images (#2878)
Hugoch Jan 6, 2025
7f10191
misc(llamacpp): maybe?
mfuntowicz Jan 6, 2025
d497533
misc(llamacpp): maybe?
mfuntowicz Jan 6, 2025
0c25993
misc(llamacpp): maybe?
mfuntowicz Jan 6, 2025
0dcbe96
misc(ci): gogogo
mfuntowicz Jan 6, 2025
5429a11
misc(ci): gogogo
mfuntowicz Jan 6, 2025
6d1bd37
misc(ci): gogogo
mfuntowicz Jan 6, 2025
77ffe9d
misc(ci): gogogo
mfuntowicz Jan 6, 2025
4c2c3aa
misc(ci): gogogo
mfuntowicz Jan 6, 2025
77cbd65
misc(ci): gogogo
mfuntowicz Jan 6, 2025
c218e3d
misc(ci): go
mfuntowicz Jan 6, 2025
918c3ad
misc(ci): go
mfuntowicz Jan 6, 2025
92cfe43
misc(ci): go
mfuntowicz Jan 6, 2025
dc0fd7b
misc(ci): use bin folder
mfuntowicz Jan 6, 2025
228b3d6
misc(ci): make the wf callable for reuse
mfuntowicz Jan 7, 2025
b7e848e
misc(ci): make the wf callable for reuse (bis)
mfuntowicz Jan 7, 2025
d5224b3
misc(ci): make the wf callable for reuse (bis)
mfuntowicz Jan 7, 2025
4266d9e
misc(ci): give the wf a name
mfuntowicz Jan 7, 2025
3ef5e02
Create test-trtllm.yml
paulinebm Jan 7, 2025
994f0ab
Update test-trtllm.yml
paulinebm Jan 7, 2025
5d3d8c0
Create build-trtllm2
paulinebm Jan 7, 2025
29ac684
Rename build-trtllm2 to 1-build-trtllm2
paulinebm Jan 7, 2025
38f226f
Rename test-trtllm.yml to 1-test-trtllm2.yml
paulinebm Jan 7, 2025
e37b7f7
misc(ci): fw secrets
mfuntowicz Jan 7, 2025
42f0296
Update 1-test-trtllm2.yml
paulinebm Jan 7, 2025
327cb48
Rename 1-build-trtllm2 to 1-build-trtllm2.yml
paulinebm Jan 7, 2025
0d83c00
Update 1-test-trtllm2.yml
paulinebm Jan 7, 2025
9b87d1a
misc(ci): use ci-build.yaml as main dispatcher
mfuntowicz Jan 7, 2025
1736086
Delete .github/workflows/1-test-trtllm2.yml
paulinebm Jan 7, 2025
d47ce65
Delete .github/workflows/1-build-trtllm2.yml
paulinebm Jan 7, 2025
5e8fdd3
misc(ci): rights?
mfuntowicz Jan 7, 2025
b3277a3
misc(ci): rights?
mfuntowicz Jan 7, 2025
77e42c2
misc(ci): once more?
mfuntowicz Jan 7, 2025
f6d5f71
misc(ci): once more?
mfuntowicz Jan 7, 2025
215fad0
misc(ci): baby more time?
mfuntowicz Jan 7, 2025
5a73fe5
misc(ci): baby more time?
mfuntowicz Jan 7, 2025
7da7b38
misc(ci): try the permission above again?
mfuntowicz Jan 8, 2025
e80ce22
misc(ci): try the permission above again?
mfuntowicz Jan 8, 2025
b3fae2f
misc(ci): try the permission scoped again?
mfuntowicz Jan 8, 2025
b35a14b
misc(ci): install tensorrt_llm_executor_static
mfuntowicz Jan 8, 2025
c5aa514
misc(ci): attempt to rebuild with sccache?
mfuntowicz Jan 8, 2025
df9df1d
misc(ci):run the tests on GPU instance
mfuntowicz Jan 8, 2025
a7a7c67
misc(ci): let's actually setup sccache in the build.rs
mfuntowicz Jan 8, 2025
4d875c4
misc(ci): reintroduce variables
mfuntowicz Jan 9, 2025
7bde6d3
misc(ci): enforce sccache
mfuntowicz Jan 9, 2025
d110ab2
misc(ci): correct right job name dependency
mfuntowicz Jan 9, 2025
75b9d82
misc(ci): detect dev profile for debug
mfuntowicz Jan 9, 2025
7a893af
misc(ci): detect gha build
mfuntowicz Jan 9, 2025
1e08e9c
misc(ci): detect gha build
mfuntowicz Jan 9, 2025
a791291
misc(ci): ok debug
mfuntowicz Jan 9, 2025
af6428c
misc(ci): wtf
mfuntowicz Jan 9, 2025
c3f3035
misc(ci): wtf2
mfuntowicz Jan 9, 2025
8f0da40
misc(ci): wtf3
mfuntowicz Jan 10, 2025
3c7710c
misc(ci): use commit HEAD instead of merge commit for image id
mfuntowicz Jan 10, 2025
79469be
misc(ci): wtfinfini
mfuntowicz Jan 10, 2025
0cf0732
misc(ci): wtfinfini
mfuntowicz Jan 10, 2025
0159843
misc(ci): KAMEHAMEHA
mfuntowicz Jan 10, 2025
d969dad
Merge TRTLLM in standard CI
Hugoch Jan 15, 2025
ffb60ff
misc(ci): remove input machine
mfuntowicz Jan 20, 2025
b267df5
misc(ci): missing id-token for AWS auth
mfuntowicz Jan 20, 2025
e083a92
misc(ci): missing id-token for AWS auth
mfuntowicz Jan 20, 2025
87039cf
misc(ci): missing id-token for AWS auth
mfuntowicz Jan 20, 2025
549d7e3
misc(ci): again...
mfuntowicz Jan 20, 2025
c33eeb2
misc(ci): again...
mfuntowicz Jan 20, 2025
0a33615
misc(ci): again...
mfuntowicz Jan 20, 2025
08af269
misc(ci): again...
mfuntowicz Jan 20, 2025
0ed76ba
misc(ci): missing benchmark
mfuntowicz Jan 20, 2025
8c6d972
misc(ci): missing backends
mfuntowicz Jan 20, 2025
debaea4
misc(ci): missing launcher
mfuntowicz Jan 20, 2025
7c9ee56
misc(ci): give everything aws needs
mfuntowicz Jan 20, 2025
d0b8e2e
misc(ci): give everything aws needs
mfuntowicz Jan 20, 2025
edfafeb
misc(ci): fix warnings
mfuntowicz Jan 20, 2025
a4d069f
misc(ci): attempt to fix sccache not building trtllm
mfuntowicz Jan 20, 2025
a0e75b1
misc(ci): attempt to fix sccache not building trtllm again
mfuntowicz Jan 20, 2025
54 changes: 53 additions & 1 deletion .github/workflows/build.yaml
@@ -31,15 +31,28 @@ jobs:
group: ${{ github.workflow }}-build-and-push-image-${{ inputs.hardware }}-${{ github.head_ref || github.run_id }}
cancel-in-progress: true
runs-on:
group: aws-highmemory-32-plus-priv
group: aws-highmemory-64-plus-priv
permissions:
contents: write
packages: write
id-token: write
steps:
- name: Checkout repository
uses: actions/checkout@v4
- name: Inject slug/short variables
uses: rlespinasse/github-slug-action@v4.4.1
- name: Extract TensorRT-LLM version
run: |
echo "TENSORRT_LLM_VERSION=$(grep -oP '([a-z,0-9]{40})' $GITHUB_WORKSPACE/backends/trtllm/cmake/trtllm.cmake)" >> $GITHUB_ENV
echo "TensorRT-LLM version: ${{ env.TENSORRT_LLM_VERSION }}"
- name: "Configure AWS Credentials"
id: aws-creds
uses: aws-actions/configure-aws-credentials@v4
with:
aws-region: us-east-1
role-to-assume: ${{ secrets.AWS_ROLE_GITHUB_TGI_TEST }}
role-duration-seconds: 7200
output-credentials: true
- name: Construct hardware variables
shell: bash
run: |
@@ -52,6 +65,7 @@ jobs:
export runs_on="aws-g6-12xl-plus-priv-cache"
export platform=""
export extra_pytest=""
export target="nil"
;;
cuda-trtllm)
export dockerfile="Dockerfile_trtllm"
@@ -61,6 +75,10 @@ jobs:
export runs_on="ubuntu-latest"
export platform=""
export extra_pytest=""
export target="ci-runtime"
export sccache_s3_key_prefix="trtllm"
export sccache_region="us-east-1"
export build_type="dev"
;;
rocm)
export dockerfile="Dockerfile_amd"
@@ -71,6 +89,7 @@ jobs:
export runs_on="ubuntu-latest"
export platform=""
export extra_pytest="-k test_flash_gemma_gptq_load"
export target="nil"
;;
intel-xpu)
export dockerfile="Dockerfile_intel"
@@ -80,6 +99,7 @@ jobs:
export runs_on="ubuntu-latest"
export platform="xpu"
export extra_pytest=""
export target="nil"
;;
intel-cpu)
export dockerfile="Dockerfile_intel"
@@ -90,6 +110,7 @@ jobs:
export runs_on="aws-highmemory-32-plus-priv"
export platform="cpu"
export extra_pytest="-k test_flash_gemma_simple"
export target="nil"
;;
esac
echo $dockerfile
@@ -106,6 +127,10 @@ jobs:
echo "RUNS_ON=${runs_on}" >> $GITHUB_ENV
echo "EXTRA_PYTEST=${extra_pytest}" >> $GITHUB_ENV
echo REGISTRY_MIRROR=$REGISTRY_MIRROR >> $GITHUB_ENV
echo "TARGET=${target}" >> $GITHUB_ENV
echo "SCCACHE_S3_KEY_PREFIX=${sccache_s3_key_prefix}" >> $GITHUB_ENV
echo "SCCACHE_REGION=${sccache_region}" >> $GITHUB_ENV
echo "BUILD_TYPE=${build_type}" >> $GITHUB_ENV
- name: Initialize Docker Buildx
uses: docker/setup-buildx-action@v3
with:
@@ -170,6 +195,14 @@ jobs:
GIT_SHA=${{ env.GITHUB_SHA }}
DOCKER_LABEL=sha-${{ env.GITHUB_SHA_SHORT }}${{ env.LABEL }}
PLATFORM=${{ env.PLATFORM }}
build_type=${{ env.BUILD_TYPE }}
is_gha_build=true
aws_access_key_id=${{ steps.aws-creds.outputs.aws-access-key-id }}
aws_secret_access_key=${{ steps.aws-creds.outputs.aws-secret-access-key }}
aws_session_token=${{ steps.aws-creds.outputs.aws-session-token }}
sccache_bucket=${{ secrets.AWS_S3_BUCKET_GITHUB_TGI_TEST }}
sccache_s3_key_prefix=${{ env.SCCACHE_S3_KEY_PREFIX }}
sccache_region=${{ env.SCCACHE_REGION }}
tags: ${{ steps.meta.outputs.tags || steps.meta-pr.outputs.tags }}
labels: ${{ steps.meta.outputs.labels || steps.meta-pr.outputs.labels }}
cache-from: type=s3,region=us-east-1,bucket=ci-docker-buildx-cache,name=text-generation-inference-cache${{ env.LABEL }},mode=min,access_key_id=${{ secrets.S3_CI_DOCKER_BUILDX_CACHE_ACCESS_KEY_ID }},secret_access_key=${{ secrets.S3_CI_DOCKER_BUILDX_CACHE_SECRET_ACCESS_KEY }},mode=min
@@ -215,3 +248,22 @@ jobs:
echo $DOCKER_IMAGE
docker pull $DOCKER_IMAGE
pytest -s -vv integration-tests ${PYTEST_FLAGS} ${EXTRA_PYTEST}

backend_trtllm_cxx_tests:
needs: build-and-push
if: needs.build-and-push.outputs.label == '-trtllm'
concurrency:
group: ${{ github.workflow }}-${{ github.job }}-trtllm-${{ github.head_ref || github.run_id }}
cancel-in-progress: true
runs-on:
group: aws-g6-12xl-plus-priv-cache
container:
image: ${{ needs.build-and-push.outputs.docker_image }}
credentials:
username: ${{ secrets.REGISTRY_USERNAME }}
password: ${{ secrets.REGISTRY_PASSWORD }}
options: --gpus all --shm-size=8g

steps:
- name: Run C++/CUDA tests
run: /usr/local/tgi/bin/tgi_trtllm_backend_tests
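
A note on the "Extract TensorRT-LLM version" step in `.github/workflows/build.yaml`: a value appended to `$GITHUB_ENV` only becomes visible to later steps, so `${{ env.TENSORRT_LLM_VERSION }}` in the same `run:` block most likely prints an empty string. The pattern `[a-z,0-9]{40}` also accepts commas and non-hex letters. The sketch below is one possible way to tighten both points; it is not the code merged in this PR. The temporary file and its `GIT_TAG` content are hypothetical stand-ins for `backends/trtllm/cmake/trtllm.cmake`, and the hash is a made-up value.

```shell
#!/usr/bin/env bash
set -eu

# Print the first 40-character lowercase hex string found in a file.
# Stricter than '[a-z,0-9]{40}', which also matches commas and letters g-z.
extract_sha() {
  grep -oE '[0-9a-f]{40}' "$1" | head -n 1
}

# Hypothetical stand-in for backends/trtllm/cmake/trtllm.cmake.
cmake_file="$(mktemp)"
printf 'GIT_TAG %s\n' "0123456789abcdef0123456789abcdef01234567" > "$cmake_file"

version="$(extract_sha "$cmake_file")"

# Persist the value for later steps. GITHUB_ENV only exists on a runner,
# so fall back to /dev/null when running this sketch locally.
echo "TENSORRT_LLM_VERSION=${version}" >> "${GITHUB_ENV:-/dev/null}"

# Within the same step, read the shell variable rather than the env context.
echo "TensorRT-LLM version: ${version}"

rm -f "$cmake_file"
```

Using `grep -oE` instead of `grep -oP` avoids depending on PCRE support, which is not available in every `grep` build.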
1 change: 1 addition & 0 deletions .github/workflows/ci_build.yaml
@@ -42,6 +42,7 @@ jobs:
permissions:
contents: write
packages: write
id-token: write
with:
hardware: ${{ matrix.hardware }}
# https://github.com/actions/runner/issues/2206
28 changes: 14 additions & 14 deletions Cargo.toml
@@ -1,21 +1,21 @@
[workspace]
members = [
"benchmark",
"backends/v2",
"backends/v3",
"backends/grpc-metadata",
"backends/trtllm",
"launcher",
"router"
"benchmark",
"backends/v2",
"backends/v3",
"backends/grpc-metadata",
"backends/trtllm",
"launcher",
"router"
]
default-members = [
"benchmark",
"backends/v2",
"backends/v3",
"backends/grpc-metadata",
# "backends/trtllm",
"launcher",
"router"
"benchmark",
"backends/v2",
"backends/v3",
"backends/grpc-metadata",
# "backends/trtllm",
"launcher",
"router"
]
resolver = "2"

99 changes: 67 additions & 32 deletions Dockerfile_trtllm
@@ -1,19 +1,7 @@
ARG CUDA_ARCH_LIST="75-real;80-real;86-real;89-real;90-real"
ARG OMPI_VERSION="4.1.7rc1"

# Build dependencies resolver stage
FROM lukemathwalker/cargo-chef:latest-rust-1.84.0 AS chef
WORKDIR /usr/src/text-generation-inference/backends/trtllm

FROM chef AS planner
COPY Cargo.lock Cargo.lock
COPY Cargo.toml Cargo.toml
COPY rust-toolchain.toml rust-toolchain.toml
COPY router router
COPY benchmark/ benchmark/
COPY backends/ backends/
COPY launcher/ launcher/
RUN cargo chef prepare --recipe-path recipe.json
ARG cuda_arch_list="75-real;80-real;86-real;89-real;90-real"
ARG ompi_version="4.1.7rc1"
ARG build_type=release
ARG is_gha_build=false

# CUDA dependent dependencies resolver stage
FROM nvidia/cuda:12.6.3-cudnn-devel-ubuntu24.04 AS cuda-builder
@@ -26,8 +14,11 @@ RUN apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y \
g++-14 \
git \
git-lfs \
lld \
libssl-dev \
libucx-dev \
libasan8 \
libubsan1 \
ninja-build \
pkg-config \
pipx \
@@ -43,9 +34,9 @@ ENV TENSORRT_INSTALL_PREFIX=/usr/local/tensorrt

# Install OpenMPI
FROM cuda-builder AS mpi-builder
ARG OMPI_VERSION
ARG ompi_version

ENV OMPI_TARBALL_FILENAME="openmpi-$OMPI_VERSION.tar.bz2"
ENV OMPI_TARBALL_FILENAME="openmpi-$ompi_version.tar.bz2"
RUN wget "https://download.open-mpi.org/release/open-mpi/v4.1/$OMPI_TARBALL_FILENAME" -P /opt/src && \
mkdir /usr/src/mpi && \
tar -xf "/opt/src/$OMPI_TARBALL_FILENAME" -C /usr/src/mpi --strip-components=1 && \
@@ -65,34 +56,56 @@ RUN chmod +x /opt/install_tensorrt.sh && \
FROM cuda-builder AS tgi-builder
WORKDIR /usr/src/text-generation-inference

# Scoped global args reuse
ARG is_gha_build
ARG build_type

# Install Rust
ENV PATH="/root/.cargo/bin:$PATH"
RUN curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | bash -s -- -y && \
chmod -R a+w /root/.rustup && \
chmod -R a+w /root/.cargo
chmod -R a+w /root/.cargo && \
cargo install sccache --locked

# SCCACHE Specifics args - before finding a better, more generic, way...
ARG aws_access_key_id
ARG aws_secret_access_key
ARG aws_session_token
ARG sccache_bucket
ARG sccache_s3_key_prefix
ARG sccache_region

ENV AWS_ACCESS_KEY_ID=$aws_access_key_id
ENV AWS_SECRET_ACCESS_KEY=$aws_secret_access_key
ENV AWS_SESSION_TOKEN=$aws_session_token
ENV SCCACHE_BUCKET=$sccache_bucket
ENV SCCACHE_S3_KEY_PREFIX=$sccache_s3_key_prefix
ENV SCCACHE_REGION=$sccache_region

ENV PATH="/root/.cargo/bin:$PATH"
RUN cargo install cargo-chef

# Cache dependencies
COPY --from=planner /usr/src/text-generation-inference/backends/trtllm/recipe.json .
RUN cargo chef cook --release --recipe-path recipe.json

# Build actual TGI
ARG CUDA_ARCH_LIST
ENV CMAKE_PREFIX_PATH="/usr/local/mpi:/usr/local/tensorrt:$CMAKE_PREFIX_PATH"
ENV LD_LIBRARY_PATH="/usr/local/mpi/lib:$LD_LIBRARY_PATH"
ENV PKG_CONFIG_PATH="/usr/local/mpi/lib/pkgconfig:$PKG_CONFIG_PATH"
ENV CMAKE_PREFIX_PATH="/usr/local/mpi:/usr/local/tensorrt:$CMAKE_PREFIX_PATH"

ENV USE_LLD_LINKER=ON
ENV CUDA_ARCH_LIST=${cuda_arch_list}
ENV IS_GHA_BUILD=${is_gha_build}

COPY Cargo.lock Cargo.lock
COPY Cargo.toml Cargo.toml
COPY rust-toolchain.toml rust-toolchain.toml
COPY router router
COPY backends/trtllm backends/trtllm
COPY backends backends
COPY benchmark benchmark
COPY launcher launcher
COPY --from=trt-builder /usr/local/tensorrt /usr/local/tensorrt
COPY --from=mpi-builder /usr/local/mpi /usr/local/mpi

RUN mkdir $TGI_INSTALL_PREFIX && mkdir "$TGI_INSTALL_PREFIX/include" && mkdir "$TGI_INSTALL_PREFIX/lib" && \
cd backends/trtllm && \
CMAKE_INSTALL_PREFIX=$TGI_INSTALL_PREFIX cargo build --release
python3 backends/trtllm/scripts/setup_sccache.py --is-gha-build ${is_gha_build} && \
CMAKE_INSTALL_PREFIX=$TGI_INSTALL_PREFIX \
RUSTC_WRAPPER=sccache \
cargo build --profile ${build_type} --package text-generation-backends-trtllm --bin text-generation-backends-trtllm && \
sccache --show-stats

FROM nvidia/cuda:12.6.3-cudnn-runtime-ubuntu24.04 AS runtime
RUN apt update && apt install -y libucx0 pipx python3-minimal python3-dev python3-pip python3-venv && \
@@ -116,6 +129,28 @@ FROM runtime

LABEL co.huggingface.vendor="Hugging Face Inc."
LABEL org.opencontainers.image.authors="hardware@hf.co"
LABEL org.opencontainers.title="Text-Generation-Inference TensorRT-LLM Backend"

ENTRYPOINT ["./text-generation-launcher"]
CMD ["--executor-worker", "/usr/local/tgi/bin/executorWorker"]

# This is used only for the CI/CD
FROM nvidia/cuda:12.6.3-cudnn-runtime-ubuntu24.04 AS ci-runtime
RUN apt update && apt install -y libasan8 libubsan1 libucx0 pipx python3-minimal python3-dev python3-pip python3-venv && \
rm -rf /var/lib/{apt,dpkg,cache,log}/ && \
pipx ensurepath && \
pipx install --include-deps transformers tokenizers

WORKDIR /usr/local/tgi/bin

ENV PATH=/root/.local/share/pipx/venvs/transformers/bin/:$PATH
ENV LD_LIBRARY_PATH="/usr/local/tgi/lib:/usr/local/mpi/lib:/usr/local/tensorrt/lib:/usr/local/cuda/lib64/stubs:$LD_LIBRARY_PATH"
ENV TOKENIZERS_PARALLELISM=false
ENV OMPI_MCA_plm_rsh_agent=""

COPY --from=mpi-builder /usr/local/mpi /usr/local/mpi
COPY --from=trt-builder /usr/local/tensorrt /usr/local/tensorrt
COPY --from=tgi-builder /usr/local/tgi /usr/local/tgi

# Basically we copy from target/debug instead of target/release
COPY --from=tgi-builder /usr/src/text-generation-inference/target/debug/text-generation-backends-trtllm /usr/local/tgi/bin/text-generation-launcher
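
The `ci-runtime` stage copies the binary from `target/debug`, which matches `build_type="dev"` as set for `cuda-trtllm` in the workflow: Cargo writes the `dev` profile's artifacts to `target/debug`, not `target/dev`. With the Dockerfile default `build_type=release`, the binary would instead land in `target/release`, so this `COPY` would presumably fail. The sketch below illustrates the profile-to-directory mapping; the helper name `cargo_profile_dir` is hypothetical and does not appear in the PR.

```shell
#!/usr/bin/env bash
set -eu

# Map a Cargo profile name to the directory under target/ that Cargo uses.
# Built-in profiles: dev and test share "debug", release and bench share
# "release"; a custom profile uses a directory named after itself.
cargo_profile_dir() {
  case "$1" in
    dev|test)      echo "debug" ;;
    release|bench) echo "release" ;;
    *)             echo "$1" ;;
  esac
}

# Default to "dev", the value the workflow passes for cuda-trtllm.
build_type="${1:-dev}"
echo "target/$(cargo_profile_dir "$build_type")/text-generation-backends-trtllm"
```

A helper along these lines could let a single runtime stage locate the binary for either profile instead of hardcoding `target/debug`.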