[DOC] add setup document to support neuron backend (vllm-project#2777)

.. _installation_neuron:

Installation with Neuron
========================

vLLM 0.3.3 onwards supports model inference and serving on AWS Trainium/Inferentia with the Neuron SDK.
At the moment paged attention is not supported in the Neuron SDK, but naive continuous batching is supported in transformers-neuronx.
The data types currently supported in the Neuron SDK are FP16 and BF16.

Requirements
------------

* OS: Linux
* Python: 3.8 -- 3.11
* Accelerator: NeuronCore_v2 (in trn1/inf2 instances)
* PyTorch 2.0.1/2.1.1
* AWS Neuron SDK 2.16/2.17 (verified on Python 3.8)

Installation steps:

- :ref:`Build from source <build_from_source_neuron>`

  - :ref:`Step 0. Launch Trn1/Inf2 instances <launch_instances>`
  - :ref:`Step 1. Install drivers and tools <install_drivers>`
  - :ref:`Step 2. Install transformers-neuronx and its dependencies <install_tnx>`
  - :ref:`Step 3. Install vLLM from source <install_vllm>`

.. _build_from_source_neuron:

Build from source
-----------------

The following instructions are applicable to Neuron SDK 2.16 and beyond.

.. _launch_instances:

Step 0. Launch Trn1/Inf2 instances
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Here are the steps to launch trn1/inf2 instances, in order to install `PyTorch Neuron ("torch-neuronx") Setup on Ubuntu 22.04 LTS <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/general/setup/neuron-setup/pytorch/neuronx/ubuntu/torch-neuronx-ubuntu22.html>`_.

- Please follow the instructions at `launch an Amazon EC2 Instance <https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EC2_GetStarted.html#ec2-launch-instance>`_ to launch an instance. When choosing the instance type at the EC2 console, please make sure to select the correct instance type; an equivalent AWS CLI invocation is sketched after this list.
- To get more information about instance sizes and pricing, see the `Trn1 web page <https://aws.amazon.com/ec2/instance-types/trn1/>`_ and the `Inf2 web page <https://aws.amazon.com/ec2/instance-types/inf2/>`_.
- Select the Ubuntu Server 22.04 LTS AMI.
- When launching a Trn1/Inf2 instance, please adjust your primary EBS volume size to a minimum of 512GB.
- After launching the instance, follow the instructions in `Connect to your instance <https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/AccessingInstancesLinux.html>`_ to connect to the instance.
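
For reference, an instance with the required volume size can also be launched from the AWS CLI. The sketch below is illustrative rather than part of the official setup: the AMI ID, key pair, and security group are hypothetical placeholders to replace with your own values, and any trn1/inf2 instance size can be substituted.

.. code-block:: console

    # Launch a trn1.2xlarge instance from an Ubuntu Server 22.04 LTS AMI
    # with a 512GB primary EBS volume (AMI ID, key pair, and security group are placeholders)
    aws ec2 run-instances \
        --image-id ami-0123456789abcdef0 \
        --instance-type trn1.2xlarge \
        --key-name my-key-pair \
        --security-group-ids sg-0123456789abcdef0 \
        --block-device-mappings '[{"DeviceName":"/dev/sda1","Ebs":{"VolumeSize":512}}]'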

.. _install_drivers:

Step 1. Install drivers and tools
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The installation of drivers and tools is not necessary if the `Deep Learning AMI Neuron <https://docs.aws.amazon.com/dlami/latest/devguide/appendix-ami-release-notes.html>`_ is used. If the drivers and tools are not already installed on the operating system, follow the steps below:

.. code-block:: console

    # Configure Linux for Neuron repository updates
    . /etc/os-release
    sudo tee /etc/apt/sources.list.d/neuron.list > /dev/null <<EOF
    deb https://apt.repos.neuron.amazonaws.com ${VERSION_CODENAME} main
    EOF
    wget -qO - https://apt.repos.neuron.amazonaws.com/GPG-PUB-KEY-AMAZON-AWS-NEURON.PUB | sudo apt-key add -
    # Update OS packages
    sudo apt-get update -y
    # Install OS headers
    sudo apt-get install linux-headers-$(uname -r) -y
    # Install git
    sudo apt-get install git -y
    # Install Neuron Driver
    sudo apt-get install aws-neuronx-dkms=2.* -y
    # Install Neuron Runtime
    sudo apt-get install aws-neuronx-collectives=2.* -y
    sudo apt-get install aws-neuronx-runtime-lib=2.* -y
    # Install Neuron Tools
    sudo apt-get install aws-neuronx-tools=2.* -y
    # Add PATH
    export PATH=/opt/aws/neuron/bin:$PATH
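
As an optional sanity check (not part of the official steps), the ``neuron-ls`` utility installed with ``aws-neuronx-tools`` can be used to confirm that the Neuron driver is loaded and the NeuronCores on the instance are visible:

.. code-block:: console

    # List the Neuron devices detected on this instance
    neuron-ls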

.. _install_tnx:

Step 2. Install transformers-neuronx and its dependencies
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

`transformers-neuronx <https://github.com/aws-neuron/transformers-neuronx>`_ will be the backend to support inference on trn1/inf2 instances.
Follow the steps below to install the transformers-neuronx package and its dependencies.

.. code-block:: console

    # Install Python venv
    sudo apt-get install -y python3.10-venv g++
    # Create Python venv
    python3.10 -m venv aws_neuron_venv_pytorch
    # Activate Python venv
    source aws_neuron_venv_pytorch/bin/activate
    # Install Jupyter notebook kernel
    pip install ipykernel
    python3.10 -m ipykernel install --user --name aws_neuron_venv_pytorch --display-name "Python (torch-neuronx)"
    pip install jupyter notebook
    pip install environment_kernels
    # Set pip repository pointing to the Neuron repository
    python -m pip config set global.extra-index-url https://pip.repos.neuron.amazonaws.com
    # Install wget, awscli
    python -m pip install wget
    python -m pip install awscli
    # Update Neuron Compiler and Framework
    python -m pip install --upgrade neuronx-cc==2.* --pre torch-neuronx==2.1.* torchvision transformers-neuronx
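
As a quick, optional check that the Neuron framework packages were picked up from the Neuron pip repository, you can try importing them inside the virtual environment; the underscore module names below are the usual import names for these packages:

.. code-block:: console

    # Confirm the Neuron framework packages import inside the venv
    python -c "import torch_neuronx, transformers_neuronx; print('Neuron packages OK')"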

.. _install_vllm:

Step 3. Install vLLM from source
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Once the neuronx-cc and transformers-neuronx packages are installed, we will be able to install vLLM from source as follows:

.. code-block:: console

    $ cd vllm
    $ pip install -U -r requirements-neuron.txt
    $ pip install .

If the neuron packages are detected correctly during the installation process, ``vllm-0.3.0+neuron212`` will be installed.
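
A simple way to confirm that the Neuron build of vLLM was picked up is to print the installed version string, which should carry the ``+neuron`` suffix mentioned above:

.. code-block:: console

    $ python -c "import vllm; print(vllm.__version__)"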