NeuronX TGI is distributed as Docker images for EC2 and SageMaker.
These Docker images integrate:
- the AWS Neuron SDK for Inferentia2,
- the Text Generation Inference launcher and scheduling front-end,
- a Neuron-specific inference server for text generation.
Please refer to the official documentation.
The image must be built from the top-level directory of the repository:
make neuronx-tgi
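
As a rough sketch, the build step could be wrapped in a small shell function that refuses to run anywhere other than the top directory. The function name `build_neuronx_tgi` and the check for a `Makefile` are illustrative assumptions, not part of the project; only the `make neuronx-tgi` target comes from the instructions above.

```shell
# Hypothetical helper: build the image only if the given directory
# (default: the current one) looks like the repository's top directory.
build_neuronx_tgi() {
    repo_root="${1:-.}"
    # Assumption: the top directory is the one that holds the Makefile.
    if [ ! -f "$repo_root/Makefile" ]; then
        echo "No Makefile in $repo_root: run this from the top directory" >&2
        return 1
    fi
    # Run the documented target from inside that directory.
    (cd "$repo_root" && make neuronx-tgi)
}
```

Calling `build_neuronx_tgi` with no argument from the top directory is equivalent to running `make neuronx-tgi` there; calling it from a subdirectory fails early with a message instead of a less obvious `make` error.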