Closed
Description
Remove the current limitation of one Inferentia ASIC per API replica. We are currently restricted to a single ASIC per replica because of an issue in the Neuron runtime daemon (Neuron RTD).
Motivation
Lifting this limitation will allow models to be partitioned across multiple Inferentia ASICs.
Additional context
The Neuron RTD issue is reported in aws-neuron/aws-neuron-sdk#110.