Allow for multiple Inferentia ASICs to be assigned to each API replica #1123

Closed
@RobertLucian

Description

Remove the current limitation of one Inferentia ASIC per API replica. We are currently forced to use only one because of an issue in the Neuron RTD (the Neuron runtime daemon); see the upstream report under Additional context.

Motivation

This would allow models to be partitioned across multiple Inferentia ASICs.

Additional context

As reported in aws-neuron/aws-neuron-sdk#110.

Metadata

Assignees

No one assigned

Labels

refactor (Improve code quality)
