Configuring number of initial workers via torchserve #2432

Open
@sidharthrajaram

Description

🚀 The feature

An optional command-line argument for `torchserve` to configure the number of initial workers. If the argument is not supplied, proceed with autoscaling as it works today.

Motivation, pitch

This would make rapid experimentation with model serving more pleasant and seamless on machines with limited memory.

Currently, as noted in the Getting Started docs, running TorchServe "automatically scales backend workers." This is a neat feature, but it creates a pain point for folks trying to run TorchServe on their laptop or otherwise less-memory-capable machine.

For example: I ran TorchServe on my laptop (M2 Mac, 32 GB RAM, 10 cores) to serve an embedding model (~4 GB). The autoscaling attempted to spawn 10 workers and predictably crashed my laptop. A colleague of mine hit the same thing. Ultimately, I had to use the Management API endpoints to (1) start the server, (2) register the model, and (3) scale down to 1 worker, before testing out served inference.
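For reference, the three-step workaround above can be sketched with the Management API on port 8081 (the model name `embedder` and the `embedder.mar` archive are placeholders, and this assumes a running TorchServe instance):

```shell
# (1) Start the server with no model registered, so no workers are spawned yet.
torchserve --start --model-store model_store --ncs

# (2) Register the model with a single initial worker instead of autoscaling.
curl -X POST "http://localhost:8081/models?url=embedder.mar&initial_workers=1"

# (3) If the model was already registered with more workers, scale it down to one.
curl -X PUT "http://localhost:8081/models/embedder?min_worker=1&synchronous=true"

# Confirm the worker count before sending inference requests.
curl "http://localhost:8081/models/embedder"
```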

The simplicity of a single `torchserve` call that starts the server and initializes one worker is effectively out of reach for anyone experimenting on a regular laptop.

Alternatives

Currently, as noted in the Getting Started docs, it's possible to use the fine-grained control offered by the Management API endpoints, as in the three-step workaround described above. However, as mentioned, I wish I could just use the simple `torchserve` command on my laptop 🥲.
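Another partial workaround (not mentioned above) is TorchServe's `config.properties`: its `default_workers_per_model` setting caps the worker count for models loaded at startup, a minimal sketch:

```properties
# config.properties -- passed via: torchserve --start --ts-config config.properties
# Limit every model loaded at startup to a single backend worker
# instead of one worker per available core.
default_workers_per_model=1
```

This still requires maintaining a separate config file, so a direct CLI flag as proposed would remain the more seamless option.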

Additional context

No response

Labels

debugging, triaged (Issue has been reviewed and triaged)