🚀 The feature
Add an optional command-line argument to torchserve for configuring the number of initial workers. If the argument is not supplied, proceed with the current autoscaling behavior.
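As a rough sketch of the intended usage (the `--initial-workers` flag name is purely illustrative here, not an existing TorchServe option):

```sh
# Hypothetical flag: start the server and cap the model at a single worker
torchserve --start --model-store model_store \
    --models embed=embedding_model.mar \
    --initial-workers 1

# Without the flag, the current autoscaling behavior is preserved
torchserve --start --model-store model_store --models embed=embedding_model.mar
```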
Motivation, pitch
This would make rapid experimentation with model serving more pleasant and seamless on less-memory-capable machines.

Currently, as noted in the Getting Started docs, running TorchServe "automatically scales backend workers." This is a neat feature, but it creates a pain point for folks trying to run TorchServe on a laptop or another less-memory-capable machine.
For example:
I ran TorchServe on my laptop (M2 Mac, 32 GB RAM, 10 cores) to serve a ~4 GB embedding model. Autoscaling attempted to spawn 10 workers and predictably crashed my laptop. A colleague of mine experienced the same thing. Ultimately, I had to use the Management API endpoints to (1) start the server, (2) register the model, and (3) scale to 1 worker before testing served inference.
The simplicity of just calling torchserve to start up the server and initialize a worker is basically out of reach for anyone experimenting on a regular laptop.
Alternatives
Currently, as noted in the Getting Started docs, it's possible to use the fine-grained control offered by the Management API endpoints to (1) start the server, (2) register the model, and (3) scale to 1 worker before testing served inference (see the sketch below). However, as mentioned above, I wish I could just use the simple torchserve command on my laptop 🥲.
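For reference, the workaround looks roughly like this (model and `.mar` file names are illustrative; this assumes the Management API is on its default port, 8081):

```sh
# 1. Start the server without registering any models
torchserve --start --model-store model_store

# 2. Register the model, requesting one initial worker
curl -X POST "http://localhost:8081/models?url=embedding_model.mar&initial_workers=1"

# 3. Or, for an already-registered model, scale down to one worker
curl -X PUT "http://localhost:8081/models/embed?min_worker=1"
```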
Additional context
No response