load initial handlers in parallel #1928

Open
@hgong-snap

Description

🚀 The feature

It seems that initial handlers are loaded sequentially across different models (handlers for the same model are loaded in parallel, though). When serving many models in production, this significantly slows down the spin-up of a new server. Is it possible to load all handlers in parallel? For example, on a 32-core machine, the server should ideally start 32 workers in parallel at startup. This would dramatically decrease startup time and help the server scale up better during traffic surges.
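To make the request concrete, here is a minimal sketch of what loading models in parallel at startup could look like, using a fixed-size thread pool from `java.util.concurrent`. This is not the server's actual startup code: `ParallelModelLoader`, `loadAll`, and the `loader` callback are hypothetical names invented for illustration, and the callback stands in for whatever per-model work (unpacking the archive, starting workers) the real frontend does.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical illustration only; not the server's real startup path.
class ParallelModelLoader {

    // Submits one load task per model to a bounded pool, then waits for all
    // of them. Returns model name -> load result, in the input order.
    static <T> Map<String, T> loadAll(List<String> modelNames, int parallelism,
                                      Function<String, T> loader) {
        ExecutorService pool = Executors.newFixedThreadPool(parallelism);
        try {
            // Submit everything first so the loads overlap in time.
            List<Future<T>> futures = new ArrayList<>();
            for (String name : modelNames) {
                futures.add(pool.submit(() -> loader.apply(name)));
            }
            // Collect results; get() blocks until that model has finished.
            Map<String, T> loaded = new LinkedHashMap<>();
            for (int i = 0; i < modelNames.size(); i++) {
                loaded.put(modelNames.get(i), futures.get(i).get());
            }
            return loaded;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while loading models", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("Failed to load a model", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Assumption: one loader thread per available core, as suggested above.
        int cores = Runtime.getRuntime().availableProcessors();
        Map<String, String> result = loadAll(
                List.of("model_a", "model_b", "model_c"), cores,
                name -> name + " loaded on " + Thread.currentThread().getName());
        result.values().forEach(System.out::println);
    }
}
```

Bounding the pool by core count keeps startup from oversubscribing the machine. A real implementation would also need to decide how a single failed model affects the rest of the startup, which this sketch handles by simply aborting.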

Motivation, pitch

see above

Alternatives

No response

Additional context

No response

Metadata

Assignees

Labels

enhancement (New feature or request), future, java (Pull requests that update Java code), optimization, p1 (mid priority), perf (Performance issue)
