@michaelfeil
Contributor

@michaelfeil michaelfeil commented May 19, 2024

Docker for testing: michaelf34/runpod-infinity-worker:0.0.4

I recently added multi-model deployment:

Adds:

  • Pins the Docker image
  • Multiple-model deployment now works natively
  • The async event loop is started once (at the first request) -> better performance
  • No more warmups
  • Model path is cached
  • Private models run by setting HF_TOKEN
  • Env variables are padded with ; for convenience
  • Optimum (ONNX) / CTranslate2 should also work, but are slightly less performant
  • fp8 inference is supported if you rent an L40S, an H100, or an MI300X or newer. On NVIDIA it needs compute capability sm>=89.
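
To illustrate the "started once, at the first request" point, here is a minimal sketch of lazy one-time startup in asyncio. It is not the code from this PR: `LazyEngine`, `start_count`, and the fake embedding output are hypothetical names made up for the example, and the real engine startup is replaced by a placeholder.

```python
import asyncio


class LazyEngine:
    """Hypothetical stand-in for an engine that is expensive to start."""

    def __init__(self):
        self.start_count = 0   # how many times startup actually ran
        self._started = False
        self._lock = None      # created lazily, inside the running loop

    async def _ensure_started(self):
        # Fast path: already started, nothing to do.
        if self._started:
            return
        if self._lock is None:
            self._lock = asyncio.Lock()
        async with self._lock:
            # Re-check under the lock so concurrent first requests
            # trigger exactly one startup.
            if not self._started:
                await asyncio.sleep(0)  # placeholder for the real startup work
                self.start_count += 1
                self._started = True

    async def embed(self, sentences):
        await self._ensure_started()
        # Placeholder "embedding": one float per sentence (its length).
        return [[float(len(s))] for s in sentences]
```

Every request pays only a cheap flag check after the first one, which is the reason a separate warmup step is no longer needed in this kind of design.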

Something that could be useful:

  • Each engine has a queue, and .embed adds requests to it. To handle backpressure, it may be better to reject new requests instead of enqueuing them, giving the runpod-serverless runtime the opportunity to retry, potentially hitting a different worker or scaling to more workers.
  • Another useful addition would be a "query" bypass: potentially spawning a second, CPU-only duplicate of the model that can handle quick, latency-sensitive queries. Let me know if this would be useful, and I will try to prioritize it.
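
A rough sketch of the backpressure idea, to make the proposal concrete. Nothing here exists in the PR: `BoundedEngine`, `QueueFullError`, `handle_job`, and the `max_pending` limit are hypothetical, and how the serverless runtime reacts to an error result (retry, reroute, scale up) is an assumption that would need to be checked.

```python
import asyncio


class QueueFullError(Exception):
    """Raised when the engine refuses to accept more work."""


class BoundedEngine:
    """Hypothetical engine that rejects work beyond max_pending sentences."""

    def __init__(self, max_pending):
        self.max_pending = max_pending
        self.pending = 0  # sentences currently accepted but not finished

    async def embed(self, sentences):
        # Reject up front instead of letting the queue grow without bound.
        if self.pending + len(sentences) > self.max_pending:
            raise QueueFullError("engine queue is full, try again later")
        self.pending += len(sentences)
        try:
            await asyncio.sleep(0.01)  # placeholder for the real model call
            return [[float(len(s))] for s in sentences]
        finally:
            self.pending -= len(sentences)


async def handle_job(engine, job):
    # Hypothetical handler: surfaces the rejection as an error result so
    # the caller can decide to retry elsewhere (assumed, not verified).
    try:
        embeddings = await engine.embed(job["input"])
    except QueueFullError as exc:
        return {"error": str(exc)}
    return {"embeddings": embeddings}
```

The design choice is to fail fast: a rejected request costs almost nothing on the busy worker, whereas a silently queued one adds latency for everything behind it.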

@michaelfeil michaelfeil mentioned this pull request May 19, 2024
@michaelfeil
Contributor Author

@alpayariyak Ready for review / merge.

@alpayariyak
Contributor

Incredible work, thank you so much @michaelfeil! Will review shortly

@michaelfeil
Contributor Author

@alpayariyak Sorry for pinging, but it would be great to merge this PR as is and add any additional features later if needed, so as not to overload this PR.

@alpayariyak alpayariyak merged commit f40926b into runpod-workers:main May 30, 2024
@alpayariyak
Contributor

Hey @michaelfeil, lmk if there's anything you'd like to see before I cut an official release, but should be all good!

@michaelfeil michaelfeil deleted the runpod-hyperspeed branch May 30, 2024 16:16