This repository has been archived by the owner on Oct 11, 2024. It is now read-only.
forked from vllm-project/vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

neuralmagic/nm-vllm

 
 

nm-vllm

THIS REPO HAS BEEN ARCHIVED AS OF SEPTEMBER 2024. NEURAL MAGIC IS STILL RELEASING ENTERPRISE PACKAGES RELATED TO VLLM. OUR RELEASE REPO HAS JUST GONE PRIVATE.

To learn more about nm-vllm Enterprise, visit the nm-vllm product page.

To contribute and to see our contributions to vLLM, visit vLLM.

To view the latest releases, benchmarking, models, and evaluations from Neural Magic, visit nm-vllm-certs.

Neural Magic maintains a variety of optimized models on our Hugging Face organization profiles.
