huggingface/optimum-onnx

🤗 Optimum ONNX

Export your Hugging Face models to ONNX

Documentation | ONNX | Hub

Installation

Before you begin, make sure you install all necessary libraries by running:

pip install "optimum-onnx[onnxruntime]"@git+https://github.com/huggingface/optimum-onnx.git

If you want to use the GPU version of ONNX Runtime, make sure the CUDA and cuDNN requirements are satisfied, then install the additional dependencies by running:

pip install "optimum-onnx[onnxruntime-gpu]"@git+https://github.com/huggingface/optimum-onnx.git

To avoid conflicts between onnxruntime and onnxruntime-gpu, make sure the onnxruntime package is not installed: run pip uninstall onnxruntime before installing Optimum.
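To check an existing environment for this conflict, you can list the installed distributions with the standard library. Below is a minimal sketch that only uses importlib.metadata. The helper names installed_distributions and has_ort_conflict are hypothetical and not part of Optimum.

```python
import re
from importlib import metadata

# Both distributions provide the same `onnxruntime` Python module.
ORT_CPU = "onnxruntime"
ORT_GPU = "onnxruntime-gpu"


def normalize(name: str) -> str:
    """Normalize a distribution name (PEP 503 style: lowercase, runs of -_. -> '-')."""
    return re.sub(r"[-_.]+", "-", name).lower()


def installed_distributions() -> set[str]:
    """Return the normalized names of all distributions installed in this environment."""
    names = set()
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip broken installs with no Name field
            names.add(normalize(name))
    return names


def has_ort_conflict(installed: set[str]) -> bool:
    """True when both the CPU and the GPU ONNX Runtime packages are installed."""
    return ORT_CPU in installed and ORT_GPU in installed


if __name__ == "__main__":
    if has_ort_conflict(installed_distributions()):
        print("Both onnxruntime and onnxruntime-gpu are installed, which may cause conflicts.")
    else:
        print("No onnxruntime / onnxruntime-gpu conflict detected.")
```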

ONNX export

You can export 🤗 Transformers, Diffusers, Timm and Sentence Transformers models to the ONNX format, and easily apply graph optimization and quantization to them. For example:

optimum-cli export onnx --model meta-llama/Llama-3.2-1B onnx_llama/
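If you script exports from Python (for example, to export several checkpoints in a loop), the command above can be assembled as an argument list. This sketch only builds the argv. Running it is left to you, e.g. with subprocess.run(cmd, check=True). The helper name build_export_command is hypothetical. The optional --task flag pins the export task instead of letting the exporter infer it from the model.

```python
import shlex
from typing import Optional


def build_export_command(model: str, output_dir: str, task: Optional[str] = None) -> list[str]:
    """Build the argv for `optimum-cli export onnx` (hypothetical helper)."""
    cmd = ["optimum-cli", "export", "onnx", "--model", model]
    if task is not None:
        # Pin the task explicitly instead of relying on automatic inference.
        cmd += ["--task", task]
    cmd.append(output_dir)  # positional output directory
    return cmd


if __name__ == "__main__":
    # Print the command rather than running it: the export downloads the model.
    print(shlex.join(build_export_command("meta-llama/Llama-3.2-1B", "onnx_llama/")))
```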

The exported model can also be optimized and quantized with ONNX Runtime.
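For intuition about what 8-bit quantization does, ONNX's QuantizeLinear operator computes q = saturate(round(x / scale) + zero_point), and DequantizeLinear reverses it as x ≈ (q - zero_point) * scale. The plain-Python sketch below shows asymmetric, per-tensor uint8 quantization with that formula. It is only an illustration of the general technique. It is not the code used by Optimum's ORTQuantizer or by ONNX Runtime, and the function names are hypothetical.

```python
def quantize_uint8(values: list[float]) -> tuple[list[int], float, int]:
    """Asymmetric per-tensor uint8 quantization (illustrative sketch)."""
    if not values:
        raise ValueError("values must be non-empty")
    # Widen the range to include 0.0 so that zero is represented exactly.
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    scale = (hi - lo) / 255 if hi > lo else 1.0
    zero_point = round(-lo / scale)
    # q = saturate(round(x / scale) + zero_point), clipped to [0, 255]
    q = [min(255, max(0, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point


def dequantize(q: list[int], scale: float, zero_point: int) -> list[float]:
    """Map uint8 codes back to approximate float values."""
    return [(x - zero_point) * scale for x in q]


if __name__ == "__main__":
    weights = [-1.0, 0.0, 0.37, 2.0]
    q, scale, zp = quantize_uint8(weights)
    print(q, scale, zp, dequantize(q, scale, zp))
```

Each dequantized value is within scale / 2 of the original, which is the accuracy traded for storing 1 byte instead of 4.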

For more information on the ONNX export, please check the documentation.

Inference

Once the model is exported to the ONNX format, we provide Python classes that let you run it seamlessly with ONNX Runtime as the backend. Switching from the PyTorch checkpoint to the ONNX one only requires changing the model class:

  from transformers import AutoTokenizer, pipeline
- from transformers import AutoModelForCausalLM
+ from optimum.onnxruntime import ORTModelForCausalLM

- model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B") # PyTorch checkpoint
+ model = ORTModelForCausalLM.from_pretrained("onnx-community/Llama-3.2-1B", subfolder="onnx") # ONNX checkpoint
  tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B")

  pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
  result = pipe("He never went out without a book under his arm")

More details on how to run ONNX models with the ORTModelForXXX classes can be found in the documentation.
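If the same code should run on machines with and without a GPU, ONNX Runtime reports its usable backends through onnxruntime.get_available_providers(), and the ORTModelForXXX classes accept a provider argument in from_pretrained. The sketch below prefers CUDA and falls back to CPU. The helper names choose_provider and load_llama are hypothetical, and the preference order is an assumption you may want to change.

```python
from typing import Sequence

# Provider names as reported by onnxruntime.get_available_providers();
# the preference order (CUDA first, then CPU) is an assumption.
PREFERRED_PROVIDERS = ("CUDAExecutionProvider", "CPUExecutionProvider")


def choose_provider(available: Sequence[str]) -> str:
    """Return the first preferred execution provider that is available."""
    for name in PREFERRED_PROVIDERS:
        if name in available:
            return name
    raise RuntimeError(f"no supported execution provider in {list(available)}")


def load_llama():
    """Hypothetical helper: load the ONNX checkpoint on the best available provider."""
    # Imported lazily so choose_provider() works without these packages installed.
    import onnxruntime
    from optimum.onnxruntime import ORTModelForCausalLM

    provider = choose_provider(onnxruntime.get_available_providers())
    return ORTModelForCausalLM.from_pretrained(
        "onnx-community/Llama-3.2-1B", subfolder="onnx", provider=provider
    )
```

The returned model can be passed to pipeline("text-generation", ...) exactly as in the example above.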

About

🤗 Optimum ONNX: Export your model to ONNX and run inference with ONNX Runtime