Skip to content

Latest commit

 

History

History
100 lines (56 loc) · 2.58 KB

index.rst

File metadata and controls

100 lines (56 loc) · 2.58 KB

QONNX

Note

QONNX is currently under active development. APIs will likely change.

QONNX (Quantized ONNX) introduces three new custom operators -- Quant, BipolarQuant and Trunc -- in order to represent arbitrary-precision uniform quantization in ONNX. This enables:

  • Representation of binary, ternary, 3-bit, 4-bit, 6-bit or any other quantization.
  • Quantization is an operator itself, and can be applied to any parameter or layer input.
  • Flexible choices for scaling factor and zero-point granularity.
  • Quantized values are carried using standard float datatypes to remain ONNX protobuf-compatible.

This repository contains a set of Python utilities to work with QONNX models, including but not limited to:

  • executing QONNX models for (slow) functional verification
  • shape inference, constant folding and other basic optimizations
  • summarizing the inference cost of a QONNX model in terms of mixed-precision MACs, parameter and activation volume
  • Python infrastructure for writing transformations and defining executable, shape-inferencable custom ops
  • (experimental) data layout conversion from standard ONNX NCHW to custom QONNX NHWC ops

Quickstart

Operator definitions

  • Quant for 2-to-arbitrary-bit quantization, with scaling and zero-point
  • BipolarQuant for 1-bit (bipolar) quantization, with scaling and zero-point
  • Trunc for truncating to a specified number of bits, with scaling and zero-point

Installation

Install latest release from PyPI:

pip install qonnx

Development

Install in editable mode in a venv:

git clone https://github.com/fastmachinelearning/qonnx
cd qonnx
virtualenv -p python3.8 venv
source venv/bin/activate
pip install -e .[testing, docs, notebooks]

Test suite

Run entire test suite, parallelized across CPU cores:

pytest -n auto --verbose

Run a particular test and fall into pdb if it fails:

pytest --pdb -k "test*extend*partition.py::test*extend*partition[extend_id1-2]"

QONNX also uses GitHub actions to run the full test suite on PRs.

.. toctree::
   :maxdepth: 2
   :hidden:

   ONNX-Based Compiler Infrastructure <overview>
   Tutorials <tutorials>
   API <api/modules>
   License <license>
   Contributors <authors>
   Index <genindex>