3.X API installation update #1935

Merged: 11 commits, Jul 17, 2024

4 changes: 0 additions & 4 deletions .azure-pipelines/scripts/install_nc.sh
@@ -10,10 +10,6 @@ elif [[ $1 = *"3x_tf"* ]]; then
     python -m pip install --no-cache-dir -r requirements_tf.txt
     python setup.py tf bdist_wheel
     pip install dist/neural_compressor*.whl --force-reinstall
-elif [[ $1 = *"3x_ort" ]]; then
-    python -m pip install --no-cache-dir -r requirements_ort.txt
-    python setup.py ort bdist_wheel
-    pip install dist/neural_compressor*.whl --force-reinstall
 else
     python -m pip install --no-cache-dir -r requirements.txt
     python setup.py bdist_wheel
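For context, `install_nc.sh` picks a build path from its first argument, and this PR simply drops the `3x_ort` branch. A minimal sketch of invoking the surviving branches (argument values inferred from the visible globs; only `3x_tf` and the default fall-through appear in this hunk):

```Shell
# Build and install the TensorFlow framework extension wheel (3x_tf branch above)
bash .azure-pipelines/scripts/install_nc.sh 3x_tf
# Any unmatched argument falls through to the default basic build
bash .azure-pipelines/scripts/install_nc.sh basic
```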
15 changes: 0 additions & 15 deletions .azure-pipelines/scripts/ut/3x/coverage.3x_ort

This file was deleted.

35 changes: 0 additions & 35 deletions .azure-pipelines/scripts/ut/3x/run_3x_ort.sh

This file was deleted.

109 changes: 0 additions & 109 deletions .azure-pipelines/ut-3x-ort.yml

This file was deleted.

13 changes: 0 additions & 13 deletions .github/checkgroup.yml
@@ -140,16 +140,3 @@ subprojects:
       - "UT-3x-Torch (Coverage Compare CollectDatafiles)"
       - "UT-3x-Torch (Unit Test 3x Torch Unit Test 3x Torch)"
       - "UT-3x-Torch (Unit Test 3x Torch baseline Unit Test 3x Torch baseline)"
-
-  - id: "Unit Tests 3x-ONNXRT workflow"
-    paths:
-      - "neural_compressor/common/**"
-      - "neural_compressor/onnxrt/**"
-      - "test/3x/onnxrt/**"
-      - "setup.py"
-      - "requirements_ort.txt"
-    checks:
-      - "UT-3x-ONNXRT"
-      - "UT-3x-ONNXRT (Coverage Compare CollectDatafiles)"
-      - "UT-3x-ONNXRT (Unit Test 3x ONNXRT Unit Test 3x ONNXRT)"
-      - "UT-3x-ONNXRT (Unit Test 3x ONNXRT baseline Unit Test 3x ONNXRT baseline)"
10 changes: 7 additions & 3 deletions README.md
@@ -19,21 +19,25 @@ Intel® Neural Compressor aims to provide popular model compression techniques s
as well as Intel extensions such as [Intel Extension for TensorFlow](https://github.com/intel/intel-extension-for-tensorflow) and [Intel Extension for PyTorch](https://github.com/intel/intel-extension-for-pytorch).
In particular, the tool provides the key features, typical examples, and open collaborations as below:

-* Support a wide range of Intel hardware such as [Intel Xeon Scalable Processors](https://www.intel.com/content/www/us/en/products/details/processors/xeon/scalable.html), [Intel Xeon CPU Max Series](https://www.intel.com/content/www/us/en/products/details/processors/xeon/max-series.html), [Intel Data Center GPU Flex Series](https://www.intel.com/content/www/us/en/products/details/discrete-gpus/data-center-gpu/flex-series.html), and [Intel Data Center GPU Max Series](https://www.intel.com/content/www/us/en/products/details/discrete-gpus/data-center-gpu/max-series.html) with extensive testing; support AMD CPU, ARM CPU, and NVidia GPU through ONNX Runtime with limited testing
+* Support a wide range of Intel hardware such as [Intel Gaudi AI Accelerators](https://www.intel.com/content/www/us/en/products/details/processors/ai-accelerators/gaudi-overview.html), [Intel Core Ultra Processors](https://www.intel.com/content/www/us/en/products/details/processors/core-ultra.html), [Intel Xeon Scalable Processors](https://www.intel.com/content/www/us/en/products/details/processors/xeon/scalable.html), [Intel Xeon CPU Max Series](https://www.intel.com/content/www/us/en/products/details/processors/xeon/max-series.html), [Intel Data Center GPU Flex Series](https://www.intel.com/content/www/us/en/products/details/discrete-gpus/data-center-gpu/flex-series.html), and [Intel Data Center GPU Max Series](https://www.intel.com/content/www/us/en/products/details/discrete-gpus/data-center-gpu/max-series.html) with extensive testing;
+support AMD CPU, ARM CPU, and NVidia GPU through ONNX Runtime with limited testing; support NVidia GPU for some WOQ algorithms like AutoRound and HQQ.

* Validate popular LLMs such as [LLama2](/examples/pytorch/nlp/huggingface_models/language-modeling/quantization/llm), [Falcon](/examples/pytorch/nlp/huggingface_models/language-modeling/quantization/llm), [GPT-J](/examples/pytorch/nlp/huggingface_models/language-modeling/quantization/llm), [Bloom](/examples/pytorch/nlp/huggingface_models/language-modeling/quantization/llm), [OPT](/examples/pytorch/nlp/huggingface_models/language-modeling/quantization/llm), and more than 10,000 broad models such as [Stable Diffusion](/examples/pytorch/nlp/huggingface_models/text-to-image/quantization), [BERT-Large](/examples/pytorch/nlp/huggingface_models/text-classification/quantization/ptq_static/fx), and [ResNet50](/examples/pytorch/image_recognition/torchvision_models/quantization/ptq/cpu/fx) from popular model hubs such as [Hugging Face](https://huggingface.co/), [Torch Vision](https://pytorch.org/vision/stable/index.html), and [ONNX Model Zoo](https://github.com/onnx/models#models), with automatic [accuracy-driven](/docs/source/design.md#workflow) quantization strategies

* Collaborate with cloud marketplaces such as [Google Cloud Platform](https://console.cloud.google.com/marketplace/product/bitnami-launchpad/inc-tensorflow-intel?project=verdant-sensor-286207), [Amazon Web Services](https://aws.amazon.com/marketplace/pp/prodview-yjyh2xmggbmga#pdp-support), and [Azure](https://azuremarketplace.microsoft.com/en-us/marketplace/apps/bitnami.inc-tensorflow-intel), software platforms such as [Alibaba Cloud](https://www.intel.com/content/www/us/en/developer/articles/technical/quantize-ai-by-oneapi-analytics-on-alibaba-cloud.html), [Tencent TACO](https://new.qq.com/rain/a/20221202A00B9S00) and [Microsoft Olive](https://github.com/microsoft/Olive), and open AI ecosystem such as [Hugging Face](https://huggingface.co/blog/intel), [PyTorch](https://pytorch.org/tutorials/recipes/intel_neural_compressor_for_pytorch.html), [ONNX](https://github.com/onnx/models#models), [ONNX Runtime](https://github.com/microsoft/onnxruntime), and [Lightning AI](https://github.com/Lightning-AI/lightning/blob/master/docs/source-pytorch/advanced/post_training_quantization.rst)

## What's New
* [2024/07] Starting from the 3.0 release, the framework extension API is recommended for quantization.
* [2024/07] Performance optimizations and usability improvements on the [client side](https://github.com/intel/neural-compressor/blob/master/docs/3x/client_quant.md).
* [2024/03] A new SOTA approach [AutoRound](https://github.com/intel/auto-round) Weight-Only Quantization on [Intel Gaudi2 AI accelerator](https://habana.ai/products/gaudi2/) is available for LLMs.

## Installation

### Install from PyPI
```Shell
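# Install 2.X API + Framework extension API (framework dependencies installed separately)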
pip install neural-compressor
# Install 2.X API + Framework extension API + PyTorch dependency
pip install neural-compressor[pt]
# Install 2.X API + Framework extension API + TensorFlow dependency
pip install neural-compressor[tf]
```
> **Note**:
> Further installation methods can be found in the [Installation Guide](https://github.com/intel/neural-compressor/blob/master/docs/source/installation_guide.md). Check out our [FAQ](https://github.com/intel/neural-compressor/blob/master/docs/source/faq.md) for more details.
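As a quick sanity check after any of the installs above (a minimal sketch; `neural_compressor.torch` and `neural_compressor.tensorflow` are assumed to be the 3.x framework extension entry points):

```Shell
# Base install: the package itself should import and report its version
python -c "import neural_compressor; print(neural_compressor.__version__)"
# With the [pt] extra, the PyTorch framework extension API should be importable (assumed module path)
python -c "import neural_compressor.torch"
# With the [tf] extra, the TensorFlow framework extension API should be importable (assumed module path)
python -c "import neural_compressor.tensorflow"
```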
57 changes: 31 additions & 26 deletions docs/source/installation_guide.md
@@ -29,28 +29,28 @@ The following prerequisites and requirements must be satisfied for a successful

### Install from Binary
- Install from PyPI
```Shell
# install stable basic version from pypi
pip install neural-compressor
```
```Shell
# [Experimental] install stable basic + PyTorch framework extension API from pypi
pip install neural-compressor[pt]
```
```Shell
# [Experimental] install stable basic + TensorFlow framework extension API from pypi
pip install neural-compressor[tf]
```

- Install from test PyPI
```Shell
# install nightly version
git clone https://github.com/intel/neural-compressor.git
cd neural-compressor
pip install -r requirements.txt
# install nightly basic version from test PyPI
pip install -i https://test.pypi.org/simple/ neural-compressor
```
```Shell
# Install 2.X API + Framework extension API + PyTorch dependency
pip install neural-compressor[pt]
```
```Shell
# Install 2.X API + Framework extension API + TensorFlow dependency
pip install neural-compressor[tf]
```
```Shell
# Install 2.X API + Framework extension API
# With this install command, the dependencies needed by the framework extension API are not installed;
# you can install them separately with `pip install -r requirements_pt.txt` or `pip install -r requirements_tf.txt`.
pip install neural-compressor
```
```Shell
# Framework extension API + PyTorch dependency
pip install neural-compressor-pt
```
```Shell
# Framework extension API + TensorFlow dependency
pip install neural-compressor-tf
```
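To summarize the matrix above: `neural-compressor[pt]`/`neural-compressor[tf]` install the 2.X API together with the framework extension API, while the slim `neural-compressor-pt`/`neural-compressor-tf` packages ship the framework extension API alone. A hedged way to confirm which surfaces landed in your environment (module paths assumed from the 3.x layout):

```Shell
# Should succeed wherever the PyTorch framework extension API is installed
python -c "import neural_compressor.torch"
# Should succeed only where the 2.X API is present (neural-compressor or its [pt]/[tf] extras)
python -c "from neural_compressor import quantization"
```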

### Install from Source

@@ -76,15 +76,20 @@ The AI Kit is distributed through many common channels, including from Intel's w
## System Requirements

### Validated Hardware Environment

+#### Intel® Neural Compressor supports HPUs based on heterogeneous architecture with two compute engines (MME and TPC):
+* Intel Gaudi AI Accelerators (Gaudi2)

#### Intel® Neural Compressor supports CPUs based on [Intel 64 architecture or compatible processors](https://en.wikipedia.org/wiki/X86-64):

-* Intel Xeon Scalable processor (formerly Skylake, Cascade Lake, Cooper Lake, Ice Lake, and Sapphire Rapids)
-* Intel Xeon CPU Max Series (formerly Sapphire Rapids HBM)
+* Intel Xeon Scalable processor (Skylake, Cascade Lake, Cooper Lake, Ice Lake, and Sapphire Rapids)
+* Intel Xeon CPU Max Series (Sapphire Rapids HBM)
+* Intel Core Ultra Processors (Meteor Lake)

#### Intel® Neural Compressor supports GPUs built on Intel's Xe architecture:

-* Intel Data Center GPU Flex Series (formerly Arctic Sound-M)
-* Intel Data Center GPU Max Series (formerly Ponte Vecchio)
+* Intel Data Center GPU Flex Series (Arctic Sound-M)
+* Intel Data Center GPU Max Series (Ponte Vecchio)

#### Intel® Neural Compressor quantized ONNX models support multiple hardware vendors through ONNX Runtime:

56 changes: 0 additions & 56 deletions neural_compressor/onnxrt/__init__.py

This file was deleted.

22 changes: 0 additions & 22 deletions neural_compressor/onnxrt/algorithms/__init__.py

This file was deleted.

17 changes: 0 additions & 17 deletions neural_compressor/onnxrt/algorithms/layer_wise/__init__.py

This file was deleted.
