
[tensorexpr][nnc] Support quantization #66676

Closed
wants to merge 30 commits

Conversation

IvanKobzarev
Contributor

@IvanKobzarev IvanKobzarev commented Oct 15, 2021

Stack from ghstack:

Differential Revision: D31676329

  1. Introducing quantized dtypes to tensorexpr: `QUInt8`, `QInt8`

  2. Introducing quantization lowerings for:

```
aten::quantize_per_tensor
aten::dequantize
_quantized::conv2d
quantized::conv2d_relu
quantized::add
aten::upsample_nearest2d
```

`quantize_per_tensor` and `dequantize` can be lowered either as tensorexpr or as external calls; all other ops are lowered only as external calls.
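For background, per-tensor affine quantization maps float values to 8-bit integers via a scale and a zero point. The sketch below is illustrative NumPy code, not part of this PR; it shows the math that an `aten::quantize_per_tensor` / `aten::dequantize` lowering for `QUInt8` has to implement:

```python
import numpy as np

# Illustrative sketch (not PR code): per-tensor affine quantization
# with the QUInt8 range [0, 255].
def quantize_per_tensor(x, scale, zero_point, qmin=0, qmax=255):
    # q = clamp(round(x / scale) + zero_point, qmin, qmax)
    q = np.round(x / scale) + zero_point
    return np.clip(q, qmin, qmax).astype(np.uint8)

def dequantize(q, scale, zero_point):
    # x ~= (q - zero_point) * scale; exact only up to half a quantization step
    return (q.astype(np.float32) - zero_point) * scale

x = np.array([0.0, 0.1, 0.5, 1.0], dtype=np.float32)
q = quantize_per_tensor(x, scale=1.0 / 255, zero_point=0)
x_hat = dequantize(q, scale=1.0 / 255, zero_point=0)
```

The round-trip error is bounded by half the scale for in-range inputs, which is why the tensorexpr lowering and the external (QNNPACK) call can be checked against each other bit-for-bit only when they use the same rounding mode.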

Testing:

```bash
#!/bin/bash
PYTORCH_JIT_LOG_LEVEL=">>kernel:>>eval" \
PYTORCH_TENSOREXPR_DONT_USE_LLVM=1 \
USE_PYTORCH_QNNPACK=1 \
USE_QNNPACK=1 \
USE_FBGEMM=0 \
DEBUG=1 \
CXX_FLAGS="-g" \
USE_LLVM=/home/ivankobzarev/llvm90install \
USE_XNNPACK=1 \
./scripts/build_local.sh \
  -DBUILD_BINARY=ON \
  -DCMAKE_BUILD_TYPE=Debug \
  -DBUILD_TEST=ON \
  -DUSE_LLVM=/home/ivankobzarev/llvm90install \
  && ./build/bin/test_tensorexpr --gtest_filter="Quantization.*"
```

@pytorch-probot

pytorch-probot bot commented Oct 15, 2021

CI Flow Status


Ruleset - Version: v1
Ruleset - File: https://github.com/pytorch/pytorch/blob/ff4c2039743041056ebf9a21022a7d7f27a3a8d1/.github/generated-ciflow-ruleset.json
PR ciflow labels: ciflow/default

Triggered Workflows

| Workflow | Labels | Status |
|---|---|---|
| linux-bionic-py3.6-clang9 | ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/noarch, ciflow/xla | ✅ triggered |
| linux-vulkan-bionic-py3.6-clang9 | ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/vulkan | ✅ triggered |
| linux-xenial-cuda11.3-py3.6-gcc7 | ciflow/all, ciflow/cuda, ciflow/default, ciflow/linux | ✅ triggered |
| linux-xenial-py3-clang5-mobile-build | ciflow/all, ciflow/default, ciflow/linux, ciflow/mobile | ✅ triggered |
| linux-xenial-py3-clang5-mobile-custom-build-dynamic | ciflow/all, ciflow/default, ciflow/linux, ciflow/mobile | ✅ triggered |
| linux-xenial-py3-clang5-mobile-custom-build-static | ciflow/all, ciflow/default, ciflow/linux, ciflow/mobile | ✅ triggered |
| linux-xenial-py3.6-clang7-asan | ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/sanitizers | ✅ triggered |
| linux-xenial-py3.6-clang7-onnx | ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/onnx | ✅ triggered |
| linux-xenial-py3.6-gcc5.4 | ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux | ✅ triggered |
| linux-xenial-py3.6-gcc7 | ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux | ✅ triggered |
| linux-xenial-py3.6-gcc7-bazel-test | ciflow/all, ciflow/bazel, ciflow/cpu, ciflow/default, ciflow/linux | ✅ triggered |
| win-vs2019-cpu-py3 | ciflow/all, ciflow/cpu, ciflow/default, ciflow/win | ✅ triggered |
| win-vs2019-cuda11.3-py3 | ciflow/all, ciflow/cuda, ciflow/default, ciflow/win | ✅ triggered |

Skipped Workflows

| Workflow | Labels | Status |
|---|---|---|
| caffe2-linux-xenial-py3.6-gcc5.4 | ciflow/all, ciflow/cpu, ciflow/linux | 🚫 skipped |
| docker-builds | ciflow/all | 🚫 skipped |
| libtorch-linux-xenial-cuda10.2-py3.6-gcc7 | ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux | 🚫 skipped |
| libtorch-linux-xenial-cuda11.3-py3.6-gcc7 | ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux | 🚫 skipped |
| linux-bionic-cuda10.2-py3.9-gcc7 | ciflow/all, ciflow/cuda, ciflow/linux, ciflow/slow | 🚫 skipped |
| linux-xenial-py3-clang5-mobile-code-analysis | ciflow/all, ciflow/linux, ciflow/mobile | 🚫 skipped |
| parallelnative-linux-xenial-py3.6-gcc5.4 | ciflow/all, ciflow/cpu, ciflow/linux | 🚫 skipped |
| periodic-libtorch-linux-xenial-cuda11.1-py3.6-gcc7 | ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/scheduled | 🚫 skipped |
| periodic-linux-xenial-cuda10.2-py3-gcc7-slow-gradcheck | ciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled, ciflow/slow, ciflow/slow-gradcheck | 🚫 skipped |
| periodic-linux-xenial-cuda11.1-py3.6-gcc7 | ciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled | 🚫 skipped |
| periodic-win-vs2019-cuda11.1-py3 | ciflow/all, ciflow/cuda, ciflow/scheduled, ciflow/win | 🚫 skipped |

You can add a comment to the PR and tag @pytorchbot with the following commands:
```
# ciflow rerun, "ciflow/default" will always be added automatically
@pytorchbot ciflow rerun

# ciflow rerun with additional labels "-l <ciflow/label_name>", which is
# equivalent to adding these labels manually and triggering the rerun
@pytorchbot ciflow rerun -l ciflow/scheduled -l ciflow/slow
```

For more information, please take a look at the CI Flow Wiki.

@facebook-github-bot
Contributor

facebook-github-bot commented Oct 15, 2021

🔗 Helpful links

💊 CI failures summary and remediations

As of commit ff4c203 (more details on the Dr. CI page):


  • 1/1 failures possibly* introduced in this PR
    • 1/1 non-scanned failure(s)

ci.pytorch.org: 1 failed


This comment was automatically generated by Dr. CI.

Please report bugs/suggestions to the (internal) Dr. CI Users group.


@facebook-github-bot added the `oncall: jit` label (Add this issue/PR to JIT oncall triage queue) on Oct 15, 2021
@IvanKobzarev
Copy link
Contributor Author

@IvanKobzarev has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.


@IvanKobzarev mentioned this pull request on Oct 18, 2021
Contributor

@navahgar navahgar left a comment


LGTM.


@facebook-github-bot
Contributor

@IvanKobzarev merged this pull request in 7fbcf79.

@facebook-github-bot deleted the gh/ivankobzarev/78/head branch on November 4, 2021 at 14:17
Labels
cla signed, Merged, oncall: jit (Add this issue/PR to JIT oncall triage queue)

3 participants