[Quant] Add FX support in quantization examples #5797

Closed
wants to merge 8 commits

Conversation

@andrewor14 commented Apr 10, 2022

Stack from ghstack (oldest at bottom):

Summary: Previously, the quantization examples used only eager
mode quantization. This commit adds support for FX mode
quantization as well.

Test Plan:

```
# ==================== PTQ ====================
# MODEL is one of googlenet, inception_v3, resnet18, resnet50, resnext101_32x8d,
# shufflenet_v2_x0_5, shufflenet_v2_x1_0

# eager
python train_quantization.py --device="cpu" --post-training-quantize --backend="fbgemm"\
  --model="$MODEL" --weights="IMAGENET1K_V1" --quantization-workflow-type="eager_mode_quantization"

# fx
python train_quantization.py --device="cpu" --post-training-quantize --backend="fbgemm"\
  --model="$MODEL" --weights="IMAGENET1K_V1" --quantization-workflow-type="fx_graph_mode_quantization"

# ==================== QAT ====================
# mobilenet_v2 eager
python train_quantization.py --device="cuda" --backend="qnnpack" --model="mobilenet_v2"\
  --epochs=10 --workers=64 --weights="IMAGENET1K_V1" --lr=0.0001 --weight-decay=0.0001\
  --quantization-workflow-type="eager_mode_quantization"

# mobilenet_v2 fx
python train_quantization.py --device="cuda" --backend="qnnpack" --model="mobilenet_v2"\
  --epochs=10 --workers=64 --weights="IMAGENET1K_V1" --lr=0.0001 --weight-decay=0.0001\
  --quantization-workflow-type="fx_graph_mode_quantization"

# mobilenet_v3_large eager
python train_quantization.py --device="cuda" --backend="qnnpack" --model="mobilenet_v3_large"\
  --epochs=10 --workers=64 --weights="IMAGENET1K_V1" --lr=0.001 --weight-decay=0.00001\
  --quantization-workflow-type="eager_mode_quantization"

# mobilenet_v3_large fx
python train_quantization.py --device="cuda" --backend="qnnpack" --model="mobilenet_v3_large"\
  --epochs=10 --workers=64 --weights="IMAGENET1K_V1" --lr=0.001 --weight-decay=0.00001\
  --quantization-workflow-type="fx_graph_mode_quantization"

(screenshot attached: Screen Shot 2022-04-26 at 12 35 30 PM)

Reviewers: jerryzh168, vkuzo

Subscribers: jerryzh168, vkuzo

andrewor14 added a commit that referenced this pull request Apr 10, 2022
@andrewor14 marked this pull request as draft April 10, 2022 03:16
@datumbox self-requested a review April 10, 2022 07:22
@andrewor14 marked this pull request as ready for review April 11, 2022 03:54
@datumbox (Contributor) left a comment:

Thanks for the work @andrewor14. Some initial questions concerning the new API below.

andrewor14 added a commit that referenced this pull request Apr 12, 2022
andrewor14 added a commit that referenced this pull request Apr 12, 2022
andrewor14 added a commit that referenced this pull request Apr 12, 2022
andrewor14 added a commit that referenced this pull request Apr 12, 2022
@jerryzh168 (Contributor) left a comment:

Looks great! Can you include the accuracy numbers in the summary and update the docs that show accuracy as well?

andrewor14 added a commit that referenced this pull request Apr 13, 2022
andrewor14 added a commit that referenced this pull request Apr 13, 2022
andrewor14 added a commit that referenced this pull request Apr 13, 2022
andrewor14 added a commit that referenced this pull request Apr 13, 2022
andrewor14 updated the PR description with initial results:

Results:
- "Before" column refers to accuracies reported [here](https://github.com/pytorch/vision/blob/main/docs/source/models.rst#quantized-models)
- TODO: Add results for QAT mobilenet after it's done

<img width="641" alt="Screen Shot 2022-04-12 at 10 58 01 PM" src="https://user-images.githubusercontent.com/2133137/163091177-e1c1c666-c3f7-40c3-8866-c0743c264721.png">
andrewor14 added a commit that referenced this pull request Apr 13, 2022
andrewor14 added a commit that referenced this pull request Apr 14, 2022
andrewor14 added a commit that referenced this pull request Apr 14, 2022
andrewor14 added a commit that referenced this pull request Apr 14, 2022
@andrewor14 (Author) commented:

There was a problem with the way I set up the FX experiments. After fixing this problem, I uncovered a new blocking issue that I summarized in pytorch/pytorch#75825. I will rerun all the experiments after we fix that issue.

@datumbox (Contributor) left a comment:

Thanks for the update. I've added some additional comments and questions. Let me know your thoughts.
@datumbox (Contributor) left a comment:

Just marking as "Request changes" to avoid accidental merges while we clarify some remaining questions.

andrewor14 added a commit that referenced this pull request Apr 21, 2022
andrewor14 added a commit that referenced this pull request Apr 21, 2022
andrewor14 added a commit that referenced this pull request Apr 21, 2022
andrewor14 added a commit that referenced this pull request Apr 21, 2022
andrewor14 added a commit that referenced this pull request Apr 21, 2022
andrewor14 added a commit that referenced this pull request Apr 21, 2022
andrewor14 added a commit that referenced this pull request Apr 21, 2022
andrewor14 added a commit that referenced this pull request Apr 22, 2022
@andrewor14 (Author) commented:

Hi @datumbox, continuing our discussion here:

> What steps, if any, do I have to take to load PTQ/QAT weights?
>
> I'm trying to figure out what the steps are for loading pre-trained quantized weights into a model. Can this be done directly on the original model without any conversions, or is some FX convert call needed prior to loading?

For the test_only mode, we use quantized weights, so convert must happen before loading the weights. For PTQ and QAT, however, we use pretrained FP32 weights, so convert must happen after loading the weights. This is the same for eager mode and FX graph mode. In more detail, here is the behavior today (pseudocode):

```
# PTQ / QAT eager
weights = get_pretrained_fp32_weights()
model = torchvision.models.quantization.QuantizableResNet50(...)
model.load_state_dict(weights.get_state_dict())
model = prepare(model)   # or prepare_qat in QAT mode
calibrate(model, data)   # or train in QAT mode
model = convert(model)
evaluate(model)

# test_only eager
model = torchvision.models.quantization.QuantizableResNet50(...)
model = prepare(model)
calibrate(model, dummy_data)
model = convert(model)
weights = get_quantized_weights()
model.load_state_dict(weights.get_state_dict())
evaluate(model)
```

What we want is the equivalent for FX. The only differences here are (1) We use prepare_fx (and the like) instead of prepare, and (2) We use the unmodified FP32 model (e.g. torchvision.models.ResNet50) instead of the model with manually inserted quant and dequant stubs (e.g. torchvision.models.quantization.QuantizableResNet50).
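To make that concrete, here is a minimal sketch of the FX flow under the same placeholder conventions as the pseudocode above (get_pretrained_fp32_weights, calibrate, and evaluate are the hypothetical helpers from that example; the prepare_fx/convert_fx calls follow the torch.ao.quantization.quantize_fx API of this era, whose exact signatures have since changed):

```
import torchvision
from torch.ao.quantization import get_default_qconfig
from torch.ao.quantization.quantize_fx import prepare_fx, convert_fx

# PTQ / QAT fx (sketch only; placeholder helpers as above)
weights = get_pretrained_fp32_weights()
model = torchvision.models.resnet50()   # plain FP32 model, no quant/dequant stubs
model.load_state_dict(weights.get_state_dict())
model.eval()
qconfig_dict = {"": get_default_qconfig("fbgemm")}
model = prepare_fx(model, qconfig_dict)  # or prepare_qat_fx with a QAT qconfig
calibrate(model, data)                   # or train in QAT mode
model = convert_fx(model)
evaluate(model)
```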

andrewor14 added a commit that referenced this pull request Apr 22, 2022
andrewor14 updated the PR description with new results:

<img width="638" alt="Screen Shot 2022-04-21 at 8 33 41 PM" src="https://user-images.githubusercontent.com/2133137/164572469-5848c86b-0813-42f6-bcb7-0298ff4bb25b.png">
andrewor14 added a commit that referenced this pull request Apr 22, 2022
From the updated quantized-model accuracy table in docs/source/models.rst:

| Model             | Acc@1  | Acc@5  |
|-------------------|--------|--------|
| ResNet 50         | 75.802 | 92.764 |
| ResNeXt 101 32x8d | 79.020 | 94.468 |
| Inception V3      | 77.206 | 93.576 |
| GoogleNet         | 69.702 | 89.388 |

A contributor commented on these numbers:
Worth noting that we need to estimate these values with batch-size=1 on 1 GPU to avoid variances introduced due to batch padding (see #4559).
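For reference, such an evaluation run might look like the following (a hypothetical invocation; it assumes the script's --test-only and --batch-size flags behave like those of the other classification reference scripts):

```
python train_quantization.py --device="cpu" --test-only --batch-size=1 \
  --backend="fbgemm" --model="resnet50"
```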

@datumbox (Contributor) commented:

@andrewor14 Thanks a lot for your work on this. Let's discuss the design doc you shared with me offline to decide on the approach and minimize throw-away work. :)

@andrewor14 (Author) commented:

As discussed offline, I'm closing this PR for now since the FX graph mode quantization API is subject to change. I will reopen it and rerun all the experiments once the new API is ready. Thank you everyone for your comments so far!
