Slow model loading time for CoreML quantized model

### 🐛 Describe the bug

Check out https://github.com/pytorch/executorch/pull/5710 and run:
```
python -m executorch.examples.apple.coreml.scripts.export -m resnet18 --quantize
```
The FP32 model runs fully resident on the ANE at 0.9 ms on average, with an 11.13 ms cold start (first inference).
The int8 quantized model also runs fully resident on the ANE, at 0.54 ms on average, with a 3.10 ms cold start. Looking at the layers, there also appear to be many quantize ops followed immediately by dequantize ops.
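
For anyone trying to reproduce these numbers: the report does not say how they were measured, so the following is only a minimal sketch of one way to separate the cold-start (first) call from the steady-state average. `run_inference` is a hypothetical zero-argument callable standing in for the actual model invocation; it is not an ExecuTorch or Core ML API, and `warm_iters` is an arbitrary choice.

```python
import time
from statistics import mean


def measure_latency(run_inference, warm_iters=100):
    """Return (cold_start_ms, average_warm_ms) for a zero-argument callable.

    `run_inference` is a hypothetical placeholder for the real model call.
    """
    # First call: includes any one-time setup cost (the "cold start").
    start = time.perf_counter()
    run_inference()
    cold_ms = (time.perf_counter() - start) * 1e3

    # Later calls: steady-state latency, averaged over `warm_iters` runs.
    warm_ms = []
    for _ in range(warm_iters):
        start = time.perf_counter()
        run_inference()
        warm_ms.append((time.perf_counter() - start) * 1e3)

    return cold_ms, mean(warm_ms)


if __name__ == "__main__":
    # Stand-in workload; replace with the actual model invocation.
    def fake_inference():
        sum(range(1000))

    cold, avg = measure_latency(fake_inference, warm_iters=10)
    print(f"cold-start: {cold:.2f} ms, average: {avg:.2f} ms")
```

Timing model loading separately from the first inference (wrapping the load call in the same `perf_counter` pattern) would help tell load time apart from first-inference overhead.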



### Versions

```
PyTorch version: 2.5.0.dev20240618
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A

OS: macOS 15.0 (arm64)
GCC version: Could not collect
Clang version: 16.0.0 (clang-1600.0.26.3)
CMake version: version 3.29.2
Libc version: N/A

Python version: 3.11.5 (main, Sep 11 2023, 08:31:25) [Clang 14.0.6 ] (64-bit runtime)
Python platform: macOS-15.0-arm64-arm-64bit
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Apple M1 Pro

Versions of relevant libraries:
[pip3] executorch==0.4.0a0+7047162
[pip3] flake8==6.0.0
[pip3] mypy-extensions==1.0.0
[pip3] numpy==1.26.4
[pip3] numpydoc==1.5.0
[pip3] torch==2.5.0.dev20240618
[pip3] torchaudio==2.4.0.dev20240618
[pip3] torchsr==1.0.4
[pip3] torchvision==0.20.0.dev20240618
[conda] executorch                0.4.0a0+7047162          pypi_0    pypi
[conda] numpy                     1.26.4                   pypi_0    pypi
[conda] numpydoc                  1.5.0           py311hca03da5_0
[conda] torch                     2.4.0a0+gitae81855           dev_0    <develop>
[conda] torchaudio                2.4.0.dev20240618          pypi_0    pypi
[conda] torchsr                   1.0.4                    pypi_0    pypi
[conda] torchvision               0.20.0.dev20240618          pypi_0    pypi
```

Slow model loading time for CoreML quantized model #5718

🐛 Describe the bug

Versions

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Slow model loading time for CoreML quantized model #5718

Description

🐛 Describe the bug

Versions

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions