
quantization-aware training in classification #1729

Open
robotcator opened this issue Jan 8, 2020 · 6 comments
Labels: module: models.quantization (Issues related to the quantizable/quantized models)


@robotcator

Hi, I want to reproduce the results of quantization-aware training of mobilenet_v2 using this script.

1: It seems that the script raises a launch error when using multiple GPUs. Is multi-GPU quantization-aware training supported?
2: Also, the README says 'Training converges at about 10 epochs.', but after 10 epochs the test result does not reach the 'acc@top1 71.6' of the pretrained model hosted on the hub.

@robotcator robotcator changed the title quantization-aware training in classification [Questions]quantization-aware training in classification Jan 9, 2020
@robotcator
Author

cc @fmassa

@robotcator robotcator changed the title [Questions]quantization-aware training in classification quantization-aware training in classification Jan 12, 2020
@robotcator
Author

pytorch/pytorch#32082

@jerryzh168
Contributor

cc @raghuramank100

@jerryzh168 jerryzh168 added module: models.quantization Issues related to the quantizable/quantized models and removed module: models.quantization Issues related to the quantizable/quantized models labels Jan 13, 2020
@fmassa
Member

fmassa commented Jan 15, 2020

@robotcator can you clarify a few things:

  • How many GPUs do you use for training the quantized model?
  • We should support multi-GPU training; how do you launch it?
  • About the convergence, @raghuramank100 might be better suited to explain.

@robotcator
Author

robotcator commented Jan 15, 2020

@fmassa Thank you for your response.
1: I use 8 GPUs on a single node.
2: From the code logic, multi-GPU training for the quantized model looks fine, but we may need to handle the EMA observer carefully (see the sketch after this list). What do you think?
3: I use the torch.distributed.launch method to launch the training script.
4: About the convergence, I am not sure whether the published result was trained with multiple GPUs, so I don't know whether a different training setup would lead to different convergence.
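To make the EMA-observer concern concrete, here is a minimal sketch (not code from the reference script) of one way the observer statistics could be kept consistent across DDP ranks. The buffer-name filter, the broadcast-from-rank-0 choice, and the point at which it is called are all assumptions for illustration only:

import torch.distributed as dist

def sync_quant_observers(model):
    # DDP only synchronizes gradients, so the min/max statistics collected by
    # the EMA-based observers inserted by prepare_qat would otherwise drift
    # apart between ranks. One option is to broadcast the observer buffers
    # from rank 0 after they have seen a few batches.
    for name, buf in model.named_buffers():
        # The name filter is an assumption about how the prepared model names
        # its observer / fake-quant submodules; with the NCCL backend the
        # buffers must live on the GPU.
        if "activation_post_process" in name or "weight_fake_quant" in name:
            dist.broadcast(buf, src=0)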

Here are some facts about my experiment.
1: I downloaded the imagenet_1k dataset as an example to reproduce the error.
2: I used the following command for normal float training; the program runs fine in my environment:

python -m torch.distributed.launch --nproc_per_node=8 --use_env train.py --model mobilenet_v2 --epochs 300 --lr 0.045 --wd 0.00004 --lr-step-size 1 --lr-gamma 0.98 --data-path=~/test/imagenet_1k

3: Then I used the following command for the quantized training, and it runs into problems. The log description is here, and I have updated the reproducible steps:

python -m torch.distributed.launch --nproc_per_node=8 --use_env train_quantization.py --data-path=~/test/imagenet_1k

4: Also, I have some findings, which I clarify in this issue.
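For context, here is a minimal sketch of the QAT setup that train_quantization.py performs, paraphrased rather than copied from the reference script (the 'fbgemm' backend string and the single-GPU usage shown here are assumptions for illustration):

import torch
import torchvision

# Build the quantizable MobileNetV2 with float weights and prepare it for QAT.
model = torchvision.models.quantization.mobilenet_v2(pretrained=True, quantize=False)
model.fuse_model()  # fuse Conv+BN+ReLU blocks before inserting observers
model.qconfig = torch.quantization.get_default_qat_qconfig('fbgemm')
torch.quantization.prepare_qat(model, inplace=True)

# ... train as usual (single GPU here; the reference script wraps the model
# in DistributedDataParallel when launched via torch.distributed.launch) ...

# After training, convert to the actual int8 model for evaluation.
model.cpu().eval()
quantized_model = torch.quantization.convert(model)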

@LeeHXAlice

@robotcator I met the same issue. How did you fix this problem? Is there any advice? Thanks.
