Adapt changes inc release 1.13 #9
Conversation
…-1.13 Fixed distillation bug and some UT errors
The documentation is not available anymore as the PR was closed or merged.
  # Verification that the final sparsity meets the targeted sparsity
- self.assertGreaterEqual(round(sparsity), target_sparsity * 100)
+ self.assertGreaterEqual(round(sparsity), 0.5)
Why did we replace target_sparsity * 100 with 0.5 @PenghuiCheng? Is it because the sparsity is never guaranteed to be exactly target_sparsity?
Because the target sparsity in the YAML is a per-operator sparsity, not a whole-model sparsity, we can see in the log that the sparsity of each op is greater than 0.02, but many weights, like the embedding ops, were not pruned at all. The optimizer.get_sparsity() function returns the model sparsity; the model sparsity is 0.84, so I set the threshold to 0.5.
The log is as below:
2022-08-05 17:30:41 [INFO]
    Name                                                Shape         NNZ (dense)  NNZ (sparse)  Sparsity(%)  Std   Mean       Abs-Mean
 0  distilbert.embeddings.word_embeddings.module.w...  [30522, 768]  23440896     0             0.00         0.05  -3.83e-02  0.05
 1  distilbert.embeddings.position_embeddings.modu...  [512, 768]    393216       0             0.00         0.02  -4.01e-05  0.01
 2  distilbert.transformer.layer.0.attention.q_lin...  [768, 768]    589824       0             0.00         0.04   5.97e-05  0.03
 3  distilbert.transformer.layer.0.attention.k_lin...  [768, 768]    589824       0             0.00         0.04   9.21e-06  0.03
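To illustrate the per-op vs. model-level distinction, a minimal sketch (the tensors below are hypothetical placeholders, not the model's actual weights):

import torch

# Hypothetical weights: two pruned linear layers and one unpruned embedding.
weights = {
    "linear1": torch.zeros(768, 768),      # fully pruned for illustration
    "linear2": torch.zeros(768, 768),      # fully pruned for illustration
    "embedding": torch.randn(30522, 768),  # embeddings are not pruned
}

# Per-operator sparsity: each pruned op individually meets a per-op target.
for name, w in weights.items():
    print(f"{name}: {(w == 0).float().mean().item():.2f}")

# Model-level sparsity (what optimizer.get_sparsity() reports) is diluted by
# the large unpruned embedding, so it can sit well below the per-op values.
total_zeros = sum((w == 0).sum().item() for w in weights.values())
total_params = sum(w.numel() for w in weights.values())
print(f"model sparsity: {total_zeros / total_params:.2f}")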
Thanks for the clarification
if teacher_logits is None:
    teacher_outputs = self.agent.criterion.teacher_model_forward(inputs)
    teacher_logits = self._get_logits(teacher_outputs)
elif hasattr(self.agent, "on_post_forward"):
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@PenghuiCheng Currently in main we use the teacher_model_forward method to compute the teacher outputs, but I agree that it makes total sense to use on_post_forward to stay compatible with neural_compressor v1.12 and under.
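For reference, a hedged sketch of the dual-path dispatch being discussed; the on_post_forward branch is left abbreviated because its exact call signature isn't shown in this diff:

def _get_teacher_logits(self, inputs, teacher_logits=None):
    # neural_compressor 1.13+: the criterion exposes teacher_model_forward.
    if teacher_logits is None and hasattr(self.agent.criterion, "teacher_model_forward"):
        teacher_outputs = self.agent.criterion.teacher_model_forward(inputs)
        teacher_logits = self._get_logits(teacher_outputs)
    elif hasattr(self.agent, "on_post_forward"):
        # neural_compressor <= 1.12: fall back to the on_post_forward callback
        # (invocation elided; its signature is not shown in this diff).
        pass
    return teacher_logits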
- teacher_logits = self._get_logits(teacher_outputs)
+ teacher_logits = inputs.pop("teacher_logits", None)
  if hasattr(self.agent, "on_after_compute_loss"):
Is it equivalent to hasattr(self.agent.criterion, "teacher_model_forward")? cc @PenghuiCheng
on_after_compute_loss is not equivalent to teacher_model_forward. In the future, we will use the on_after_compute_loss callback to compute the distillation loss. The usage is like below:
student_outputs = student_model(input)
student_loss = user_criterion(student_outputs, labels)
total_loss = agent.on_after_compute_loss(input, student_outputs, student_loss, teacher_outputs)
Here, teacher_outputs is optional.
The on_after_compute_loss callback is a new API in 1.13; its purpose is to reuse the user's criterion.
But on_after_compute_loss doesn't yet handle a tuple of student_outputs and teacher_outputs, so we didn't use it directly; I will push another commit in the future to use on_after_compute_loss directly.
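Putting the pieces together, a minimal training-step sketch against the 1.13-style callback (the agent, models, and criterion objects are assumed to exist; the function name is illustrative):

def training_step(agent, student_model, user_criterion, inputs, labels, teacher_outputs=None):
    student_outputs = student_model(inputs)
    student_loss = user_criterion(student_outputs, labels)
    # teacher_outputs is optional; the callback folds the distillation loss
    # into the loss computed by the user's own criterion.
    total_loss = agent.on_after_compute_loss(inputs, student_outputs, student_loss, teacher_outputs)
    total_loss.backward()
    return total_loss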
My question was more about whether hasattr(self.agent, "on_after_compute_loss") is equivalent to hasattr(self.agent.criterion, "teacher_model_forward"), since self.agent.criterion.teacher_model_forward is called after this condition. Also, thanks for your explanation; does it mean that the teacher_model_forward method will be deprecated? I find this method very useful as it gives us more flexibility: we can compute the teacher outputs and compute the loss ourselves with the trainer's compute_distillation_loss method.
teacher_model_forward exists in both the old and the new version of neural_compressor, so we can use it either way. Yes, if you want to use the trainer's compute_distillation_loss method, you can use the hasattr(self.agent.criterion, "teacher_model_forward") condition, but only the new version returns outputs. So you first need to check that a new version of neural_compressor is installed.
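A hedged sketch of that version check (assuming neural_compressor exposes __version__; the packaging dependency is an assumption of this example):

from packaging import version
import neural_compressor

def returns_teacher_outputs():
    # Only neural_compressor >= 1.13 returns outputs from teacher_model_forward,
    # so check the installed version before relying on its return value.
    return version.parse(neural_compressor.__version__) >= version.parse("1.13")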
Co-authored-by: Ella Charlaix <80481427+echarlaix@users.noreply.github.com>
* Initial commit to enable OVTrainer with joint pruning, quantization and distillation via NNCF
* Review OpenVINO Q&A readme and configs
* Update README.md
* Add post init value checker to OVTrainingArguments
* Initial enabling of audio classification/wav2vec2 [tests not included] (#2)
* use nncf official branch for install since JPQD is merged
* copy ac scripts from transformers repo
* init commit for wav2vec2
* add onnx_config argument in OVTrainer for onnx export with unsupported model
* enable customized teacher kd
* add readme
* delete debugging lines
* Update openvino-dev and nncf version in setup.py
* refactor _enable_standard_onnx_export_option to _set_standard_onnx_export_option
* add tests for (movement/quantization) with distillation (#3)
* test part 1
* clean "compute_distillation_loss" in OVTrainer
* add test of OVTrainer for int8+kd / movement / movement+int8 / movement+int8+kd
* add expectedFailure mark to test of OVModelForAudioClassification
* revert unnecessary code about "OVModelForAudioClassification"
* change to a shorter train for w2v2 in readme
* revert compute_metrics change since it is not necessary
* fix task_loss non-scalar bug for kd logging
* make regex clearer in QA bert config
* Refactor compression-related logging
* Refactor OpenVINO IR generation and patch tests
* Miscellaneous refactoring
* MO IR pruning depends on scheduler stage
* Readme tweaks for all example tasks
* Minor tweak on tests
* Align setup.py for openvino-dev and nncf versions needed for JPQD
* Fix lint with Black
* Refactor OpenVINO IR generation using the Python API
* Fix via isort
* Handle IR generation error to avoid run termination
* Update QA readme
* Enable distillation on openvino's image classification example
* Minor refactoring in openvino's audio classification example
* Move openvino-dev dependency to be an extra of NNCF
* Configure IR model to accept dynamic-shaped input
* Revert _enable_standard_onnx_export_option method in OVConfig
* Update wav2vec2 configs for audio classification
* Add BERT-base/glue-sst2 example with QAT / JPQD (#4)
* copy text-classification example from transformers
* init draft for sst example
* update sst2 accuracy & training time
* Revise wav2vec2 config and audio classification readme
* Patch _enable_standard_onnx_export_option to only add the key pair to quantization config
* Set logging level to INFO in openvino/trainer.py
* Review readme of text and image classification
* Revert IR generation with static input shape for joint compression
* Add distillation and advanced optimization section in optimization_ov.mdx
* Patch tests
* Revise formatting of optimization_ov.mdx
* Limit #checkpoints saved for JPQD samples
* Handle NNCF output to text log and only print errors to stdout
* Replace hardcoded model.onnx filename with constant variable
* Fix movement sparsity config in optimization_ov.mdx
* Change _set_feature to _set_task to align with OVQuantizer
* Revert onnx_config exposure in OVTrainer, expand test coverage for joint compression variations, misc. patches
* use builtin onnx configs for wav2vec onnx export
* move teacher model argument from OVTrainingArgs to model args
* fix duplicate call of `epoch_step`
* temporary workaround for compression metrics
* test for all training cases
* temporary workaround for eval only
* cover train/eval tests
* style fix
* Move old ovtrainer tests to a new `test_training.py` file; bugfix in training loss check (#6)
* remove old tests in test_quantization since they are now in `test_training`
* bugfix in checking compression metrics during training
* keep bert examples only and misc. fixes (#7)
* temporarily keep bert examples only; remove w2v2 and swin
* move nncf_compression_config out of OVTrainingArguments
* type hint change for nncf_compression_config
* document rename of feature to task
* revert existing QAT image classification example
* delete useless code in test_quantization
* revert existing test_quantization
* misc change in compute_metric
* revert unnecessary changes
* temporary workaround for logging distill & compression loss (not using dist. reduce)
* revert set_task method
* bugfix in compression metric in qa task
* bugfix in importing tpu
* simplify pruning IR code
* clean unnecessary distillation weight attribute in trainer
* Change nncf requirement to official 2.4
* Log nncf compression statistics at the beginning of each training epoch
* Revise optimization_ov.mdx documentation
* Consolidate during-training optimization to QAT and JPQD
* Add known limitation regarding OpenVINO IR with static input shape
* fix data parallel crashes and add tests for DP/DDP (#8)
* fix "not same device" error in data parallel
* wrap teacher model with data parallel
* add sst2 tests for dp/ddp with fixes
* Add remark in optimization_ov.mdx on supported model architectures for structured pruning
* Refactor JPQD IR generation where final IR is dynamic in input shape
* Revise optimization_ov.mdx to remove static IR limitations
* revert snippet for inference with Transformers pipeline
* Remove commented code in openvino/trainer.py
* Add tests about new OV IR export - check dynamic graph and output equivalence to torch model (#9)
* draft for new export with some todos
* draft for tests
* delete onnx export debugging when errors on saving
* add back the debug info when ir export fails
* bugfix in random setting zeros in movement masks
* Add tests on OV IR reshape-ability
* Remove unused imports in openvino/trainer.py
* Refine inference pipeline with OVModel in optimization_ov.mdx
* Revise openvino extras in setup.py
---------
Co-authored-by: Alexander Kozlov <alexander.kozlov@intel.com>
Co-authored-by: Yujie Pan <yujie.pan@intel.com>