Closed
Description
Design
- Add the design doc.
PRs:
- design doc: Write fixed-point quantization design. #10552
C++
- fake_quantize_op
  - Calculate the sliding maximum value for the dequantization range.
  - Implement quantization.
- fake_dequantize_op
PRs:
- fake_dequantize_op: Develop a fake dequantized op for fixed-point quantization training framework. #10965
- fake_quantize_op:
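
For readers unfamiliar with the two operators, the following is a minimal pure-Python sketch of what a fake quantize / fake dequantize pair and a sliding maximum could look like. It is not the C++ implementation from the PRs above. The function names, the `bit_length` parameter, the abs-max scaling rule, and the `SlidingMaxScale` helper are all assumptions made for illustration.

```python
from collections import deque


def fake_quantize_abs_max(values, bit_length=8):
    """Map floats to signed integer levels, scaled by the input's abs-max.

    Assumption: symmetric quantization, 2**(bit_length - 1) - 1 positive levels.
    `values` must be a non-empty list of floats.
    """
    bin_cnt = (1 << (bit_length - 1)) - 1  # 127 for 8 bits
    scale = max(abs(v) for v in values)
    if scale == 0.0:
        # All-zero input: nothing to scale.
        return [0.0] * len(values), scale
    # The levels are integers but stay stored as floats ("fake" quantization).
    quantized = [float(round(v / scale * bin_cnt)) for v in values]
    return quantized, scale


def fake_dequantize(quantized, scale, bit_length=8):
    """Map integer levels back to the original floating-point range."""
    bin_cnt = (1 << (bit_length - 1)) - 1
    return [q * scale / bin_cnt for q in quantized]


class SlidingMaxScale:
    """Hypothetical helper: maximum of the last `window_size` batch scales."""

    def __init__(self, window_size):
        self._window = deque(maxlen=window_size)

    def update(self, batch_scale):
        # The deque drops the oldest scale once the window is full.
        self._window.append(batch_scale)
        return max(self._window)
```

Calling `fake_dequantize(*fake_quantize_abs_max(x))` returns values that differ from `x` by at most one quantization step, which is the rounding error the training framework is meant to expose to the model.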
Python transpiler:
Requirement:
- Develop a quantization transpiler that rewrites the ProgramDesc to insert fake_quantize_op and fake_dequantize_op.
- Insert fake_quantize_op and fake_dequantize_op only in the forward pass.
- Do not change the inputs and outputs of the backward operators.
- Consider batch-norm folding and quantization.
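
As a reminder of what batch-norm folding involves, here is a small sketch using the standard folding identity: a batch-norm layer that follows a linear or convolution layer can be merged into that layer's weights and bias. The function name, argument layout (one weight list per output channel), and default `epsilon` are assumptions for illustration, not part of the transpiler design.

```python
import math


def fold_batch_norm(weights, bias, gamma, beta, mean, variance, epsilon=1e-5):
    """Fold per-channel batch-norm statistics into weights and bias.

    weights: list of per-output-channel weight lists.
    bias, gamma, beta, mean, variance: one float per output channel.
    """
    folded_weights, folded_bias = [], []
    for w_c, b_c, g, bt, mu, var in zip(weights, bias, gamma, beta, mean, variance):
        factor = g / math.sqrt(var + epsilon)
        # BN(w.x + b) = factor * (w.x + b - mu) + beta
        #             = (factor * w).x + (beta + (b - mu) * factor)
        folded_weights.append([w * factor for w in w_c])
        folded_bias.append(bt + (b_c - mu) * factor)
    return folded_weights, folded_bias
```

Quantizing the folded weights rather than the original ones keeps the quantized training graph consistent with an inference graph in which batch norm has been merged away.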
PRs:
- implement training transpiler prototype: Quantize transpiler for fixed-point Quantization training framework. #10693
- Enhance delayed quantization for training.
- implement the inference transpiler
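
The rewriting rule in the requirements above can be sketched on a toy program representation. This does not use the real ProgramDesc API: the list-of-dicts format, the op type strings `fake_quantize` and `fake_dequantize`, the `_grad` suffix convention for backward ops, and the `QUANTIZABLE_OP_TYPES` set are all hypothetical and chosen only to illustrate the idea.

```python
# Assumption: only these forward op types get quantized inputs.
QUANTIZABLE_OP_TYPES = ("conv2d", "mul")


def insert_fake_quant_ops(ops):
    """Return a new op list with quantize/dequantize pairs before forward ops.

    Each op is a dict: {"type": str, "inputs": [str], "outputs": [str]}.
    Backward ops (type ending in "_grad") are passed through untouched.
    """
    new_ops = []
    dequantized = {}  # original variable name -> dequantized variable name
    for op in ops:
        is_backward = op["type"].endswith("_grad")
        if is_backward or op["type"] not in QUANTIZABLE_OP_TYPES:
            new_ops.append(op)  # inputs and outputs stay as they are
            continue
        new_inputs = []
        for name in op["inputs"]:
            if name not in dequantized:
                q_name, s_name = name + ".quantized", name + ".scale"
                dq_name = name + ".dequantized"
                new_ops.append({"type": "fake_quantize",
                                "inputs": [name],
                                "outputs": [q_name, s_name]})
                new_ops.append({"type": "fake_dequantize",
                                "inputs": [q_name, s_name],
                                "outputs": [dq_name]})
                dequantized[name] = dq_name
            new_inputs.append(dequantized[name])
        # The forward op now reads the dequantized variables.
        new_ops.append({"type": op["type"],
                        "inputs": new_inputs,
                        "outputs": list(op["outputs"])})
    return new_ops
```

In this sketch a variable that feeds several quantizable ops is quantized once and reused, and the backward ops keep reading the original variable names, which matches the requirement that their inputs and outputs do not change.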
Model verification.
- Need to determine the baseline.