Split quantize_pt2 to allow calling the same APIs in testing and regular flows #4505
Summary:
Split `quantize_pt2` into two steps, `convert_pt2` and `fuse_pt2`. Convert returns the converted model after `convert_pt2e`, which makes it possible to get reference outputs for testing. Fuse returns the final fused graph. Both calls should always use the same quantizer. Note that we will probably split the convert step again in a follow-up diff to allow calibration.

`quantize_pt2` is still the one-liner API for anything that doesn't need the converted reference outputs (so mostly e2e testing).

The main benefit is that we can now use the same API everywhere. Decomposing SDPA and any other ATen IR passes that must run before quantization can be done in one location (in `convert_pt2`).

Reviewed By: dulinriley
Differential Revision: D60544102
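The intended call pattern (convert, grab reference outputs, then fuse with the same quantizer) can be sketched as below. This is a toy, torch-free model of the flow, not the actual code from this PR. The `Quantizer` and `Graph` classes, the two-argument signatures, and the rounding used to imitate quantization are all hypothetical. The real functions operate on PyTorch models, and their exact signatures are not shown here.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Quantizer:
    """Hypothetical stand-in for a real quantizer; only a scale is modelled."""
    scale: float


@dataclass
class Graph:
    """Hypothetical stand-in for an exported graph: a callable plus the quantizer used to build it."""
    fn: Callable[[float], float]
    quantizer: Quantizer


def convert_pt2(model: Callable[[float], float], quantizer: Quantizer) -> Graph:
    # Pre-quantization passes (e.g. SDPA decomposition) would run here, in one place.
    # Toy "convert": snap outputs to the quantizer's grid, imitating quantize/dequantize.
    def converted(x: float) -> float:
        return round(model(x) / quantizer.scale) * quantizer.scale

    return Graph(converted, quantizer)


def fuse_pt2(converted: Graph, quantizer: Quantizer) -> Graph:
    # Both steps must use the same quantizer, so reject a mismatch explicitly.
    if converted.quantizer != quantizer:
        raise ValueError("fuse_pt2 must use the same quantizer as convert_pt2")
    # Toy "fuse": a real fusion rewrites the graph; numerics should match the converted model.
    return Graph(converted.fn, quantizer)


def quantize_pt2(model: Callable[[float], float], quantizer: Quantizer) -> Graph:
    # One-liner API for flows that don't need the converted reference outputs.
    return fuse_pt2(convert_pt2(model, quantizer), quantizer)
```

In a test, the converted graph supplies the reference output that the fused graph is checked against. For example, `ref = convert_pt2(model, q).fn(x)` followed by a comparison with `fuse_pt2(...).fn(x)`. Passing the quantizer to both steps, rather than stashing it globally, keeps the "same quantizer" requirement explicit and checkable.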