[CPU] Add per-token asym INT8 dynamic quantization support to QKV/MLP node #27001

Open
wants to merge 15 commits into
base: master


Contributor

@usstq usstq commented Oct 11, 2024

Details:

  • To use AMX-INT8 to speed up the QKV/MLP layers in LLMs, activations need dynamic per-token INT8 quantization, as described in many research papers (SmoothQuant, for example) and reference implementations (xFT, for example). This PR adds that support (per-token, asymmetric), enabled when the following conditions are met:

    • the platform supports AMX-INT8
    • the QKV and MLP weights are symmetrically quantized to INT8 per output channel (per-OC)
  • Add AMX-FP16 support to QKV/MLP layers on the GNR platform.

  • Optimize the single-batch second-token special case.
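
For readers unfamiliar with the scheme, the following is a minimal pure-Python sketch of per-token asymmetric INT8 activation quantization combined with per-OC symmetric INT8 weights. It is not the kernel from this PR: the function names (`quantize_per_token_asym`, `quantize_per_oc_sym`, `int8_linear`) are hypothetical, the choice of unsigned 8-bit activations with a zero point is an assumption, and no AMX instructions are involved. It only illustrates the arithmetic, in particular how the per-token zero point is compensated using the per-channel sum of quantized weights after integer accumulation.

```python
def quantize_per_token_asym(x):
    """Quantize each row (token) of x to [0, 255] with its own scale and zero point."""
    q_rows, scales, zps = [], [], []
    for row in x:
        # Include 0.0 in the range so that zero is exactly representable.
        lo, hi = min(min(row), 0.0), max(max(row), 0.0)
        scale = (hi - lo) / 255.0 or 1.0  # guard against all-zero rows
        zp = round(-lo / scale)
        q_rows.append([min(255, max(0, round(v / scale) + zp)) for v in row])
        scales.append(scale)
        zps.append(zp)
    return q_rows, scales, zps


def quantize_per_oc_sym(w):
    """Quantize each row (output channel) of w to [-127, 127] with its own scale."""
    q_rows, scales = [], []
    for row in w:
        scale = max(abs(v) for v in row) / 127.0 or 1.0
        q_rows.append([max(-127, min(127, round(v / scale))) for v in row])
        scales.append(scale)
    return q_rows, scales


def int8_linear(x, w):
    """Approximate y[t][oc] = sum_k x[t][k] * w[oc][k] using integer accumulation."""
    qx, sx, zp = quantize_per_token_asym(x)
    qw, sw = quantize_per_oc_sym(w)
    # Per-channel sum of quantized weights, used to undo the activation zero point.
    wsum = [sum(row) for row in qw]
    out = []
    for t, xrow in enumerate(qx):
        out_row = []
        for oc, wrow in enumerate(qw):
            acc = sum(a * b for a, b in zip(xrow, wrow))  # integer dot product
            # Dequantize: subtract the zero-point contribution, then apply both scales.
            out_row.append(sx[t] * sw[oc] * (acc - zp[t] * wsum[oc]))
        out.append(out_row)
    return out
```

Because the weights are symmetric, only the activation zero point needs compensation, and `wsum` can be precomputed once per weight matrix; the per-token scale and zero point are recomputed on every call, which is what makes the quantization dynamic.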

Tickets:

  • ticket-id

@github-actions github-actions bot added the labels "category: CPU" (OpenVINO CPU plugin) and "category: build" (OpenVINO cmake script / infra) Oct 11, 2024
Contributor Author

usstq commented Oct 11, 2024

Hi @dmitry-gorokhov, this is only a draft because some design decisions are still unresolved, such as how to conditionally enable dynamic quantization, possibly on a per-layer basis. Please take a look and comment. Thanks!

@dmitry-gorokhov dmitry-gorokhov self-assigned this Oct 11, 2024
@usstq usstq marked this pull request as ready for review October 18, 2024 08:17
@usstq usstq requested review from a team as code owners October 18, 2024 08:17