
@MasterJH5574
Member

This PR uses FuseTIRByPattern to match the pattern decode + NT-GeMV, optionally followed by a trailing element-wise TIR function.
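For readers unfamiliar with the terms, the computation this pattern covers can be sketched in plain Python. This is an illustration only, not the TIR from this PR: the 4-bit packing, the per-row scale, and the helper names (`decode`, `nt_gemv`, `add_bias`, `fused`) are all assumptions made for the example.

```python
# Illustrative sketch only; the packing scheme and all names are assumptions.

def decode(packed, scales, cols):
    """Unpack 4-bit weights (8 per 32-bit word) into a float matrix [rows][cols]."""
    out = []
    for row_words, scale in zip(packed, scales):
        row = []
        for j in range(cols):
            nibble = (row_words[j // 8] >> (4 * (j % 8))) & 0xF
            row.append((nibble - 7) * scale)  # shift to a signed range, then scale
        out.append(row)
    return out

def nt_gemv(x, w):
    """NT-GeMV: y[n] = sum_k x[k] * w[n][k], with w stored as [N][K]."""
    return [sum(xk * wk for xk, wk in zip(x, row)) for row in w]

def add_bias(y, bias):
    """One possible trailing element-wise function."""
    return [a + b for a, b in zip(y, bias)]

def fused(x, packed, scales, cols, bias=None):
    """Decode, then NT-GeMV, then the optional element-wise epilogue."""
    y = nt_gemv(x, decode(packed, scales, cols))
    return add_bias(y, bias) if bias is not None else y
```

Matching all three stages as one unit means the decoded weight matrix never has to exist as a separate intermediate; in this sketch that corresponds to calling `fused` instead of calling the three helpers one by one.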

Verified end-to-end locally.

The next step is to turn off NT-matmul, update the quantization encoding/decoding so that the encoding function transposes the weights from T to N, and update this pattern-matching function to match the new layout.
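The layout change in that next step can be illustrated with a small sketch. It assumes that "T" means the weight is stored as [N][K] (so the GeMV reads it transposed) and "N" means it is stored as [K][N]; the function names are hypothetical and are not code from this PR.

```python
# Illustrative sketch only; layout interpretation and names are assumptions.

def transpose(w):
    """Turn an [N][K] matrix into a [K][N] matrix."""
    return [list(col) for col in zip(*w)]

def nt_gemv(x, w):
    """y[n] = sum_k x[k] * w[n][k]; w is stored as [N][K]."""
    return [sum(xk * wk for xk, wk in zip(x, row)) for row in w]

def nn_gemv(x, w_t):
    """y[n] = sum_k x[k] * w_t[k][n]; w_t is stored as [K][N]."""
    n_out = len(w_t[0])
    return [sum(x[k] * w_t[k][n] for k in range(len(x))) for n in range(n_out)]

w = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]   # [N=2][K=3]
x = [1.0, 0.5, 2.0]

w_t = transpose(w)      # done once, ahead of time, rather than on every call
assert nt_gemv(x, w) == nn_gemv(x, w_t)
```

Under these assumptions, transposing once in the encoding function produces the same result as the NT form, which is why the pattern would then need to match the non-transposed GeMV instead.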

@MasterJH5574 MasterJH5574 merged commit dbaeccf into quantize Apr 14, 2023
@MasterJH5574 MasterJH5574 deleted the 04-13-web-decode-nt-matmul branch April 14, 2023 02:01
MasterJH5574 added a commit that referenced this pull request Apr 14, 2023
marschr pushed a commit to xkpesc/web-llm that referenced this pull request Feb 3, 2025
@Iternal-JBH4 Iternal-JBH4 mentioned this pull request Jul 29, 2025