Feature description
Currently, computations in the Attention layer are done in 32-bit floating point, while the rest of the layers can run integer computations (8-bit and 16-bit). It would be great if the computations in the Attention layer could also happen in 8-bit.
We already have intgemm for 8-bit integer GEMM operations in the other layers, and the same can be used for the Attention layer as well (see the sketch below).
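To make this concrete, here is a minimal sketch (not a proposed implementation) of routing one attention GEMM through intgemm's 8-bit interface. The `PrepareA`/`PrepareB`/`Multiply` calls and the `UnquantizeAndWrite` callback are intgemm's documented API; the function name `AttentionGemm8`, the header paths, and the quantization multipliers passed in by the caller are illustrative assumptions, and intgemm's dimension-multiple requirements on `width` and `B_cols` are assumed to hold.

```cpp
// Minimal sketch: one 8-bit GEMM as it could be used inside attention,
// e.g. for the Q * K^T scores (with K already laid out as width x B_cols).
// AttentionGemm8 and the *_quant_mult parameters are hypothetical; the
// intgemm calls below follow its documented Int8 interface.
#include "intgemm/intgemm.h"
#include "intgemm/aligned.h"

void AttentionGemm8(const float* A, const float* B, float* C,
                    intgemm::Index A_rows, intgemm::Index width,
                    intgemm::Index B_cols,
                    float a_quant_mult,   // typically 127.0f / max|A|
                    float b_quant_mult) { // typically 127.0f / max|B|
  // intgemm requires aligned buffers; AlignedVector provides that.
  intgemm::AlignedVector<int8_t> A_prepared(A_rows * width);
  intgemm::AlignedVector<int8_t> B_prepared(width * B_cols);

  // Quantize activations on the fly; a static operand (e.g. weights)
  // could be prepared once and cached instead.
  intgemm::Int8::PrepareA(A, A_prepared.begin(), a_quant_mult, A_rows, width);
  intgemm::Int8::PrepareB(B, B_prepared.begin(), b_quant_mult, width, B_cols);

  // Multiply in int8, unquantizing back to float on write-out to C.
  intgemm::Int8::Multiply(
      A_prepared.begin(), B_prepared.begin(), A_rows, width, B_cols,
      intgemm::callbacks::UnquantizeAndWrite(
          1.0f / (a_quant_mult * b_quant_mult), C));
}
```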
Some advantages of doing this:
- Faster inference
- Removal of an sgemm (32-bit float GEMM) library dependency for consumers who only want 8-bit integer GEMM