Enable 8-bit integer computations in Attention layer of Marian framework #50

Open
@abhi-agg

Description

Feature description

Currently, computations in the Attention layer are 32-bit (floating point), while the rest of the layers can do integer computations (8-bit and 16-bit). It would be great if the computations in the Attention layer could also happen in 8-bit.

We already have intgemm to do 8-bit integer GEMM operations in other layers, and the same can be used for the Attention layer as well.
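For reference, a minimal sketch of an 8-bit multiply via intgemm's public PrepareA/PrepareB/Multiply interface is below. The dimensions, tensor names, and quantization multiplier are illustrative only (this is not Marian's actual attention code), and the include paths/shape constraints follow intgemm's README as I understand it:

```cpp
// Sketch: 8-bit GEMM with intgemm, as it could apply to an attention
// product such as Q x K^T. Shapes and quant_mult are illustrative.
#include "intgemm/intgemm.h"
#include "intgemm/aligned.h"

#include <cstddef>
#include <cstdint>

int main() {
  using intgemm::Index;
  // intgemm expects 64-byte-aligned buffers (AlignedVector) and places
  // tile-size constraints on the shapes (e.g. B_cols a multiple of 8).
  const Index A_rows = 8;   // e.g. query rows (illustrative)
  const Index width = 64;   // inner/model dimension (illustrative)
  const Index B_cols = 8;   // e.g. key columns (illustrative)

  intgemm::AlignedVector<float> A(A_rows * width);
  intgemm::AlignedVector<float> B(width * B_cols);
  for (std::size_t i = 0; i < A.size(); ++i) A[i] = 0.01f * static_cast<float>(i % 7);
  for (std::size_t i = 0; i < B.size(); ++i) B[i] = 0.01f * static_cast<float>(i % 5);

  // Map the float range onto int8; here we assume values lie in [-2, 2].
  const float quant_mult = 127.0f / 2.0f;

  intgemm::AlignedVector<int8_t> A_prepared(A.size());
  intgemm::AlignedVector<int8_t> B_prepared(B.size());
  intgemm::Int8::PrepareA(A.begin(), A_prepared.begin(), quant_mult, A_rows, width);
  intgemm::Int8::PrepareB(B.begin(), B_prepared.begin(), quant_mult, width, B_cols);

  // Multiply in int8, then unquantize the int32 accumulators back to float.
  intgemm::AlignedVector<float> C(A_rows * B_cols);
  intgemm::Int8::Multiply(
      A_prepared.begin(), B_prepared.begin(), A_rows, width, B_cols,
      intgemm::callbacks::UnquantizeAndWrite(1.0f / (quant_mult * quant_mult), C.begin()));
  return 0;
}
```

The open question for attention is that both operands (e.g. queries and keys) are computed at runtime, so both sides would need an online PrepareA/PrepareB-style quantization, unlike the weight matrices elsewhere, which can be prepared once.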

Some advantages of doing it:

  1. Faster inference
  2. Removal of the 32-bit sgemm library dependency for consumers who only want to do 8-bit integer GEMM

cc @andrenatal @kpu @XapaJIaMnu

Labels: enhancement (New feature or request)
