
Add weight quantization #321

Closed

Description

Weight quantization enables low-power edge devices to perform machine learning by trading a few percentage points of prediction accuracy for a dramatic reduction in computation time.
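As a minimal sketch of the idea, here is symmetric 8-bit weight quantization in plain Python. The function names and example weights are illustrative only, not from any existing implementation:

```python
def quantize_int8(weights):
    """Map float weights onto signed 8-bit integers sharing one scale factor."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [round(w / scale) for w in weights]  # each q[i] fits in [-127, 127]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the integer codes."""
    return [x * scale for x in q]

weights = [0.42, -1.27, 0.05, 0.9]
q, scale = quantize_int8(weights)   # q == [42, -127, 5, 90]
approx = dequantize(q, scale)       # close to the original weights
```

Inference then runs on the small integer codes, and the single float `scale` is applied once at the end rather than per weight.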

Mobile devices will eventually motivate dramatic optimizations and code changes to enable cross-platform SIMD across very different devices. An alternative is an implementation that is fast (and simple) enough, and correct enough, that most users never need to care about numerical performance.
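The "simple enough" path might look like the following hypothetical sketch: accumulate the quantized dot product entirely in integers and rescale once at the end. A scalar loop like this is portable, easy to verify, and leaves auto-vectorization to the compiler or interpreter rather than hand-written SIMD:

```python
def quantized_dot(qa, qb, scale_a, scale_b):
    """Dot product of two int8-coded vectors; one float rescale at the end."""
    acc = 0  # integer accumulation avoids per-element float rounding
    for x, y in zip(qa, qb):
        acc += x * y
    return acc * scale_a * scale_b
```

The same structure translates directly to C or Rust later if a SIMD port does become worthwhile; only the inner loop changes.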

The approach was initially popularized by XNOR-Net for real-time edge vision classification:
https://arxiv.org/pdf/1603.05279.pdf
But this is a generalized approach, and there have been deeper analyses of how best to navigate the bit-width-versus-accuracy trade-off. On the training side, weights can be compressed until accuracy drops too far, as part of automatic hyperparameter tuning. This trades a slight increase in training time for low-power, high-speed streaming inference on edge processors.
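At the extreme end of that trade-off, XNOR-Net binarizes weights as W ≈ α·sign(W), where α is the mean absolute weight (computed per filter in the paper). A toy sketch of that binarization step:

```python
def binarize(weights):
    """XNOR-Net-style binarization: approximate W as alpha * sign(W)."""
    # alpha is the mean absolute value of the weights (per filter in the paper)
    alpha = sum(abs(w) for w in weights) / len(weights)
    signs = [1 if w >= 0 else -1 for w in weights]
    return signs, alpha

signs, alpha = binarize([0.5, -1.0, 0.25, -0.25])
# signs == [1, -1, 1, -1], alpha == 0.5
```

With weights reduced to ±1, the multiply-accumulate in a dot product collapses to XNOR plus popcount on packed bits, which is where the large speedups on edge hardware come from.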


Metadata

Labels: enhancement (New feature or request)