This repository contains a PyTorch implementation of the paper:

[SWALP : Stochastic Weight Averaging for Low-Precision Training (SWALP)](https://arxiv.org/abs/1904.11943).

[Guandao Yang](http://www.guandaoyang.com),
[Tianyi Zhang](https://scholar.google.com/citations?hl=en&view_op=list_works&gmla=AJsN-F5oL2dqrt5Dli21O3seTVse8viKdodY4EQrZp8EV0BUpG5s1brVEPMWVunGQizs0Lltdmn5cPooQHA77vDxymqIITnUUL-GRlYglybFcTnDURbvEss&user=OI0HSa0AAAAJ#),
Polina Kirichenko, Junwen Bai,
[Andrew Gordon Wilson](https://people.orie.cornell.edu/andrew/),
[Christopher De Sa](http://www.cs.cornell.edu/~cdesa/)


## Introduction

Low precision operations can provide scalability, memory savings, portability, and energy efficiency.
This paper proposes SWALP, an approach to low precision training that averages low-precision SGD iterates with a modified learning rate schedule.
SWALP is easy to implement and can match the performance of *full-precision* SGD even with all numbers quantized down to 8 bits, including the gradient accumulators.
Additionally, we show that SWALP converges arbitrarily close to the optimal solution for quadratic objectives, and to a noise ball asymptotically smaller than low precision SGD in strongly convex settings.

This repo contains the code to replicate our experiments on the CIFAR datasets with VGG16 and PreResNet164.
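
To make the recipe concrete, below is a minimal PyTorch sketch of the idea. This is *not* the code in this repo: the fixed-point quantizer (8-bit, stochastic rounding), the plain SGD step, the `swa_start` trigger, and all hyperparameters are illustrative assumptions, and the actual implementation also quantizes activations and gradient accumulators and uses the modified learning-rate schedule described in the paper.

```python
import torch

def quantize_stochastic(x, wl=8, fl=6):
    """Quantize a tensor to fixed point with `wl` total bits and `fl` fractional
    bits using stochastic rounding. The wl/fl values are placeholders."""
    scale = 2.0 ** fl
    upper = (2.0 ** (wl - 1) - 1) / scale   # largest representable value
    lower = -(2.0 ** (wl - 1)) / scale      # smallest representable value
    scaled = x.clamp(lower, upper) * scale
    floor = scaled.floor()
    # round up with probability equal to the fractional remainder
    return (floor + torch.bernoulli(scaled - floor)) / scale

def swalp_train(model, loader, loss_fn, epochs=200, swa_start=150, lr=0.05):
    """Run low-precision SGD, then average the quantized iterates after
    `swa_start` epochs. Hyperparameters here are placeholders."""
    swa_state = {k: v.detach().clone().float()
                 for k, v in model.state_dict().items()}
    n_models = 0
    for epoch in range(epochs):
        for x, y in loader:
            loss = loss_fn(model(x), y)
            model.zero_grad()
            loss.backward()
            with torch.no_grad():
                for p in model.parameters():
                    if p.grad is None:
                        continue
                    g = quantize_stochastic(p.grad)           # low-precision gradient
                    p.copy_(quantize_stochastic(p - lr * g))  # low-precision weight update
        if epoch >= swa_start:
            # running average of the low-precision iterates, kept in full precision
            n_models += 1
            for k, v in model.state_dict().items():
                swa_state[k] += (v.float() - swa_state[k]) / n_models
    return swa_state  # load into a fresh copy of the model for evaluation
```

Note that only the running average is stored in full precision; the training iterates themselves stay quantized, which is what keeps the memory and compute cost at the low-precision level while the average lands closer to the optimum than any single iterate.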

## Citing this Work
Please cite our work if you find this approach useful in your research:

The full-precision results (SGD-FP and SWA-FP) are produced by running the SWA repo.

| CIFAR100 | VGG16 | 27.23±0.17 | 25.93±0.21 | 29.59±0.32 | 26.65±0.29 |
| | PreResNet164 | 22.20±0.57 | 19.95±0.19 | | |

## Other implementations

Tianyi Zhang provides an implementation built on the low-precision training framework [QPyTorch](https://github.com/Tiiiger/QPyTorch); see [this example](https://github.com/Tiiiger/QPyTorch/tree/master/examples/SWALP).

## References
We use the [SWA repo](https://github.com/timgaripov/swa/) as a starter template.
Network architecture implementations are adapted from: