
Commit 99c1f4d

Update: include other implementation
1 parent 6c0fb1b commit 99c1f4d

File tree: 2 files changed (+13, -9 lines)


README.md

Lines changed: 13 additions & 9 deletions
@@ -2,24 +2,24 @@

This repository contains a PyTorch implementation of the paper:

-[SWALP : Stochastic Weight Averaging for Low-Precision Training (SWALP)](https://arxiv.org/abs/1904.11943).
+[SWALP : Stochastic Weight Averaging for Low-Precision Training (SWALP)](https://arxiv.org/abs/1904.11943).

-[Guandao Yang](http://www.guandaoyang.com),
-[Tianyi Zhang](https://scholar.google.com/citations?hl=en&view_op=list_works&gmla=AJsN-F5oL2dqrt5Dli21O3seTVse8viKdodY4EQrZp8EV0BUpG5s1brVEPMWVunGQizs0Lltdmn5cPooQHA77vDxymqIITnUUL-GRlYglybFcTnDURbvEss&user=OI0HSa0AAAAJ#),
-Polina Kirichenko, Junwen Bai,
-[Andrew Gordon Wilson](https://people.orie.cornell.edu/andrew/),
+[Guandao Yang](http://www.guandaoyang.com),
+[Tianyi Zhang](https://scholar.google.com/citations?hl=en&view_op=list_works&gmla=AJsN-F5oL2dqrt5Dli21O3seTVse8viKdodY4EQrZp8EV0BUpG5s1brVEPMWVunGQizs0Lltdmn5cPooQHA77vDxymqIITnUUL-GRlYglybFcTnDURbvEss&user=OI0HSa0AAAAJ#),
+Polina Kirichenko, Junwen Bai,
+[Andrew Gordon Wilson](https://people.orie.cornell.edu/andrew/),
[Christopher De Sa](http://www.cs.cornell.edu/~cdesa/)

![swalp-image](assets/swalp.jpg)

## Introduction

-Low precision operations can provide scalability, memory savings, portability, and energy efficiency.
-This paper proposes SWALP, an approach to low precision training that averages low-precision SGD iterates with a modified learning rate schedule.
+Low precision operations can provide scalability, memory savings, portability, and energy efficiency.
+This paper proposes SWALP, an approach to low precision training that averages low-precision SGD iterates with a modified learning rate schedule.
SWALP is easy to implement and can match the performance of *full-precision* SGD even with all numbers quantized down to 8 bits, including the gradient accumulators.
-Additionally, we show that SWALP converges arbitrarily close to the optimal solution for quadratic objectives, and to a noise ball asymptotically smaller than low precision SGD in strongly convex settings.
+Additionally, we show that SWALP converges arbitrarily close to the optimal solution for quadratic objectives, and to a noise ball asymptotically smaller than low precision SGD in strongly convex settings.

-This repo contains the codes to replicate our experiment for CIFAR datasets with VGG16 and PreResNet164.
+This repo contains the codes to replicate our experiment for CIFAR datasets with VGG16 and PreResNet164.

## Citing this Work
Please cite our work if you find this approach useful in your research:
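The introduction in the hunk above refers to training with all numbers quantized down to 8 bits. As an editorial illustration of what that means in practice (not code from this repository or this commit), here is a minimal 8-bit fixed-point quantizer with stochastic rounding in PyTorch; the word length `wl` and fractional length `fl` values are assumptions chosen for the example.

```python
import torch

def quantize_fixed_point(x: torch.Tensor, wl: int = 8, fl: int = 6) -> torch.Tensor:
    """Quantize x to signed wl-bit fixed point with fl fractional bits,
    using stochastic rounding. Illustrative sketch; wl and fl are assumed values."""
    scale = 2.0 ** fl
    # Stochastic rounding: add uniform noise in [0, 1) before flooring, so each
    # value rounds up or down with probability proportional to its fractional part.
    q = torch.floor(x * scale + torch.rand_like(x))
    # Clamp to the representable range of a signed wl-bit integer.
    q = torch.clamp(q, -2.0 ** (wl - 1), 2.0 ** (wl - 1) - 1)
    return q / scale

# Example: inside the representable range, the error stays below 2**-fl (here 1/64).
w = torch.rand(4, 4) - 0.5
print((w - quantize_fixed_point(w)).abs().max())
```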
@@ -67,6 +67,10 @@ The full-precision results (SGD-FP and SWA-FP) are produced by running the SWA r
| CIFAR100 | VGG16 | 27.23±0.17 | 25.93±0.21 | 29.59±0.32 | 26.65±0.29 |
| | PreResNet164 | 22.20±0.57 | 19.95±0.19 | | |

+## Other implementations
+
+Tianyi Zhang provides an implementation using a low-precision training framework [QPyTorch](https://github.com/Tiiiger/QPyTorch) in this [link](https://github.com/Tiiiger/QPyTorch/tree/master/examples/SWALP).
+
## References
We use the [SWA repo](https://github.com/timgaripov/swa/) as starter template.
Network architecture implementations are adapted from:
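The newly added "Other implementations" section points to a QPyTorch-based version of the same idea. To make the recipe described in the introduction concrete (run SGD with low-precision weights, then average the iterates), here is a small self-contained PyTorch sketch. It is only an illustration under simplifying assumptions: the model and data are stand-ins rather than the repository's VGG16/PreResNet164 CIFAR setup, the running average is kept in full precision and updated once per epoch, and the paper's modified learning rate schedule is omitted.

```python
import torch
import torch.nn.functional as F

def quantize_fixed_point(x, wl=8, fl=6):
    # Same illustrative stochastic-rounding fixed-point quantizer as in the sketch above.
    q = torch.floor(x * 2.0 ** fl + torch.rand_like(x))
    q = torch.clamp(q, -2.0 ** (wl - 1), 2.0 ** (wl - 1) - 1)
    return q / 2.0 ** fl

# Stand-in model and data; not the repository's training setup.
model = torch.nn.Linear(10, 2)
data = [(torch.randn(32, 10), torch.randint(0, 2, (32,))) for _ in range(50)]

opt = torch.optim.SGD(model.parameters(), lr=0.05)
swa_state = {name: p.detach().clone() for name, p in model.named_parameters()}
n_models = 0

for epoch in range(5):
    for x, y in data:
        opt.zero_grad()
        F.cross_entropy(model(x), y).backward()
        opt.step()
        # Keep the working weights in low precision after every update.
        with torch.no_grad():
            for p in model.parameters():
                p.copy_(quantize_fixed_point(p))
    # Running average of the low-precision iterates (here once per epoch).
    n_models += 1
    with torch.no_grad():
        for name, p in model.named_parameters():
            swa_state[name] += (p - swa_state[name]) / n_models

# Evaluate with the averaged weights.
with torch.no_grad():
    for name, p in model.named_parameters():
        p.copy_(swa_state[name])
```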

assets/swalp.jpg

-809 Bytes
