Godofnothing/PowerLawOptimization

Description


This repository accompanies the ICLR 2023 poster

A view of mini-batch SGD via generating functions: conditions of convergence, phase transitions, benefit from negative momenta.
M. Velikanov, D. Kuznedelev, D. Yarotsky

[OpenReview] [arXiv]

(Figure: the different training regimes on MNIST)

Structure of the repository


The repository contains implementations of the 4 training regimes used in the paper:

  • Training of a real neural network (a fully-connected NN with one hidden layer)
  • Linearized regime with stochastic sampling of batches
  • Linearized regime with averaging over all possible batch samplings
  • Spectral diagonal (SD) regime

Training scripts

train_nn.py - run SGD dynamics of a single-hidden-layer neural network
train_linearized.py - run SGD dynamics in the linearized regime with stochastic sampling of batches
train_full.py - run SGD dynamics averaged over all possible batch samplings
train_spectral_diagonal.py - run SGD dynamics in the spectral diagonal approximation with arbitrary \tau
train_4_regimes.py - run SGD dynamics simultaneously in all 4 regimes
train_4_regimes_serial.py - run SGD dynamics simultaneously in all 4 regimes, with a series of runs for the 1-hidden-layer neural network and for the linearized regime with stochastic sampling, and with a list of \tau parameters; see the example below.
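
For example, a single run of the real-network regime could be launched as follows (a minimal sketch: the flags --lr, --momentum and --batch_size are documented in the Usage section below, while the concrete values here are purely illustrative):

python train_nn.py --lr 1e-3 --momentum 0.9 --batch_size 128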

Notebooks

Notebooks are located in the ./notebooks directory

BudgetIndependence.ipynb - plots the results of the experiments on budget independence
RegimeComparison.ipynb - comparison of the different regimes mentioned in the paper
RegimeComparisonSerial.ipynb - comparison based on a series of runs for stochastic sampling
SGD_generating_funcs.ipynb - notebook with the symbolic computations used in the work (we suggest running it on Colab)

Environment


All experiments were run in a conda environment with the following package versions installed:

pytorch == 1.11.0
functorch == 0.1.1
seaborn == 0.11.2
matplotlib == 3.5.1
sympy == 1.7.1

pytorch >= 1.11.0 together with functorch is needed for efficient computation of the NTK.
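
A minimal sketch of setting up such an environment (the package versions are the ones listed above; the environment name, Python version, and use of pip are assumptions):

conda create -n powerlaw python=3.9   # hypothetical environment name and Python version
conda activate powerlaw
# note: the pip package for pytorch is named "torch"; functorch 0.1.1 pairs with torch 1.11.0
pip install torch==1.11.0 functorch==0.1.1 seaborn==0.11.2 matplotlib==3.5.1 sympy==1.7.1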

Datasets used


This work uses two UCI datasets: Bike-Sharing and SGEMM GPU kernel performance.

All data for the experiments is expected to be stored in the ./data directory.

To reproduce the experiments in the paper, download the datasets and extract them into the ./data directory:

mkdir -p data/Bike-Sharing data/sgemm_product
cd data/Bike-Sharing
wget https://archive.ics.uci.edu/ml/machine-learning-databases/00275/Bike-Sharing-Dataset.zip
unzip Bike-Sharing-Dataset.zip
cd ../sgemm_product
wget https://archive.ics.uci.edu/ml/machine-learning-databases/00440/sgemm_product_dataset.zip
unzip sgemm_product_dataset.zip

Usage


To run a series of experiments over different parameter configurations, provide a list of values for the following flags:

--lr (learning rate)
--momentum (momentum) 
--batch_size (batch size)

By default, if a list of values is provided for each of these arguments, experiments are run for all possible combinations of the parameters (a Cartesian product). To instead run a single experiment per tuple of parameters, explicitly pass the flag --aggr_type zip. Note that in this case the lengths of the argument lists have to be equal to each other or to 1.
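
An illustration of the two aggregation modes (the script name and parameter values are purely illustrative; the flags are the ones documented above):

# default: Cartesian product -- 4 runs, one per (lr, batch_size) combination
python train_linearized.py --lr 1e-3 1e-2 --batch_size 32 128

# zip aggregation: 2 paired runs, (1e-3, 32) and (1e-2, 128)
python train_linearized.py --lr 1e-3 1e-2 --batch_size 32 128 --aggr_type zip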

Bibtex

If you find this project useful, please consider citing our paper 📣

@inproceedings{velikanov2023a,
  title={A view of mini-batch {SGD} via generating functions: conditions of convergence, phase transitions, benefit from negative momenta},
  author={Maksim Velikanov and Denis Kuznedelev and Dmitry Yarotsky},
  booktitle={The Eleventh International Conference on Learning Representations},
  year={2023},
  url={https://openreview.net/forum?id=bzaPGEllsjE}
}

About

Convergence analysis of problems with power-law spectra