Some similar projects worth checking out:
- https://github.com/floodsung/Deep-Learning-Papers-Reading-Roadmap
- https://github.com/terryum/awesome-deep-learning-papers
- https://en.wikipedia.org/wiki/List_of_important_publications_in_computer_science#Machine_learning
- https://en.wikipedia.org/wiki/Computational_learning_theory#References
Valuable textbooks for reference mining:
- https://www.deeplearningbook.org/contents/bib.html
- http://d2l.ai/chapter_references/zreferences.html
- https://www.microsoft.com/en-us/research/uploads/prod/2006/01/Bishop-Pattern-Recognition-and-Machine-Learning-2006.pdf
- https://web.stanford.edu/~hastie/ElemStatLearn/printings/ESLII_print12_toc.pdf
- https://probml.github.io/pml-book/book1.html
Lots of interesting references here on error-surface geometry; not sure whether the paper itself is appropriate for this collection: https://papers.nips.cc/paper/2019/file/a4ee59dd868ba016ed2de90d330acb6a-Paper.pdf
Test-of-time awards (and best paper awards, but be careful with these):
- ICML Test of Time Award - frustratingly, there is no comprehensive archive that I can find. Can search the respective years for press releases and work backwards.
- NeurIPS Test of Time Award - same situation as ICML
- ICLR Test of Time Award
- SIGIR Test of Time Award
AdamW - L2 regularization and weight decay are equivalent for standard SGD (once the decay factor is rescaled by the learning rate) but not for adaptive methods such as Adam
- 2017 - "Decoupled Weight Decay Regularization" - Ilya Loshchilov, Frank Hutter
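The SGD-vs-Adam equivalence claim can be checked numerically. This is a toy scalar sketch, not the paper's reference code: the loss f(w) = 0.5 * (w - 3)^2, the hyperparameters, and the helper names `grad`, `sgd` and `adam` are all hypothetical, and decoupled decay is modeled as a constant per-step shrink factor `wd` (real optimizers scale the decay by the learning rate or a schedule in their own ways). Under plain SGD, an L2 coefficient `l2` and a decoupled factor `wd = lr * l2` should give the same trajectory; under Adam the L2 gradient passes through the second-moment rescaling, so the two runs should diverge.

```python
import math


def grad(w):
    # Hypothetical toy loss f(w) = 0.5 * (w - 3)^2, so f'(w) = w - 3
    return w - 3.0


def sgd(w, steps, lr, l2=0.0, wd=0.0):
    for _ in range(steps):
        g = grad(w) + l2 * w          # L2 penalty (l2/2) * w^2 adds l2 * w to the gradient
        w = w - lr * g - wd * w       # decoupled decay shrinks the old w directly
    return w


def adam(w, steps, lr, l2=0.0, wd=0.0, beta1=0.9, beta2=0.999, eps=1e-8):
    m = v = 0.0
    for t in range(1, steps + 1):
        g = grad(w) + l2 * w          # L2 term flows into the moment estimates
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)  # bias-corrected first moment
        v_hat = v / (1 - beta2 ** t)  # bias-corrected second moment
        # Adaptive step uses the (possibly L2-augmented) gradient; the decoupled
        # decay (simplified to a constant factor wd) sits outside the 1/sqrt(v_hat) rescaling.
        w = w - lr * m_hat / (math.sqrt(v_hat) + eps) - wd * w
    return w


lr, l2, steps = 0.1, 0.1, 100
wd = lr * l2  # decay factor that matches L2 under plain SGD in this sketch

sgd_l2 = sgd(1.0, steps, lr, l2=l2)
sgd_wd = sgd(1.0, steps, lr, wd=wd)
adam_l2 = adam(1.0, steps, lr, l2=l2)
adam_wd = adam(1.0, steps, lr, wd=wd)

print(f"SGD   L2: {sgd_l2:.6f}  decoupled: {sgd_wd:.6f}")
print(f"Adam  L2: {adam_l2:.6f}  decoupled: {adam_wd:.6f}")
```

With these settings both SGD runs apply the same update, w <- 0.89 * w + 0.3, and settle near 3/1.1 (the minimizer of the L2-regularized loss), while the two Adam runs end at different weights.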
Scree plot
- 1966 - "The Scree Test For The Number Of Factors" - Raymond B. Cattell
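Cattell's test is visual: plot the eigenvalues in descending order and look for where the steep drop flattens into "scree". The sketch below automates that judgment with a max-distance-to-chord elbow heuristic, which is a swapped-in technique and not Cattell's own procedure. The eigenvalues are hypothetical (chosen to sum to 6, the trace of a 6x6 correlation matrix); real ones would come from an eigendecomposition such as `numpy.linalg.eigvalsh`, re-sorted in descending order. The helper names `scree_elbow` and `scree_ascii` are illustrative.

```python
import math


def scree_elbow(eigenvalues):
    """Return the 0-based index of the elbow in a descending eigenvalue list.

    Assumed heuristic (not Cattell's visual test): the elbow is the point
    farthest from the straight line joining the first and last eigenvalues.
    """
    n = len(eigenvalues)
    if n < 3:
        raise ValueError("need at least three eigenvalues to locate an elbow")
    x1, y1 = 0.0, eigenvalues[0]
    x2, y2 = float(n - 1), eigenvalues[-1]
    dx, dy = x2 - x1, y2 - y1
    norm = math.hypot(dx, dy)

    def dist(i):
        # Perpendicular distance from (i, eigenvalues[i]) to the chord
        return abs(dy * i - dx * eigenvalues[i] + x2 * y1 - y2 * x1) / norm

    return max(range(n), key=dist)


def scree_ascii(eigenvalues, width=40):
    # Text stand-in for a scree plot: one bar per component, scaled to the largest eigenvalue
    top = max(eigenvalues)
    return [f"{i + 1:>2} | {'#' * round(width * ev / top)} {ev:.2f}"
            for i, ev in enumerate(eigenvalues)]


# Hypothetical eigenvalues of a 6-variable correlation matrix, in descending order
eigs = [3.0, 1.5, 0.7, 0.4, 0.3, 0.1]

for line in scree_ascii(eigs):
    print(line)
elbow = scree_elbow(eigs)
print(f"elbow at component {elbow + 1}; components before it: {elbow}")
```

Here the elbow falls at the third component, suggesting two factors above the scree; conventions differ on whether the elbow component itself is retained.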