Bengio et al.'s survey on representation learning

  • Yoshua Bengio, Aaron Courville and Pascal Vincent. "Representation Learning: A Review and New Perspectives." pdf TPAMI 35:8(1798-1828)

Yann LeCun, Yoshua Bengio and Geoffrey Hinton's survey in Nature

  • Yann LeCun, Yoshua Bengio and Geoffrey Hinton. "Deep Learning" pdf Nature 521, 436–444
  • [survey, CNN, RNN, ReNN] Yoav Goldberg. "A Primer on Neural Network Models for Natural Language Processing". pdf 2015

Embeddings & Language Models

Skip-gram embeddings

  • Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. "Efficient Estimation of Word Representations in Vector Space." pdf ICLR, 2013.
  • Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, and Jeffrey Dean. "Distributed Representations of Words and Phrases and their Compositionality." pdf NIPS, 2013.
  • [king-man+woman=queen] Tomas Mikolov, Wen-tau Yih, and Geoffrey Zweig. "Linguistic Regularities in Continuous Space Word Representations." pdf NAACL, 2013.
  • [technical note] Yoav Goldberg and Omer Levy "word2vec explained: deriving Mikolov et al.'s negative-sampling word-embedding method" pdf Tech-report 2013
  • [buzz-busting] Omer Levy and Yoav Goldberg "Linguistic Regularities in Sparse and Explicit Word Representations" pdf CoNLL-2014 Best Paper Award
  • [lessons learned] Omer Levy, Yoav Goldberg, Ido Dagan "Improving Distributional Similarity with Lessons Learned from Word Embeddings" pdf, TACL 2015
  • Wang Ling, Chris Dyer, Alan Black, Isabel Trancoso. "Two/Too Simple Adaptations of Word2Vec for Syntax Problems" pdf NAACL 2015 (Short)

Embedding enhancement: Syntax, Retrofitting, etc

  • [dependency embeddings] Omer Levy and Yoav Goldberg "Dependency Based Word Embeddings" pdf ACL 2014 (Short)
  • [dependency embeddings] Mohit Bansal, Kevin Gimpel and Karen Livescu. "Tailoring Continuous Word Representations for Dependency Parsing" pdf ACL 2014 (Short)
  • [retrofitting with lexical knowledge] Manaal Faruqui, Jesse Dodge, Sujay Kumar Jauhar, Chris Dyer, Eduard Hovy and Noah A. Smith. "Retrofitting Word Vectors to Semantic Lexicons" pdf, NAACL 2015
  • [contrastive estimation] Mnih and Kavukcuoglu, "Learning Word Embeddings Efficiently with Noise-Contrastive Estimation." pdf NIPS 2013
  • [embedding documents] Quoc V Le, Tomas Mikolov. "Distributed representations of sentences and documents" pdf ICML 2014
  • [synonymy relations] Mo Yu, Mark Dredze. "Improving Lexical Embeddings with Semantic Knowledge" pdf ACL 2014 (Short)
  • [embedding relations] Asli Celikyilmaz, Dilek Hakkani-Tur, Panupong Pasupat, Ruhi Sarikaya. "Enriching Word Embeddings Using Knowledge Graph for Semantic Tagging in Conversational Dialog Systems" pdf AAAI 2015 (Short)
  • [multimodal] Angeliki Lazaridou, Nghia The Pham and Marco Baroni. "Combining Language and Vision with a Multimodal Skip-gram Model" pdf NAACL 2015
  • [syntax-word order] Wang Ling, Chris Dyer, Alan Black, Isabel Trancoso. "Two/Too Simple Adaptations of Word2Vec for Syntax Problems" pdf NAACL 2015 (Short)
  • [autoencoder, lexeme, lexical resource, synset] Sascha Rothe and Hinrich Schütze, "AutoExtend: Extending Word Embeddings to Embeddings for Synsets and Lexemes" pdf ACL 2015 Best Paper
  • [lexical resource, babelnet] Ignacio Iacobacci, Mohammad Taher Pilehvar and Roberto Navigli, "SensEmbed: Learning Sense Embeddings for Word and Relational Similarity" pdf ACL 2015
  • [specific linguistic relation] Zhigang Chen, Wei Lin, Qian Chen, Xiaoping Chen, Si Wei, Hui Jiang and Xiaodan Zhu, "Revisiting Word Embedding for Contrasting Meaning" pdf ACL 2015
  • [syntax] Jianpeng Cheng and Dimitri Kartsaklis. "Syntax-Aware Multi-Sense Word Embeddings for Deep Compositional Models of Meaning". pdf EMNLP 2015, Lisbon, Portugal, September 2015.

Embedding enhancement: Word order, Morphological, etc

  • [syntax-word order] Wang Ling, Chris Dyer, Alan Black, Isabel Trancoso. "Two/Too Simple Adaptations of Word2Vec for Syntax Problems" pdf NAACL 2015 (Short)
  • [word order] Rie Johnson and Tong Zhang. "Effective use of word order for text categorization with convolutional neural networks" pdf NAACL 2015
  • [morphology] Radu Soricut and Franz Och. "Unsupervised Morphology Induction Using Word Embeddings" pdf NAACL 2015 Best Paper Award
  • [morphology] Minh-Thang Luong, Richard Socher, Christopher D. Manning. "Better Word Representations with Recursive Neural Networks for Morphology" pdf CoNLL 2013
  • [morpheme] Siyu Qiu, Qing Cui, Jiang Bian, Bin Gao, Tie-Yan Liu. "Co-learning of Word Representations and Morpheme Representations" pdf COLING 2014
  • [morphological] Ryan Cotterell and Hinrich Schütze. "Morphological Word-Embeddings" pdf NAACL 2015 (Short)
  • [regularization] Dani Yogatama, Manaal Faruqui, Chris Dyer, Noah Smith. "Learning Word Representations with Hierarchical Sparse Coding" pdf ICML 2015
  • [character, word order, based on word2vec] Andrew Trask, David Gilmore, Matthew Russell, "Modeling Order in Neural Word Embeddings at Scale" pdf ICML 2015

Embeddings as matrix factorization

  • [approximate interpretation] Levy and Goldberg, "Neural Word Embedding as Implicit Matrix Factorization." pdf NIPS 2014
  • Omer Levy, Steffen Remus, Chris Biemann, and Ido Dagan. "Do Supervised Distributional Methods Really Learn Lexical Inference Relations?" pdf NAACL 2015 (Short)
  • Tim Rocktäschel, Sameer Singh and Sebastian Riedel. "Injecting Logical Background Knowledge into Embeddings for Relation Extraction" pdf NAACL 2015
  • [exact interpretation] Yitan Li, Linli Xu, Fei Tian, Liang Jiang, Xiaowei Zhong and Enhong Chen. "Word Embedding Revisited: A New Representation Learning and Explicit Matrix Factorization Perspective" pdf IJCAI 2015
  • [SVD, framework, scaling] Karl Stratos, Michael Collins, and Daniel Hsu. "Model-based Word Embeddings from Decompositions of Count Matrices". pdf ACL 2015.
  • [MF, SVD] Omer Levy, Yoav Goldberg, and Ido Dagan. "Improving Distributional Similarity with Lessons Learned from Word Embeddings". pdf TACL 2015.

Embedding obtained from other methods

  • [noise-contrastive estimation] Andriy Mnih and Koray Kavukcuoglu, "Learning word embeddings efficiently with noise-contrastive estimation" pdf NIPS 2013
  • [logarithm of word-word co-occurrences] Jeffrey Pennington, Richard Socher, and Christopher D. Manning, "GloVe: Global Vectors for Word Representation" pdf EMNLP 2014
  • [explicitly encode co-occurrences] Omer Levy and Yoav Goldberg, "Linguistic Regularities in Sparse and Explicit Word Representations" pdf CoNLL 2014.

Why and when embeddings are better

  • [comparison between pretrained embeddings] Yanqing Chen, Bryan Perozzi, Rami Al-Rfou, and Steven Skiena. "The expressive power of word embeddings" pdf ICML 2013
  • [prediction fashioned matters] Felix Hill, KyungHyun Cho, Sebastien Jean, et al., "Not all neural embeddings are born equal" pdf NIPS Workshop 2014
  • [multichannel as multi-embeddings input] Wenpeng Yin, Hinrich Schütze. "MultiGranCNN: An Architecture for General Matching of Text Chunks on Multiple Levels of Granularity" ACL 2015
  • [dimension, corpus, compare] Siwei Lai, Kang Liu, Liheng Xu, Jun Zhao, "How to Generate a Good Word Embedding?" pdf arXiv pre-print

Word Representations via Distribution Embedding

  • Katrin Erk, "Representing Words As Regions in Vector Space". pdf In Proceedings of the Thirteenth Conference on Computational Natural Language Learning, Boulder, Colorado, 2009.
  • Karl Stratos, Michael Collins, and Daniel Hsu. "Model-based Word Embeddings from Decompositions of Count Matrices". pdf ACL 2015.
  • Omer Levy, Yoav Goldberg, and Ido Dagan. "Improving Distributional Similarity with Lessons Learned from Word Embeddings". pdf TACL 2015.
  • [random walks, generative model] Sanjeev Arora, Yuanzhi Li, Yingyu Liang, Tengyu Ma, Andrej Risteski. "Random walks on discourse spaces: a new generative language model with applications to semantic word embeddings" pdf. In CoRR, 2015.
  • [breadth, asymmetric] Luke Vilnis, Andrew McCallum. "Word Representations via Gaussian Embedding". pdf. In ICLR, 2015.
  • [markov, generative, MF] Tatsunori B. Hashimoto, David Alvarez-Melis, Tommi S. Jaakkola. "Word, graph, and manifold embedding from Markov processes". pdf. arXiv preprint 2015.


  • Brown et al., "Class-Based n-Gram Models of Natural Language." [pdf] Computational Linguistics 1992

Example Notes, Mini-Tutorials, Technical Reports

  • Yoav Goldberg. "A note on Latent Semantic Analysis" [pdf] Tech-report
  • Yoav Goldberg and Omer Levy "word2vec explained: deriving Mikolov et al.'s negative-sampling word-embedding method" [pdf] Tech-report 2013
