Timeline from 1990 to 2023, with key breakthroughs, papers, and popular applications
Statistical language models (SLMs) analyzed natural language with probabilistic methods, computing the probability of a sentence as a product of conditional word probabilities (a minimal n-gram sketch follows the list below).
- 1990: Hidden Markov Models for speech recognition (Rabiner) [Voice Command Systems]
- 1993: IBM Model 1 for statistical machine translation (Brown et al.) [Early Online Translation]
- 1995: Improved backing-off for M-gram language modeling (Kneser & Ney) [Spell Checkers]
- 1996: Maximum Entropy Models (Berger et al.) [Text Classification]
- 1999: An empirical study of smoothing techniques for language modeling (Chen & Goodman) [Improved Language Models]
- 2003: Latent Dirichlet Allocation (LDA) (Blei et al.) [Document Clustering]
- 2006: Hierarchical Pitman-Yor language model (Teh) [Text Generation]
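To make the product-of-conditional-probabilities idea concrete, here is a minimal bigram sketch in Python; the toy corpus, function names, and add-one smoothing are illustrative assumptions rather than the method of any specific paper above (Kneser-Ney smoothing, listed in 1995, performs far better in practice).

```python
from collections import defaultdict

# Toy corpus; real SLMs were estimated from large text collections.
corpus = [
    ["<s>", "the", "cat", "sat", "</s>"],
    ["<s>", "the", "dog", "sat", "</s>"],
    ["<s>", "the", "cat", "ran", "</s>"],
]

unigram_counts = defaultdict(int)
bigram_counts = defaultdict(int)
vocab = set()
for sentence in corpus:
    for i, word in enumerate(sentence):
        vocab.add(word)
        unigram_counts[word] += 1
        if i > 0:
            bigram_counts[(sentence[i - 1], word)] += 1

def bigram_prob(prev, word):
    # Add-one (Laplace) smoothing, chosen here only for simplicity.
    return (bigram_counts[(prev, word)] + 1) / (unigram_counts[prev] + len(vocab))

def sentence_prob(sentence):
    # P(w1..wn) approximated as a product of bigram conditionals P(wi | wi-1).
    p = 1.0
    for i in range(1, len(sentence)):
        p *= bigram_prob(sentence[i - 1], sentence[i])
    return p

print(sentence_prob(["<s>", "the", "cat", "sat", "</s>"]))
```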
Neural language models (NLMs) used neural networks to predict word sequences, introducing dense word vectors (embeddings) and overcoming the sparsity limitations of SLMs (see the sketch after this list).
- 2012: AlexNet wins ImageNet competition [Image Recognition]
- 2013: Deep Learning using Linear Support Vector Machines (Tang) [Computer Vision]
- 2013: Word2Vec introduces efficient word embeddings [Search Engines]
- 2013: Sequence-to-sequence models emerge [Machine Translation]
- 2014: Attention mechanism introduced [Neural Machine Translation]
- 2015: ResNet surpasses human-level performance on ImageNet [Image Classification]
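As a rough illustration of how an NLM learns word vectors while predicting the next word, here is a toy bigram neural language model; the vocabulary, training pairs, and dimensions are made-up assumptions, and PyTorch is assumed to be available.

```python
import torch
import torch.nn as nn

# Tiny vocabulary and bigram (current word -> next word) training pairs; purely illustrative.
vocab = ["<s>", "the", "cat", "dog", "sat", "ran", "</s>"]
stoi = {w: i for i, w in enumerate(vocab)}
pairs = [("<s>", "the"), ("the", "cat"), ("cat", "sat"), ("sat", "</s>"),
         ("the", "dog"), ("dog", "ran"), ("ran", "</s>")]

class NeuralBigramLM(nn.Module):
    """Predicts the next word from the current word via a learned embedding."""
    def __init__(self, vocab_size, dim=16):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)   # dense word vectors
        self.out = nn.Linear(dim, vocab_size)        # scores over the vocabulary

    def forward(self, x):
        return self.out(self.embed(x))

model = NeuralBigramLM(len(vocab))
opt = torch.optim.Adam(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

xs = torch.tensor([stoi[a] for a, _ in pairs])
ys = torch.tensor([stoi[b] for _, b in pairs])

for step in range(200):
    opt.zero_grad()
    loss = loss_fn(model(xs), ys)
    loss.backward()
    opt.step()

# The rows of the embedding matrix are the learned word vectors.
print(model.embed.weight[stoi["cat"]])
```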
Pre-trained language models (PLMs) introduced the "pre-training and fine-tuning" paradigm: models are first trained on large volumes of unlabeled text and then fine-tuned on task-specific data (a fine-tuning sketch follows the list below).
- 2017: Transformer architecture introduced in "Attention Is All You Need" (Vaswani et al.) [Language Translation]
- 2018: ULMFiT (Universal Language Model Fine-tuning) [Text Classification]
- 2018: ELMo (Embeddings from Language Models) [Named Entity Recognition]
- 2018: BERT (Bidirectional Encoder Representations from Transformers) [Question Answering]
- 2019: GPT-2 [Text Completion and Generation]
- 2019: XLNet [Sentiment Analysis]
- 2019: RoBERTa (A Robustly Optimized BERT Pretraining Approach) [Natural Language Inference]
- 2020: ELECTRA [Token Classification Tasks]
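A minimal sketch of the pre-training and fine-tuning paradigm, assuming the Hugging Face transformers library is installed: a pre-trained BERT checkpoint is loaded and fine-tuned on a toy sentiment task. The checkpoint name, labels, and two-example batch are illustrative assumptions.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load a pre-trained checkpoint and attach a fresh 2-class classification head.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# Toy task-specific data for fine-tuning: 1 = positive, 0 = negative.
texts = ["a delightful film", "a complete waste of time"]
labels = torch.tensor([1, 0])

batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
for _ in range(3):  # a few fine-tuning steps on the toy batch
    optimizer.zero_grad()
    outputs = model(**batch, labels=labels)  # loss computed against the labels
    outputs.loss.backward()
    optimizer.step()

model.eval()
with torch.no_grad():
    preds = model(**batch).logits.argmax(dim=-1)
print(preds)  # predicted class indices for the two sentences
```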
Large language models (LLMs) are trained on massive text corpora with billions of parameters and approach human-level performance on a wide range of tasks (a few-shot prompting sketch follows the list below).
- 2020: GPT-3 [OpenAI, 175B] [Few-shot learning across various NLP tasks]
- 2020: GShard [Google, 600B] [Multilingual translation]
- 2021: Switch Transformer [Google, 1.6T] [Efficient language modeling]
- 2021: Megatron-Turing NLG [Microsoft & NVIDIA, 530B] [Natural language generation]
- 2022: PaLM [Google, 540B] [Reasoning and problem-solving]
- 2022: BLOOM [BigScience, 176B] [Open-source multilingual language model]
- 2023: GPT-4 [OpenAI, undisclosed] [Advanced language understanding and generation]
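A small sketch of few-shot prompting, the usage pattern popularized by GPT-3: the task is demonstrated with a handful of in-context examples instead of gradient-based fine-tuning. It assumes the openai Python client (v1+) and an API key in the environment; the model name and examples are illustrative assumptions.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# The prompt itself carries the "training data": a few labeled demonstrations
# followed by the query the model should complete.
few_shot_prompt = (
    "Classify the sentiment of each review as Positive or Negative.\n\n"
    "Review: The plot was gripping from start to finish.\nSentiment: Positive\n\n"
    "Review: I walked out halfway through.\nSentiment: Negative\n\n"
    "Review: A charming, beautifully shot story.\nSentiment:"
)

response = client.chat.completions.create(
    model="gpt-4",  # illustrative model choice
    messages=[{"role": "user", "content": few_shot_prompt}],
    max_tokens=5,
)
print(response.choices[0].message.content.strip())
```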