Timeline from 1990 to 2023, with key breakthroughs, papers, and popular applications
Statistical language models (SLMs) analyzed natural language with probabilistic methods, computing the probability of a sentence as a product of conditional word probabilities (a minimal n-gram sketch follows the list below).
- 1990: Hidden Markov Models for speech recognition (Rabiner) [Voice Command Systems]
- 1993: IBM Model 1 for statistical machine translation (Brown et al.) [Early Online Translation]
- 1995: Improved backing-off for M-gram language modeling (Kneser & Ney) [Spell Checkers]
- 1996: Maximum Entropy Models (Berger et al.) [Text Classification]
- 1999: An empirical study of smoothing techniques for language modeling (Chen & Goodman) [Improved Language Models]
- 2003: Latent Dirichlet Allocation (LDA) (Blei et al.) [Document Clustering]
- 2006: Hierarchical Pitman-Yor language model (Teh) [Text Generation]
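To make the product-of-conditional-probabilities idea concrete, here is a minimal bigram sketch in Python; the toy corpus, function names, and add-one smoothing are illustrative assumptions rather than the method of any specific paper above (Kneser-Ney smoothing, listed in 1995, performs far better in practice).

```python
from collections import defaultdict

# Toy corpus; real SLMs were estimated from large text collections.
corpus = [
    ["<s>", "the", "cat", "sat", "</s>"],
    ["<s>", "the", "dog", "sat", "</s>"],
    ["<s>", "the", "cat", "ran", "</s>"],
]

unigram_counts = defaultdict(int)
bigram_counts = defaultdict(int)
vocab = set()
for sentence in corpus:
    for i, word in enumerate(sentence):
        vocab.add(word)
        unigram_counts[word] += 1
        if i > 0:
            bigram_counts[(sentence[i - 1], word)] += 1

def bigram_prob(prev, word):
    # Add-one (Laplace) smoothing, chosen here only for simplicity.
    return (bigram_counts[(prev, word)] + 1) / (unigram_counts[prev] + len(vocab))

def sentence_prob(sentence):
    # P(w1..wn) approximated as a product of bigram conditionals P(wi | wi-1).
    p = 1.0
    for i in range(1, len(sentence)):
        p *= bigram_prob(sentence[i - 1], sentence[i])
    return p

print(sentence_prob(["<s>", "the", "cat", "sat", "</s>"]))
```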
Neural language models (NLMs) used neural networks to predict word sequences, introducing dense word vectors (embeddings) and overcoming the sparsity limitations of SLMs (see the sketch after this list).
- 2012: AlexNet wins ImageNet competition [Image Recognition]
- 2013: Deep Learning using Linear Support Vector Machines (Tang) [Computer Vision]
- 2013: Word2Vec introduces efficient word embeddings [Search Engines]
- 2013: Sequence-to-sequence models emerge [Machine Translation]
- 2014: Attention mechanism introduced [Neural Machine Translation]
- 2015: ResNet surpasses human-level performance on ImageNet [Image Classification]
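As a rough illustration of how an NLM learns word vectors while predicting the next word, here is a toy bigram neural language model; the vocabulary, training pairs, and dimensions are made-up assumptions, and PyTorch is assumed to be available.

```python
import torch
import torch.nn as nn

# Tiny vocabulary and bigram (current word -> next word) training pairs; purely illustrative.
vocab = ["<s>", "the", "cat", "dog", "sat", "ran", "</s>"]
stoi = {w: i for i, w in enumerate(vocab)}
pairs = [("<s>", "the"), ("the", "cat"), ("cat", "sat"), ("sat", "</s>"),
         ("the", "dog"), ("dog", "ran"), ("ran", "</s>")]

class NeuralBigramLM(nn.Module):
    """Predicts the next word from the current word via a learned embedding."""
    def __init__(self, vocab_size, dim=16):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)   # dense word vectors
        self.out = nn.Linear(dim, vocab_size)        # scores over the vocabulary

    def forward(self, x):
        return self.out(self.embed(x))

model = NeuralBigramLM(len(vocab))
opt = torch.optim.Adam(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

xs = torch.tensor([stoi[a] for a, _ in pairs])
ys = torch.tensor([stoi[b] for _, b in pairs])

for step in range(200):
    opt.zero_grad()
    loss = loss_fn(model(xs), ys)
    loss.backward()
    opt.step()

# The rows of the embedding matrix are the learned word vectors.
print(model.embed.weight[stoi["cat"]])
```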
Pre-trained language models (PLMs) introduced the "pre-training and fine-tuning" paradigm: models are first trained on large volumes of unlabeled text and then fine-tuned on task-specific data (a fine-tuning sketch follows the list below).
- 2017: Transformer architecture introduced in "Attention Is All You Need" (Vaswani et al.) [Language Translation]
- 2018: ULMFiT (Universal Language Model Fine-tuning) [Text Classification]
- 2018: ELMo (Embeddings from Language Models) [Named Entity Recognition]
- 2018: BERT (Bidirectional Encoder Representations from Transformers) [Question Answering]
- 2019: GPT-2 [Text Completion and Generation]
- 2019: XLNet [Sentiment Analysis]
- 2019: RoBERTa (A Robustly Optimized BERT Pretraining Approach) [Natural Language Inference]
- 2020: ELECTRA [Token Classification Tasks]
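A minimal sketch of the pre-training and fine-tuning paradigm, assuming the Hugging Face transformers library is installed: a pre-trained BERT checkpoint is loaded and fine-tuned on a toy sentiment task. The checkpoint name, labels, and two-example batch are illustrative assumptions.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load a pre-trained checkpoint and attach a fresh 2-class classification head.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# Toy task-specific data for fine-tuning: 1 = positive, 0 = negative.
texts = ["a delightful film", "a complete waste of time"]
labels = torch.tensor([1, 0])

batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
for _ in range(3):  # a few fine-tuning steps on the toy batch
    optimizer.zero_grad()
    outputs = model(**batch, labels=labels)  # loss computed against the labels
    outputs.loss.backward()
    optimizer.step()

model.eval()
with torch.no_grad():
    preds = model(**batch).logits.argmax(dim=-1)
print(preds)  # predicted class indices for the two sentences
```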
Large language models (LLMs) are trained on massive text corpora with billions of parameters and approach human-level performance on a wide range of tasks (a few-shot prompting sketch follows the list below).
- 2020: GPT-3 [OpenAI, 175B] [Few-shot learning across various NLP tasks]
- 2020: GShard [Google, 600B] [Multilingual translation]
- 2021: Switch Transformer [Google, 1.6T] [Efficient language modeling]
- 2021: Megatron-Turing NLG [Microsoft & NVIDIA, 530B] [Natural language generation]
- 2022: PaLM [Google, 540B] [Reasoning and problem-solving]
- 2022: BLOOM [BigScience, 176B] [Open-source multilingual language model]
- 2023: GPT-4 [OpenAI, undisclosed] [Advanced language understanding and generation]
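A small sketch of few-shot prompting, the usage pattern popularized by GPT-3: the task is demonstrated with a handful of in-context examples instead of gradient-based fine-tuning. It assumes the openai Python client (v1+) and an API key in the environment; the model name and examples are illustrative assumptions.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# The prompt itself carries the "training data": a few labeled demonstrations
# followed by the query the model should complete.
few_shot_prompt = (
    "Classify the sentiment of each review as Positive or Negative.\n\n"
    "Review: The plot was gripping from start to finish.\nSentiment: Positive\n\n"
    "Review: I walked out halfway through.\nSentiment: Negative\n\n"
    "Review: A charming, beautifully shot story.\nSentiment:"
)

response = client.chat.completions.create(
    model="gpt-4",  # illustrative model choice
    messages=[{"role": "user", "content": few_shot_prompt}],
    max_tokens=5,
)
print(response.choices[0].message.content.strip())
```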