In the age of GPT, I'm going to hand-curate the best links I've used to learn LLMs.
Welcome.
PS: This is for people trying to go deeper. If you want something kind of basic, look elsewhere.
Start by going through the Table of Contents and note what you've already read and what you haven't. Then start with the Easy links in each section. Each area has several subtopics, each of which goes progressively deeper. If a section has no articles yet, feel free to email suggestions or raise a PR.
- 🟩 Model Architecture
- 🟩 Agentic LLMs
- 🟩 Methodology
- 🟩 Datasets
- 🟩 Pipeline
- 🟩 FineTuning
- 🟩 Quantization
- 🟩 RL in LLM
- 🟩 Coding
- 🟩 Deployment
- 🟩 Engineering
- 🟩 Benchmarks
- 🟩 Modifications
- 🟩 Misc Algorithms
- 🟩 Explainability
- 🟩 MultiModal Transformers
- 🟩 Adversarial methods
- 🟩 Misc
- 🟩 Add to the guide
This section talks about the key aspects of LLM architecture.
📝 Try to cover the basics of Transformers, then understand the GPT architecture before diving deeper into the other concepts.
- Rotary Positional Encoding Explained (a quick sketch of the idea follows this list)
- Jay Alammar: The Illustrated GPT-2
- Reproducing GPT-2 (124M) in llm.c in 90 minutes for $20
- Umar Jamil: Llama Explained
- Umar Jamil: Llama 2 from Scratch
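If you want a quick taste of what the rotary positional encoding link above covers, here is a minimal NumPy sketch of the idea. The function and variable names are my own and purely illustrative, not from any particular library, and it uses the interleaved-pair convention from the original RoFormer paper:

```python
import numpy as np

def rope(x: np.ndarray, base: float = 10000.0) -> np.ndarray:
    """Apply rotary positional encoding to x of shape (seq_len, dim), with dim even."""
    seq_len, dim = x.shape
    # One rotation frequency per pair of channels: theta_i = base^(-2i/dim)
    inv_freq = base ** (-np.arange(0, dim, 2) / dim)            # (dim/2,)
    angles = np.arange(seq_len)[:, None] * inv_freq[None, :]    # (seq_len, dim/2)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, 0::2], x[:, 1::2]                              # split channels into pairs
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin   # rotate each pair by a position-dependent angle
    out[:, 1::2] = x1 * sin + x2 * cos
    return out

# Toy usage: queries (or keys) for an 8-token sequence with 16-dim heads
q = rope(np.random.randn(8, 16))
```

Because the encoding is a pure rotation, the dot product between a rotated query and a rotated key depends only on their relative positions, which is the property the linked article explains.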
This section talks about various aspects of agentic LLMs.
- Agentic LLMs Deep Dive
This section tries to cover various methodologies used in LLMs.
- How continuous batching enables 23x throughput in LLM inference while reducing p50 latency
- LLM Inference Optimizations — Continuous Batching and Selective Batching, Orca
- [vLLM] LLM Inference Optimizations: Chunked Prefill and Decode-Maximal Batching
- LLM Inference Series: 2. The two-phase process behind LLMs’ responses
- LLM Inference Series: 4. KV caching, a deeper look
- An Introduction to Model Merging for LLMs
- Merging tokens to accelerate LLM inference with SLERP (a quick sketch of SLERP follows this list)
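Spherical linear interpolation (SLERP) comes up in the merging links above; here is a rough, hedged sketch of the operation itself (the names are illustrative and not taken from any merging library):

```python
import numpy as np

def slerp(v0: np.ndarray, v1: np.ndarray, t: float, eps: float = 1e-8) -> np.ndarray:
    """Interpolate between v0 and v1 along the arc between their directions."""
    # Angle between the two directions, computed on normalized copies
    v0n = v0 / (np.linalg.norm(v0) + eps)
    v1n = v1 / (np.linalg.norm(v1) + eps)
    omega = np.arccos(np.clip(np.dot(v0n, v1n), -1.0, 1.0))
    if omega < eps:  # nearly parallel: plain linear interpolation is fine
        return (1 - t) * v0 + t * v1
    return (np.sin((1 - t) * omega) * v0 + np.sin(t * omega) * v1) / np.sin(omega)

# Toy usage: blend two flattened weight (or token-embedding) vectors halfway
merged = slerp(np.random.randn(1024), np.random.randn(1024), t=0.5)
```

Unlike plain averaging, SLERP moves along the arc between the two directions rather than cutting through the chord, which avoids the norm shrinkage you get when averaging nearly orthogonal high-dimensional vectors.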
Add links you find useful through pull requests.
