- Minimal Introduction to Large Language Models: Understand the high-level concepts of Transformers visually.
- Understanding LLM Inference: The operations behind the KV cache, and how to count FLOPs.
- Quantization Principles: The types of quantization and how they reduce the memory overhead of inference and training.
- Calculate GPU Memory Required for Serving LLMs: What makes up the memory footprint of an LLM.
- LLM Under the Hood: Under-the-hood mechanics of the Transformer architecture.
- AI Compute Evolution: How compute requirements have evolved over the decades.
- Speculative RAG: Use a generalist model to rank multiple answer drafts produced by a specialist RAG drafter.
- Corrective RAG: Look ahead, decide whether the retrieval is correct, and rewrite the query if it is not.
- HyDE Query Rewriting: Generate a hypothetical document from the query and search against it in the vector space.
- Late Chunking Blog: Late chunking, a sweet spot between naive-chunking retrieval and late-interaction retrieval.
- Contextual RAG: While indexing, ask an LLM to contextualize each chunk, then index it.
- Retrieval via Late Interaction Models: ColPali, and how late interaction over embeddings scores document similarity.
- Query Routing: A SetFit-based approach to query routing.
- Converting to GGUF: Tools for converting an LLM to the GGUF format.
- HNSW Algorithm: The intuition behind how the HNSW algorithm works.
- Incident Database: Incidents and reports of AI harms.
- Building Effective Agentic Flows: A guide to building effective agents.
- LLM Systems: Resources on ML and LLM systems.
- Building Agentic Systems: Chip's guide to building agentic systems.
- Agent Frameworks Comparison: A comparison of different agent frameworks.
- Benchmarks: Evaluation datasets.
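
The KV cache mentioned in the inference entry above can be sized with simple arithmetic: keys and values (a factor of 2) are stored per layer, per KV head, per token, at the serving precision. A minimal sketch, assuming illustrative Llama-2-7B-style dimensions (32 layers, 32 KV heads, head dim 128):

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len,
                   batch_size=1, bytes_per_elem=2):
    """KV-cache size: 2 (keys + values) * layers * KV heads * head dim
    * tokens * batch * bytes per element (2 for fp16)."""
    return (2 * num_layers * num_kv_heads * head_dim
            * seq_len * batch_size * bytes_per_elem)

# Illustrative 7B-class config at fp16 with a 4096-token context.
cache = kv_cache_bytes(num_layers=32, num_kv_heads=32, head_dim=128, seq_len=4096)
print(f"{cache / 2**30:.1f} GiB")  # prints "2.0 GiB"
```

Note how the cache grows linearly with both context length and batch size, which is why long-context serving is memory-bound.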
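
The quantization entry above boils down to mapping floats onto a small integer grid via a scale factor. A minimal sketch of symmetric absmax int8 quantization (one common scheme among the types the linked post covers):

```python
import numpy as np

def quantize_int8(x):
    """Symmetric absmax quantization: scale so the largest magnitude maps to 127."""
    scale = np.abs(x).max() / 127.0
    q = np.round(x / scale).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate floats; error is bounded by half the scale."""
    return q.astype(np.float32) * scale

x = np.array([-1.2, 0.0, 0.5, 3.1], dtype=np.float32)
q, scale = quantize_int8(x)
x_hat = dequantize(q, scale)
# int8 stores 1 byte per value vs 4 for fp32: a 4x memory reduction.
print(np.abs(x - x_hat).max())
```

The memory saving is exactly the ratio of element widths (fp32 to int8 is 4x, fp16 to int4 would be 4x again), at the cost of rounding error proportional to the scale.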
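
The serving-memory entry above amounts to summing a few components: weights, KV cache, and a slack factor for activations and fragmentation. A rough sketch (the 20% overhead fraction is a common rule of thumb, not an exact figure):

```python
def serving_memory_gb(n_params_b, bytes_per_param=2, kv_cache_gb=0.0,
                      overhead_frac=0.2):
    """Rough GPU memory estimate for serving: weights + KV cache,
    padded by an assumed overhead fraction for activations/fragmentation."""
    weights_gb = n_params_b * bytes_per_param  # 1B params at fp16 = 2 GB
    return (weights_gb + kv_cache_gb) * (1 + overhead_frac)

# A 7B model in fp16: 14 GB of weights alone, more with cache and overhead.
print(serving_memory_gb(7, kv_cache_gb=0.5))
```

Weights dominate at small batch sizes; as batch and context grow, the KV-cache term (sized as in the inference entry) takes over.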
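
The HyDE entry above describes a short pipeline: generate a hypothetical answer document, embed it, and retrieve the real documents nearest to that embedding rather than to the raw query. A toy sketch where `generate` and the bag-of-words `embed` are stand-ins for a real LLM call and dense encoder:

```python
import math
from collections import Counter

def embed(text):
    """Toy bag-of-words embedding; a real system would use a dense encoder."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def hyde_search(query, corpus, generate):
    """HyDE: search with the embedding of a hypothetical answer, not the query."""
    hypothetical_doc = generate(query)
    q_vec = embed(hypothetical_doc)
    return max(corpus, key=lambda doc: cosine(q_vec, embed(doc)))

corpus = ["the kv cache stores attention keys and values",
          "quantization reduces model memory"]
# Stand-in for an LLM drafting a plausible answer to the query.
generate = lambda q: "attention keys and values are cached during decoding"
print(hyde_search("what does the kv cache store?", corpus))
```

The point of the trick is that a hypothetical answer lives in the same vocabulary and style as real documents, so it lands closer to them in embedding space than the short query does.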