Awesome Knowledge Distillation of LLM Papers

A Survey on Knowledge Distillation of Large Language Models

Xiaohan Xu1   Ming Li2   Chongyang Tao3   Tao Shen4   Reynold Cheng1   Jinyang Li1   Can Xu5   Dacheng Tao6   Tianyi Zhou2  

1 The University of Hong Kong    2 University of Maryland    3 Microsoft    4 University of Technology Sydney    5 Peking University    6 The University of Sydney



A collection of papers on knowledge distillation of large language models (LLMs). If you want to use LLMs to improve the training of your own smaller models, or to use self-generated knowledge for self-improvement, take a look at this collection.

We will update this collection every week. Feel free to star ⭐️ this repo to keep track of the updates.

❗️Legal Consideration: It's crucial to note the legal implications of utilizing LLM outputs, such as those from ChatGPT (Restrictions) and Llama (License). We strongly advise users to adhere to the terms of use specified by the model providers, such as restrictions on developing competitive products.

💡 News

Contributing to This Collection

Feel free to open an issue/PR or e-mail shawnxxh@gmail.com, minglii@umd.edu, hishentao@gmail.com and chongyangtao@gmail.com if you find any missing taxonomies or papers. We will keep updating this collection and survey.

📝 Introduction

KD of LLMs: This survey delves into knowledge distillation (KD) techniques in Large Language Models (LLMs), highlighting KD's crucial role in transferring advanced capabilities from proprietary LLMs like GPT-4 to open-source counterparts such as LLaMA and Mistral. We also explore how KD enables the compression and self-improvement of open-source LLMs by using them as teachers.

KD and Data Augmentation: Crucially, the survey navigates the intricate interplay between data augmentation (DA) and KD, illustrating how DA emerges as a powerful paradigm within the KD framework to bolster LLMs' performance. By leveraging DA to generate context-rich, skill-specific training data, KD transcends traditional boundaries, enabling open-source models to approximate the contextual adeptness, ethical alignment, and deep semantic insights characteristic of their proprietary counterparts.

Taxonomy: Our analysis is meticulously structured around three foundational pillars: algorithm, skill, and verticalization -- providing a comprehensive examination of KD mechanisms, the enhancement of specific cognitive abilities, and their practical implications across diverse fields.

KD Algorithms: We divide KD algorithms into two principal steps: "Knowledge Elicitation", which focuses on eliciting knowledge from teacher LLMs, and "Distillation Algorithms", which centers on injecting this knowledge into student models.


Figure: An illustration of different knowledge elicitation methods from teacher LLMs.
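The two steps described above can be illustrated with a minimal, dependency-free sketch. It is not the survey's implementation: `toy_teacher`, `elicit_labels`, and `forward_kl` are hypothetical names introduced here, the teacher is a stand-in function rather than a real LLM call, and the distributions are hand-written. The first function shows elicitation by labeling (collecting teacher outputs for a set of prompts); the second shows one divergence-style distillation objective, the forward KL divergence between teacher and student token distributions.

```python
import math
from typing import Callable, Dict, List, Tuple


def elicit_labels(teacher: Callable[[str], str],
                  prompts: List[str]) -> List[Tuple[str, str]]:
    """Step 1 (knowledge elicitation, labeling variant):
    pair each prompt with the teacher's response."""
    return [(prompt, teacher(prompt)) for prompt in prompts]


def forward_kl(teacher_probs: Dict[str, float],
               student_probs: Dict[str, float],
               eps: float = 1e-12) -> float:
    """Step 2 (distillation, divergence variant):
    KL(teacher || student) over the teacher's vocabulary entries."""
    return sum(
        p * math.log(p / max(student_probs.get(token, 0.0), eps))
        for token, p in teacher_probs.items()
        if p > 0.0
    )


def toy_teacher(prompt: str) -> str:
    # Hypothetical stand-in for querying a teacher LLM.
    return prompt.upper()


# Elicited (prompt, response) pairs could serve as supervised fine-tuning data.
pairs = elicit_labels(toy_teacher, ["hello", "world"])

# Hand-written next-token distributions, for illustration only.
teacher_dist = {"a": 0.5, "b": 0.5}
student_dist = {"a": 0.25, "b": 0.75}
loss = forward_kl(teacher_dist, student_dist)
```

In practice the elicited pairs would be used to fine-tune the student, and the divergence would be computed over model logits and minimized by gradient descent; both are omitted here to keep the sketch self-contained.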

Skill Distillation: We delve into the enhancement of specific cognitive abilities, such as context following, alignment, agent, NLP task specialization, and multi-modality.

Verticalization Distillation: We explore the practical implications of KD across diverse fields, including law, medical & healthcare, finance, science, and miscellaneous domains.

Note that both Skill Distillation and Verticalization Distillation rely on the Knowledge Elicitation and Distillation Algorithms covered under KD Algorithms. As a result, some papers appear in more than one category; the overlap is intentional, since each category offers a different perspective on the same work.

Why KD of LLMs?

In the era of LLMs, KD of LLMs plays the following crucial roles:



| Role | Description | Trend |
|---|---|---|
| ① Advancing Smaller Models | Transferring advanced capabilities from proprietary LLMs to smaller models, such as open-source LLMs or other smaller models. | Most common |
| ② Compression | Compressing open-source LLMs to make them more efficient and practical. | Increasingly popular as open-source LLMs flourish |
| ③ Self-Improvement | Refining open-source LLMs' performance by leveraging their own knowledge, i.e. self-knowledge. | New trend to make open-source LLMs more competitive |

📒 Table of Contents

KD Algorithms

Knowledge Elicitation

Labeling

Title Venue Date Code Data
Evidence-Focused Fact Summarization for Knowledge-Augmented Zero-Shot Question Answering arXiv 2024-03
Aligning Large and Small Language Models via Chain-of-Thought Reasoning EACL 2024-03 Github
Divide-or-Conquer? Which Part Should You Distill Your LLM? arXiv 2024-02
Miko: Multimodal Intention Knowledge Distillation from Large Language Models for Social-Media Commonsense Discovery arXiv 2024-02
KnowTuning: Knowledge-aware Fine-tuning for Large Language Models arXiv 2024-02 Github
TinyLLM: Learning a Small Student from Multiple Large Language Models arXiv 2024-02
Mixed Distillation Helps Smaller Language Model Better Reasoning arXiv 2023-12
Tailoring Self-Rationalizers with Multi-Reward Distillation arXiv 2023-11 Github Data
Orca 2: Teaching Small Language Models How to Reason arXiv 2023-11
Mammoth: Building Math Generalist Models through Hybrid Instruction Tuning arXiv 2023-09 Github Data
PandaLM: An Automatic Evaluation Benchmark for LLM Instruction Tuning Optimization arXiv 2023-06 Github Data
Symbolic Chain-of-Thought Distillation: Small Models Can Also "Think" Step-by-Step ACL 2023-06
Orca: Progressive Learning from Complex Explanation Traces of GPT-4 arXiv 2023-06
Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes ACL 2023-05 Github Data
Impossible Distillation: from Low-Quality Model to High-Quality Dataset & Model for Summarization and Paraphrasing arXiv 2023-05 Github
Baize: An Open-Source Chat Model with Parameter-Efficient Tuning on Self-Chat Data EMNLP 2023-04 Github Data
ChatGPT outperforms crowd workers for text-annotation tasks arXiv 2023-03
Annollm: Making large language models to be better crowdsourced annotators arXiv 2023-03
GPT4All: Training an Assistant-Style Chatbot with Large Scale Data Distillation from GPT-3.5-Turbo - 2023-03 Github
Specializing Smaller Language Models towards Multi-Step Reasoning arXiv 2023-01
Is GPT-3 a Good Data Annotator? ACL 2022-12 Github
Large Language Models Are Reasoning Teachers ACL 2022-12 Github Data
Teaching Small Language Models to Reason ACL 2022-12
Explanations from Large Language Models Make Small Reasoners Better arXiv 2022-10
Want To Reduce Labeling Cost? GPT-3 Can Help Findings of EMNLP 2021-08

Expansion

Title Venue Date Code Data
Instruction Fusion: Advancing Prompt Evolution through Hybridization arXiv 2023-12
An Empirical Study of Instruction-tuning Large Language Models in Chinese EMNLP 2023-10 Github Data
PromptMix: A Class Boundary Augmentation Method for Large Language Model Distillation EMNLP 2023-10 Github
Wizardmath: Empowering mathematical reasoning for large language models via reinforced evol-instruct arXiv 2023-08 Github
Code Llama: Open Foundation Models for Code arXiv 2023-08 Github
WizardCoder: Empowering Code Large Language Models with Evol-Instruct ICLR 2023-06 Github
Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision NeurIPS 2023-05 Github Data
Targeted Data Generation: Finding and Fixing Model Weaknesses ACL 2023-05 Github
Wizardlm: Empowering large language models to follow complex instructions ICLR 2023-04 Github Data, Data
LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions arXiv 2023-04 Github Data
Alpaca: Aligning Language Model with Human Preferences - 2023-03 Github Data
Code Alpaca: An Instruction-following LLaMA model for code generation - 2023-03 Github Data
Exploring the Impact of Instruction Data Scaling on Large Language Models: An Empirical Study on Real-World Use Cases arXiv 2023-03 Github Data
AugGPT: Leveraging ChatGPT for Text Data Augmentation arXiv 2023-02 Github
Self-instruct: Aligning language model with self generated instructions ACL 2022-12 Github Data
Symbolic Knowledge Distillation: from General Language Models to Commonsense Models NAACL 2021-10 Github Data

Curation

Title Venue Date Code Data
Synthetic Data (Almost) from Scratch: Generalized Instruction Tuning for Language Models arXiv 2024-02
Phi-2: The surprising power of small language models - 2023-12
WaveCoder: Widespread And Versatile Enhanced Instruction Tuning with Refined Data Generation arXiv 2023-12
Magicoder: Source Code Is All You Need arXiv 2023-12 Github Data, Data
MFTCoder: Boosting Code LLMs with Multitask Fine-Tuning arXiv 2023-11 Github Data, Data
Textbooks Are All You Need II: Phi-1.5 Technical Report arXiv 2023-09
Neural Machine Translation Data Generation and Augmentation using ChatGPT arXiv 2023-07
Textbooks Are All You Need: A Large-Scale Instructional Text Data Set for Language Models arXiv 2023-06
Enhancing Chat Language Models by Scaling High-quality Instructional Conversations arXiv 2023-05 Github Data
AugTriever: Unsupervised Dense Retrieval by Scalable Data Augmentation arXiv 2022-12 Github
SunGen: Self-Guided Noise-Free Data Generation for Efficient Zero-Shot Learning ICLR 2022-05 Github
ZeroGen: Efficient Zero-shot Learning via Dataset Generation EMNLP 2022-02 Github
InPars: Data Augmentation for Information Retrieval using Large Language Models arXiv 2022-02 Github Data
Towards Zero-Label Language Learning arXiv 2021-09

Feature

Title Venue Date Code Data
PromptKD: Distilling Student-Friendly Knowledge for Generative Language Models via Prompt Tuning EMNLP Findings 2024-02 Github Data
Rethinking Kullback-Leibler Divergence in Knowledge Distillation for Large Language Models arXiv 2024-04
Direct Alignment of Draft Model for Speculative Decoding with Chat-Fine-Tuned LLMs arXiv 2024-03
DB-LLM: Accurate Dual-Binarization for Efficient LLMs arXiv 2024-02
BitDistiller: Unleashing the Potential of Sub-4-Bit LLMs via Self-Distillation arXiv 2024-02 Github
DISTILLM: Towards Streamlined Distillation for Large Language Models arXiv 2024-02 Github
Towards Cross-Tokenizer Distillation: the Universal Logit Distillation Loss for LLMs arXiv 2024-02 Github Data
Revisiting Knowledge Distillation for Autoregressive Language Models arXiv 2024-02
Knowledge Fusion of Large Language Models ICLR 2024-01 Github
Improving In-context Learning via Bidirectional Alignment arXiv 2023-12
Towards the Fundamental Limits of Knowledge Transfer over Finite Domains NeurIPS 2023-10
Baby Llama: Knowledge Distillation from an Ensemble of Teachers Trained on a Small Dataset with No Performance Penalty CoNLL 2023-08 Github Data
f-Divergence Minimization for Sequence-Level Knowledge Distillation ACL 2023-07 Github Data
MiniLLM: Knowledge Distillation of Large Language Models ICLR 2023-06 Github Data
On-Policy Distillation of Language Models: Learning from Self-Generated Mistakes ICLR 2023-06
LLM-QAT: Data-Free Quantization Aware Training for Large Language Models arXiv 2023-05 Github Data
Less is more: Task-aware layer-wise distillation for language model compression PMLR 2022-10 Github

Feedback

Title Venue Date Code Data
PromptKD: Distilling Student-Friendly Knowledge for Generative Language Models via Prompt Tuning EMNLP Findings 2024-02 Github Data
Evidence-Focused Fact Summarization for Knowledge-Augmented Zero-Shot Question Answering arXiv 2024-03
Evolving Knowledge Distillation with Large Language Models and Active Learning arXiv 2024-03
Direct Language Model Alignment from Online AI Feedback arXiv 2024-02
DISTILLM: Towards Streamlined Distillation for Large Language Models arXiv 2024-02 Github
Improving Large Language Models via Fine-grained Reinforcement Learning with Minimum Editing Constraint arXiv 2024-01 Github
Beyond Imitation: Leveraging Fine-grained Quality Signals for Alignment arXiv 2023-11
Can Language Models Teach Weaker Agents? Teacher Explanations Improve Students via Personalization ICLR 2023-10 Github
Motif: Intrinsic Motivation from Artificial Intelligence Feedback ICLR 2023-10 Github
Ultrafeedback: Boosting language models with high-quality feedback arXiv 2023-10 Github Data
Personalised Distillation: Empowering Open-Sourced LLMs with Adaptive Learning for Code Generation EMNLP 2023-10 Github
CycleAlign: Iterative Distillation from Black-box LLM to White-box Models for Better Human Alignment arXiv 2023-10
Rlaif: Scaling Reinforcement Learning from Human Feedback with AI Feedback arXiv 2023-09
Wizardmath: Empowering mathematical reasoning for large language models via reinforced evol-instruct arXiv 2023-08 Github
On-Policy Distillation of Language Models: Learning from Self-Generated Mistakes ICLR 2023-06
MiniLLM: Knowledge Distillation of Large Language Models ICLR 2023-06 Github Data
Language to Rewards for Robotic Skill Synthesis arXiv 2023-06 Github
Lion: Adversarial Distillation of Closed-Source Large Language Model EMNLP 2023-05 Github
SelFee: Iterative Self-Revising LLM Empowered by Self-Feedback Generation arXiv 2023-05
LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions arXiv 2023-04 Github Data
Reward Design with Language Models ICLR 2023-03 Github
Constitutional AI: Harmlessness from AI Feedback arXiv 2022-12

Self-Knowledge

Title Venue Date Code Data
V-STaR: Training Verifiers for Self-Taught Reasoners arXiv 2024-02
Self-Rewarding Language Models arXiv 2024-01 Github
Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models arXiv 2024-01 Github Data
Kun: Answer Polishment for Chinese Self-Alignment with Instruction Back-Translation arXiv 2024-01 Github Data
APT: Adaptive Pruning and Tuning Pretrained Language Models for Efficient Training and Inference arXiv 2024-01
GRATH: Gradual Self-Truthifying for Large Language Models arXiv 2024-01
Beyond human data: Scaling self-training for problem-solving with language models arXiv 2023-12
Self-Knowledge Guided Retrieval Augmentation for Large Language Models EMNLP Findings 2023-10 Github
RAIN: Your Language Models Can Align Themselves without Finetuning arXiv 2023-09 Github
Reinforced Self-Training (ReST) for Language Modeling arXiv 2023-08
Humback: Self-Alignment with Instruction Backtranslation ICLR 2023-08 Github
Self-Alignment of Large Language Models via Reinforcement Learning from Contrast Distillation ICLR 2023-07 Github
Self-Improvement of Large Language Models via Reinforcement Learning from Human Feedback EMNLP 2023-06
Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision NeurIPS 2023-05 Github Data
Impossible Distillation: from Low-Quality Model to High-Quality Dataset & Model for Summarization and Paraphrasing arXiv 2023-05 Github
Language Model Self-improvement by Reinforcement Learning Contemplation arXiv 2023-05
Baize: An Open-Source Chat Model with Parameter-Efficient Tuning on Self-Chat Data EMNLP 2023-04 Github Data
Self-instruct: Aligning language model with self generated instructions ACL 2022-12 Github Data
Large Language Models Can Self-Improve EMNLP 2022-10
STaR: Bootstrapping Reasoning With Reasoning NeurIPS 2022-03 Github

Distillation Algorithms

Supervised Fine-Tuning

Due to the large number of works applying supervised fine-tuning, we only list the most representative ones here.

Title Venue Date Code Data
Evidence-Focused Fact Summarization for Knowledge-Augmented Zero-Shot Question Answering arXiv 2024-03
Aligning Large and Small Language Models via Chain-of-Thought Reasoning EACL 2024-03 Github
Divide-or-Conquer? Which Part Should You Distill Your LLM? arXiv 2024-02
Synthetic Data (Almost) from Scratch: Generalized Instruction Tuning for Language Models arXiv 2024-02
Orca 2: Teaching Small Language Models How to Reason arXiv 2023-11
TinyLLM: Learning a Small Student from Multiple Large Language Models arXiv 2024-02
Wizardmath: Empowering mathematical reasoning for large language models via reinforced evol-instruct arXiv 2023-08 Github
Orca: Progressive Learning from Complex Explanation Traces of GPT-4 arXiv 2023-06
LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions arXiv 2023-04 Github Data
Wizardlm: Empowering large language models to follow complex instructions ICLR 2023-04 Github Data, Data
Baize: An Open-Source Chat Model with Parameter-Efficient Tuning on Self-Chat Data EMNLP 2023-04 Github Data
Alpaca: Aligning Language Model with Human Preferences - 2023-03 Github Data
Vicuna: An Open-Source Chatbot Impressing GPT-4 with 90% ChatGPT Quality* - 2023-03 Github Data
Self-instruct: Aligning language model with self generated instructions ACL 2022-12 Github Data
Large Language Models Can Self-Improve EMNLP 2022-10
STaR: Bootstrapping Reasoning With Reasoning NeurIPS 2022-03 Github

Divergence and Similarity

Title Venue Date Code Data
PromptKD: Distilling Student-Friendly Knowledge for Generative Language Models via Prompt Tuning EMNLP Findings 2024-02 Github Data
Rethinking Kullback-Leibler Divergence in Knowledge Distillation for Large Language Models arXiv 2024-04
Weight-Inherited Distillation for Task-Agnostic BERT Compression NAACL 2024-03 Github
BitDistiller: Unleashing the Potential of Sub-4-Bit LLMs via Self-Distillation arXiv 2024-02 Github
DISTILLM: Towards Streamlined Distillation for Large Language Models arXiv 2024-02 Github
Towards Cross-Tokenizer Distillation: the Universal Logit Distillation Loss for LLMs arXiv 2024-02 Github Data
Revisiting Knowledge Distillation for Autoregressive Language Models arXiv 2024-02
Knowledge Distillation for Closed-Source Language Models arXiv 2024-01
Knowledge Fusion of Large Language Models ICLR 2024-01 Github
Improving In-context Learning via Bidirectional Alignment arXiv 2023-12
Towards the Fundamental Limits of Knowledge Transfer over Finite Domains NeurIPS 2023-10
Baby Llama: Knowledge Distillation from an Ensemble of Teachers Trained on a Small Dataset with No Performance Penalty CoNLL 2023-08 Github Data
f-Divergence Minimization for Sequence-Level Knowledge Distillation ACL 2023-07 Github Data
MiniLLM: Knowledge Distillation of Large Language Models ICLR 2023-06 Github Data
On-Policy Distillation of Language Models: Learning from Self-Generated Mistakes ICLR 2023-06
LLM-QAT: Data-Free Quantization Aware Training for Large Language Models arXiv 2023-05 Github Data
Less is more: Task-aware layer-wise distillation for language model compression PMLR 2022-10 Github
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter NeurIPS 2019-10

Reinforcement Learning

Title Venue Date Code Data
Direct Language Model Alignment from Online AI Feedback arXiv 2024-02
Improving Large Language Models via Fine-grained Reinforcement Learning with Minimum Editing Constraint arXiv 2024-01 Github
Accelerating Reinforcement Learning of Robotic Manipulations via Feedback from Large Language Models CoRL 2023-11
Motif: Intrinsic Motivation from Artificial Intelligence Feedback ICLR 2023-10 Github
Ultrafeedback: Boosting language models with high-quality feedback arXiv 2023-10 Github Data
Eureka: Human-Level Reward Design via Coding Large Language Models arXiv 2023-10 Github
Rlaif: Scaling Reinforcement Learning from Human Feedback with AI Feedback arXiv 2023-09
Wizardmath: Empowering mathematical reasoning for large language models via reinforced evol-instruct arXiv 2023-08 Github
On-Policy Distillation of Language Models: Learning from Self-Generated Mistakes ICLR 2023-06
Aligning Large Language Models through Synthetic Feedback EMNLP 2023-05 Github Data
Language Model Self-improvement by Reinforcement Learning Contemplation arXiv 2023-05
Constitutional AI: Harmlessness from AI Feedback arXiv 2022-12

Rank Optimization

Title Venue Date Code Data
Evidence-Focused Fact Summarization for Knowledge-Augmented Zero-Shot Question Answering arXiv 2024-03
KnowTuning: Knowledge-aware Fine-tuning for Large Language Models arXiv 2024-02 Github
Self-Rewarding Language Models arXiv 2024-01 Github
Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models arXiv 2024-01 Github Data
Zephyr: Direct Distillation of LM Alignment arXiv 2023-10 Github Data
CycleAlign: Iterative Distillation from Black-box LLM to White-box Models for Better Human Alignment arXiv 2023-10

Skill Distillation

Context Following

Instruction Following

Title Venue Date Code Data
Synthetic Data (Almost) from Scratch: Generalized Instruction Tuning for Language Models arXiv 2024-02
Revisiting Knowledge Distillation for Autoregressive Language Models arXiv 2024-02
Selective Reflection-Tuning: Student-Selected Data Recycling for LLM Instruction-Tuning arXiv 2024-02 Github Data
Phi-2: The surprising power of small language models - 2023-12
What Makes Good Data for Alignment? A Comprehensive Study of Automatic Data Selection in Instruction Tuning ICLR 2023-12 Github Data
MUFFIN: Curating Multi-Faceted Instructions for Improving Instruction-Following arXiv 2023-12 Github Data
Instruction Fusion: Advancing Prompt Evolution through Hybridization arXiv 2023-12
Orca 2: Teaching Small Language Models How to Reason arXiv 2023-11
Reflection-Tuning: Data Recycling Improves LLM Instruction-Tuning NeurIPS Workshop 2023-10 Github Data
Textbooks Are All You Need II: Phi-1.5 Technical Report arXiv 2023-09
Orca: Progressive Learning from Complex Explanation Traces of GPT-4 arXiv 2023-06
Textbooks Are All You Need: A Large-Scale Instructional Text Data Set for Language Models arXiv 2023-06
SelFee: Iterative Self-Revising LLM Empowered by Self-Feedback Generation arXiv 2023-05
ExpertPrompting: Instructing Large Language Models to be Distinguished Experts arXiv 2023-05 Github Data
LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions arXiv 2023-04 Github Data
Wizardlm: Empowering large language models to follow complex instructions ICLR 2023-04 Github Data, Data
Koala: A Dialogue Model for Academic Research - 2023-04 Github Data
Alpaca: Aligning Language Model with Human Preferences - 2023-03 Github Data
Vicuna: An Open-Source Chatbot Impressing GPT-4 with 90% ChatGPT Quality* - 2023-03 Github Data
Self-instruct: Aligning language model with self generated instructions ACL 2022-12 Github Data

Multi-turn Dialogue

Title Venue Date Code Data
Zephyr: Direct Distillation of LM Alignment arXiv 2023-10 Github Data
OpenChat: Advancing Open-Source Language Models with Mixed-Quality Data ICLR 2023-09 Github Data
Enhancing Chat Language Models by Scaling High-quality Instructional Conversations arXiv 2023-05 Github Data
Baize: An Open-Source Chat Model with Parameter-Efficient Tuning on Self-Chat Data EMNLP 2023-04 Github Data
Vicuna: An Open-Source Chatbot Impressing GPT-4 with 90% ChatGPT Quality* - 2023-03 Github Data

RAG Capability

Title Venue Date Code Data
Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection NeurIPS 2023-10 Github Data
SAIL: Search-Augmented Instruction Learning arXiv 2023-05 Github Data
Knowledge-Augmented Reasoning Distillation for Small Language Models in Knowledge-Intensive Tasks NeurIPS 2023-05 Github Data

Alignment

Thinking Pattern

Title Venue Date Code Data
Aligning Large and Small Language Models via Chain-of-Thought Reasoning EACL 2024-03 Github
Divide-or-Conquer? Which Part Should You Distill Your LLM? arXiv 2024-02
Selective Reflection-Tuning: Student-Selected Data Recycling for LLM Instruction-Tuning arXiv 2024-02 Github Data
Can LLMs Speak For Diverse People? Tuning LLMs via Debate to Generate Controllable Controversial Statements arXiv 2024-02 Github Data
Knowledgeable Preference Alignment for LLMs in Domain-specific Question Answering arXiv 2023-11 Github
Orca 2: Teaching Small Language Models How to Reason arXiv 2023-11
Reflection-Tuning: Data Recycling Improves LLM Instruction-Tuning NeurIPS Workshop 2023-10 Github Data
Orca: Progressive Learning from Complex Explanation Traces of GPT-4 arXiv 2023-06
SelFee: Iterative Self-Revising LLM Empowered by Self-Feedback Generation arXiv 2023-05

Preference

Title Venue Date Code Data
Ultrafeedback: Boosting language models with high-quality feedback arXiv 2023-10 Github Data
Zephyr: Direct Distillation of LM Alignment arXiv 2023-10 Github Data
Rlaif: Scaling Reinforcement Learning from Human Feedback with AI Feedback arXiv 2023-09
OpenChat: Advancing Open-Source Language Models with Mixed-Quality Data ICLR 2023-09 Github Data
RLCD: Reinforcement Learning from Contrast Distillation for Language Model Alignment arXiv 2023-07 Github
Aligning Large Language Models through Synthetic Feedback EMNLP 2023-05 Github Data
Reward Design with Language Models ICLR 2023-03 Github
Training Language Models with Language Feedback at Scale arXiv 2023-03
Constitutional AI: Harmlessness from AI Feedback arXiv 2022-12

Value

Title Venue Date Code Data
Ultrafeedback: Boosting language models with high-quality feedback arXiv 2023-10 Github Data
RLCD: Reinforcement Learning from Contrast Distillation for Language Model Alignment arXiv 2023-07 Github
Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision NeurIPS 2023-05 Github Data
Training Socially Aligned Language Models on Simulated Social Interactions arXiv 2023-05
Constitutional AI: Harmlessness from AI Feedback arXiv 2022-12

Agent

Tool Using

Title Venue Date Code Data
Toolformer: Language Models Can Teach Themselves to Use Tools arXiv 2023-02
Graph-ToolFormer: To Empower LLMs with Graph Reasoning Ability via Prompt Augmented by ChatGPT arXiv 2023-04 Github Data
Gorilla: Large Language Model Connected with Massive APIs arXiv 2023-05 Github Data
GPT4Tools: Teaching Large Language Model to Use Tools via Self-instruction arXiv 2023-05 Github Data
ToolAlpaca: Generalized Tool Learning for Language Models with 3000 Simulated Cases arXiv 2023-06 Github Data
ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs arXiv 2023-07 Github Data
Confucius: Iterative Tool Learning from Introspection Feedback by Easy-to-Difficult Curriculum arXiv 2023-08 Github
CRAFT: Customizing LLMs by Creating and Retrieving from Specialized Toolsets arXiv 2023-09 Github
MLLM-Tool: A Multimodal Large Language Model For Tool Agent Learning arXiv 2024-01 Github Data
Small LLMs Are Weak Tool Learners: A Multi-LLM Agent arXiv 2024-01 Github
EASYTOOL: Enhancing LLM-based Agents with Concise Tool Instruction arXiv 2024-01 Github

Planning

Title Venue Date Code Data
AUTOACT: Automatic Agent Learning from Scratch via Self-Planning arXiv 2024-01 Github
Lumos: Learning Agents with Unified Data, Modular Design, and Open-Source LLMs arXiv 2023-11 Github Data
TPTU-v2: Boosting Task Planning and Tool Usage of Large Language Model-based Agents in Real-world Systems arXiv 2023-11
Embodied Multi-Modal Agent trained by an LLM from a Parallel TextWorld arXiv 2023-11
Accelerating Reinforcement Learning of Robotic Manipulations via Feedback from Large Language Models CoRL 2023-11
Motif: Intrinsic Motivation from Artificial Intelligence Feedback ICLR 2023-10 Github
FireAct: Toward Language Agent Fine-tuning arXiv 2023-10 Github Data
AgentTuning: Enabling Generalized Agent Abilities for LLMs arXiv 2023-10 Github
Eureka: Human-Level Reward Design via Coding Large Language Models arXiv 2023-10 Github
Language Instructed Reinforcement Learning for Human-AI Coordination PMLR 2023-04
Guiding Pretraining in Reinforcement Learning with Large Language Models PMLR 2023-02
Distilling Internet-Scale Vision-Language Models into Embodied Agents ICML 2023-01

NLP Task Specialization

NLU

Title Venue Date Code Data
LLM vs Small Model? Large Language Model Based Text Augmentation Enhanced Personality Detection Model arXiv 2024-03
Evolving Knowledge Distillation with Large Language Models and Active Learning arXiv 2024-03
Mixed Distillation Helps Smaller Language Model Better Reasoning arXiv 2023-12
PromptMix: A Class Boundary Augmentation Method for Large Language Model Distillation EMNLP 2023-10 Github
TinyLLM: Learning a Small Student from Multiple Large Language Models arXiv 2024-02
Targeted Data Generation: Finding and Fixing Model Weaknesses ACL 2023-05 Github
Distilling ChatGPT for Explainable Automated Student Answer Assessment arXiv 2023-05 Github
ChatGPT outperforms crowd workers for text-annotation tasks arXiv 2023-03
Annollm: Making large language models to be better crowdsourced annotators arXiv 2023-03
AugGPT: Leveraging ChatGPT for Text Data Augmentation arXiv 2023-02 Github
Is GPT-3 a Good Data Annotator? ACL 2022-12 Github
SunGen: Self-Guided Noise-Free Data Generation for Efficient Zero-Shot Learning ICLR 2022-05 Github
ZeroGen: Efficient Zero-shot Learning via Dataset Generation EMNLP 2022-02 Github
Generating Training Data with Language Models: Towards Zero-Shot Language Understanding NeurIPS 2022-02 Github
Towards Zero-Label Language Learning arXiv 2021-09
Generate, Annotate, and Learn: NLP with Synthetic Text TACL 2021-06

NLG

Title Venue Date Code Data
Tailoring Self-Rationalizers with Multi-Reward Distillation arXiv 2023-11 Github Data
RECOMP: Improving Retrieval-Augmented LMs with Compression and Selective Augmentation arXiv 2023-10 Github
Neural Machine Translation Data Generation and Augmentation using ChatGPT arXiv 2023-07
On-Policy Distillation of Language Models: Learning from Self-Generated Mistakes ICLR 2023-06
Can LLMs generate high-quality synthetic note-oriented doctor-patient conversations? arXiv 2023-06 Github Data
InheritSumm: A General, Versatile and Compact Summarizer by Distilling from GPT EMNLP 2023-05
Impossible Distillation: from Low-Quality Model to High-Quality Dataset & Model for Summarization and Paraphrasing arXiv 2023-05 Github
Data Augmentation for Radiology Report Simplification Findings of EACL 2023-04 Github
Want To Reduce Labeling Cost? GPT-3 Can Help Findings of EMNLP 2021-08

Information Retrieval

Title Venue Date Code Data
InstructDistill: Instruction Distillation Makes Large Language Models Efficient Zero-shot Rankers arXiv 2023-11 Github Data
Soft prompt tuning for augmenting dense retrieval with large language models arXiv 2023-07 Github
Query Rewriting in Retrieval-Augmented Large Language Models EMNLP 2023-05
Is ChatGPT Good at Search? Investigating Large Language Models as Re-Ranking Agents EMNLP 2023-04 Github Data
AugTriever: Unsupervised Dense Retrieval by Scalable Data Augmentation arXiv 2022-12 Github
QUILL: Query Intent with Large Language Models using Retrieval Augmentation and Multi-stage Distillation EMNLP 2022-10
Promptagator: Few-shot Dense Retrieval From 8 Examples ICLR 2022-09
Questions Are All You Need to Train a Dense Passage Retriever TACL 2022-06 Github
Improving Passage Retrieval with Zero-Shot Question Generation EMNLP 2022-04 Github Data
InPars: Data Augmentation for Information Retrieval using Large Language Models arXiv 2022-02 Github Data
Generating Datasets with Pretrained Language Models EMNLP 2021-04 Github

Recommendation

Title Venue Date Code Data
Can Small Language Models be Good Reasoners for Sequential Recommendation? arXiv 2024-03
Large Language Model Augmented Narrative Driven Recommendations arXiv 2023-06
Recommendation as Instruction Following: A Large Language Model Empowered Recommendation Approach arXiv 2023-05
ONCE: Boosting Content-based Recommendation with Both Open- and Closed-source Large Language Models WSDM 2023-05 Github Data

Text Generation Evaluation

Title Venue Date Code Data
Prometheus: Inducing Fine-grained Evaluation Capability in Language Models ICLR 2023-10 Github Data
TIGERScore: Towards Building Explainable Metric for All Text Generation Tasks arXiv 2023-10 Github Data
Generative Judge for Evaluating Alignment ICLR 2023-10 Github Data
PandaLM: An Automatic Evaluation Benchmark for LLM Instruction Tuning Optimization arXiv 2023-06 Github Data
INSTRUCTSCORE: Explainable Text Generation Evaluation with Fine-grained Feedback EMNLP 2023-05 Github Data

Code

Title Venue Date Code Data
Magicoder: Source Code Is All You Need arXiv 2023-12 Github Data
WaveCoder: Widespread And Versatile Enhanced Instruction Tuning with Refined Data Generation arXiv 2023-12
Instruction Fusion: Advancing Prompt Evolution through Hybridization arXiv 2023-12
MFTCoder: Boosting Code LLMs with Multitask Fine-Tuning arXiv 2023-11 Github Data
LLM-Assisted Code Cleaning For Training Accurate Code Generators arXiv 2023-11
Personalised Distillation: Empowering Open-Sourced LLMs with Adaptive Learning for Code Generation EMNLP 2023-10 Github
Code Llama: Open Foundation Models for Code arXiv 2023-08 Github
Distilled GPT for Source Code Summarization arXiv 2023-08 Github Data
Textbooks Are All You Need arXiv 2023-06
Code Alpaca: An Instruction-following LLaMA model for code generation - 2023-03 Github Data

Multi-Modality

Title Venue Date Code Data
Miko: Multimodal Intention Knowledge Distillation from Large Language Models for Social-Media Commonsense Discovery arXiv 2024-02
Localizing Visual Commonsense Knowledge in Large Language Models NeurIPS 2023-12 Github Data
To See is to Believe: Prompting GPT-4V for Better Visual Instruction Tuning arXiv 2023-11 Github Data
ILuvUI: Instruction-tuned LangUage-Vision modeling of UIs from Machine Conversations arXiv 2023-10
NExT-GPT: Any-to-Any Multimodal LLM arXiv 2023-09 Github Data
StableLLaVA: Enhanced Visual Instruction Tuning with Synthesized Image-Dialogue Data arXiv 2023-08 Github Data
PointLLM: Empowering Large Language Models to Understand Point Clouds arXiv 2023-08 Github Data
SVIT: Scaling up Visual Instruction Tuning arXiv 2023-07 Github Data
ChatSpot: Bootstrapping Multimodal LLMs via Precise Referring Instruction Tuning arXiv 2023-07
Shikra: Unleashing Multimodal LLM's Referential Dialogue Magic arXiv 2023-06 Github Data
Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning ICLR 2023-06 Github Data
Valley: Video Assistant with Large Language model Enhanced abilitY arXiv 2023-06 Github Data
DetGPT: Detect What You Need via Reasoning EMNLP 2023-05 Github
Visual Instruction Tuning NeurIPS 2023-04 Github Data

Summary Table


Figure: A summary of representative works about skill distillation.

Verticalization Distillation

Law

Title Venue Date Code Data
Fuzi - 2023-08 Github
ChatLaw: Open-Source Legal Large Language Model with Integrated External Knowledge Bases arXiv 2023-06 Github
Lawyer LLaMA Technical Report arXiv 2023-05 Github Data

Medical & Healthcare

Title Venue Date Code Data
HuatuoGPT-II, One-stage Training for Medical Adaption of LLMs arXiv 2023-11 Github Data
AlpaCare: Instruction-tuned Large Language Models for Medical Application arXiv 2023-10 Github Data
DISC-MedLLM: Bridging General Large Language Models and Real-World Medical Consultation arXiv 2023-08 Github Data
HuatuoGPT: Taming Language Model to Be a Doctor EMNLP 2023-05 Github Data
DoctorGLM: Fine-tuning your Chinese Doctor is not a Herculean Task arXiv 2023-04 Github Data
Huatuo: Tuning LLM with Chinese Medical Knowledge arXiv 2023-04 Github
MedAlpaca: An Open-Source Collection of Medical Conversational AI Models and Training Data arXiv 2023-04 Github Data
PMC-LLaMA: Further Finetuning LLaMA on Medical Papers arXiv 2023-04 Github Data
ChatDoctor: A Medical Chat Model Fine-Tuned on a Large Language Model Meta-AI (LLaMA) Using Medical Domain Knowledge arXiv 2023-03 Github

Finance

Title Venue Date Code Data
XuanYuan 2.0: A Large Chinese Financial Chat Model with Hundreds of Billions Parameters CIKM 2023-05

Science

Title Venue Date Code Data
MuseGraph: Graph-oriented Instruction Tuning of Large Language Models for Generic Graph Mining arXiv 2024-03
SciGLM: Training Scientific Language Models with Self-Reflective Instruction Annotation and Tuning arXiv 2024-01 Github
AstroLLaMA-Chat: Scaling AstroLLaMA with Conversational and Diverse Datasets arXiv 2024-01
GeoGalactica: A Scientific Large Language Model in Geoscience arXiv 2024-01 Github Data
InstructMol: Multi-Modal Integration for Building a Versatile and Reliable Molecular Assistant in Drug Discovery arXiv 2023-11 Github
LLM-Prop: Predicting Physical And Electronic Properties Of Crystalline Solids From Their Text Descriptions arXiv 2023-10 Github
OceanGPT: A Large Language Model for Ocean Science Tasks arXiv 2023-10 Github Data
MarineGPT: Unlocking Secrets of Ocean to the Public arXiv 2023-10 Github
MAmmoTH: Building Math Generalist Models through Hybrid Instruction Tuning arXiv 2023-09 Github Data
ToRA: A Tool-Integrated Reasoning Agent for Mathematical Problem Solving ICLR 2023-09 Github
DARWIN Series: Domain Specific Large Language Models for Natural Science arXiv 2023-08 Github
WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct arXiv 2023-08 Github
BioMedGPT: Open Multimodal Generative Pre-trained Transformer for BioMedicine arXiv 2023-08 Github Data
Prot2Text: Multimodal Protein’s Function Generation with GNNs and Transformers NeurIPS 2023-07
xTrimoPGLM: Unified 100B-Scale Pre-trained Transformer for Deciphering the Language of Protein bioRxiv 2023-07
GIMLET: A Unified Graph-Text Model for Instruction-Based Molecule Zero-Shot Learning NeurIPS 2023-06 Github Data
K2: A Foundation Language Model for Geoscience Knowledge Understanding and Utilization arXiv 2023-06 Github
Visual Instruction Tuning NeurIPS 2023-04 Github Data

Misc.

Title Venue Date Code Data
OWL: A Large Language Model for IT Operations arXiv 2023-09 Github Data
EduChat: A Large-Scale Language Model-based Chatbot System for Intelligent Education arXiv 2023-08 Github Data

Encoder-based KD

Note: Our survey focuses mainly on generative LLMs, so encoder-based KD is not covered in the survey. However, we are also interested in this topic and will continue to add the latest works in this area.

Title Venue Date Code Data
Masked Latent Semantic Modeling: an Efficient Pre-training Alternative to Masked Language Modeling Findings of ACL 2023-08
Better Together: Jointly Using Masked Latent Semantic Modeling and Masked Language Modeling for Sample Efficient Pre-training CoNLL 2023-08

Citation

If you find this repository helpful, please consider citing the following paper:

@misc{xu2024survey,
      title={A Survey on Knowledge Distillation of Large Language Models}, 
      author={Xiaohan Xu and Ming Li and Chongyang Tao and Tao Shen and Reynold Cheng and Jinyang Li and Can Xu and Dacheng Tao and Tianyi Zhou},
      year={2024},
      eprint={2402.13116},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

Star History

Star History Chart