Awesome Parametric Knowledge in LLMs

Must-read papers and blogs about the parametric knowledge mechanism in LLMs.

This repo collects papers about parametric knowledge in LLMs, organized into two main categories: parametric knowledge detection and parametric knowledge application!👻

We believe that the parametric knowledge in LLMs is still a largely unexplored area, and we hope this repository will provide you with some valuable insights!😶‍🌫️

Parametric Knowledge Detection

Knowledge in Transformer-based Model——Analysis🧠

2025

  1. Decoding specialised feature neurons in LLMs with the final projection layer

    [Logits Lens, Analysis of Query Neuron]

2024

  1. What does the knowledge neuron thesis have to do with knowledge?

    Jingcheng Niu, Andrew Liu, Zining Zhu, Gerald Penn. ICLR'24 Spotlight

  2. Knowledge Mechanisms in Large Language Models: A Survey and Perspective

    Mengru Wang, Yunzhi Yao, Ziwen Xu, Shuofei Qiao, Shumin Deng, Peng Wang, Xiang Chen, Jia-Chen Gu, Yong Jiang, Pengjun Xie, Fei Huang, Huajun Chen, Ningyu Zhang. EMNLP'24 Findings

  3. Disentangling Memory and Reasoning Ability in Large Language Models

    Mingyu Jin, Weidi Luo, Sitao Cheng, Xinyi Wang, Wenyue Hua, Ruixiang Tang, William Yang Wang, Yongfeng Zhang. Preprint'24

  4. Linguistic collapse: Neural collapse in (large) language models

    Robert Wu, Vardan Papyan. NIPS'24

  5. Understanding the Interplay between Parametric and Contextual Knowledge for Large Language Models

    Sitao Cheng, Liangming Pan, Xunjian Yin, Xinyi Wang, William Yang Wang. Preprint'24

  6. Evaluating the External and Parametric Knowledge Fusion of Large Language Models

    Hao Zhang, Yuyang Zhang, Xiaoguang Li, Wenxuan Shi, Haonan Xu, Huanshuo Liu, Yasheng Wang, Lifeng Shang, Qun Liu, Yong Liu, Ruiming Tang. Preprint'24

  7. Adaptive Chameleon or Stubborn Sloth: Revealing the Behavior of Large Language Models in Knowledge Conflicts

    Jian Xie, Kai Zhang, Jiangjie Chen, Renze Lou, Yu Su. ICLR'24 Spotlight

  8. Knowledge entropy decay during language model pretraining hinders new knowledge acquisition

    Jiyeon Kim, Hyunji Lee, Hyowon Cho, Joel Jang, Hyeonbin Hwang, Seungpil Won, Youbin Ahn, Dohaeng Lee, Minjoon Seo. Preprint'24

  9. When Context Leads but Parametric Memory Follows in Large Language Models

    Yufei Tao, Adam Hiatt, Erik Haake, Antonie J. Jetter, Ameeta Agrawal. EMNLP'24

  10. Neuron-level knowledge attribution in large language models

    Zeping Yu, Sophia Ananiadou. EMNLP'24

2023

  1. Dissecting recall of factual associations in auto-regressive language models [code]

    Mor Geva, Jasmijn Bastings, Katja Filippova, Amir Globerson. EMNLP'23

2021

  1. Transformer Feed-Forward Layers Are Key-Value Memories

    Mor Geva, Roei Schuster, Jonathan Berant, Omer Levy. EMNLP'21
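
A minimal sketch of reading FFN "value" vectors through the final projection layer, combining the key-value-memory view above (Geva et al., EMNLP'21) with the logit-lens idea from the 2025 entry; it assumes the `transformers` library and a GPT-2 checkpoint, and the layer/neuron indices are arbitrary examples:

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

layer, neuron = 10, 42  # illustrative choices, not values from any paper
# In GPT-2's Conv1D layout, row `neuron` of mlp.c_proj.weight is the vector
# that neuron writes into the residual stream when it fires.
value = model.transformer.h[layer].mlp.c_proj.weight[neuron]  # shape (768,)

with torch.no_grad():
    # Logit-lens heuristic: normalize, then project onto the vocabulary
    logits = model.lm_head(model.transformer.ln_f(value))
print([tok.decode([int(i)]) for i in logits.topk(10).indices])  # tokens this neuron promotes
```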

Knowledge in Transformer-based Model——Causal Tracing🦾

2024

  1. Does knowledge localization hold true? Surprising differences between entity and relation perspectives in language models

    Yifan Wei, Xiaoyan Yu, Yixuan Weng, Huanhuan Ma, Yuanzhe Zhang, Jun Zhao, Kang Liu. CIKM'24

2022

  1. Locating and Editing Factual Associations in GPT

    Kevin Meng, David Bau, Alex Andonian, Yonatan Belinkov. NIPS'22
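
For orientation, here is a toy causal-tracing loop in the spirit of "Locating and Editing Factual Associations in GPT" above: corrupt the subject tokens with noise, then restore one clean hidden state at a time and watch the answer probability recover. It assumes `transformers` and GPT-2; the prompt, subject span, and noise scale are illustrative assumptions, not the paper's settings:

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

prompt = "The Eiffel Tower is located in the city of"
ids = tok(prompt, return_tensors="pt").input_ids
subj = slice(1, 3)                   # assumed token span of the subject
target = tok(" Paris").input_ids[0]  # token whose probability we track

with torch.no_grad():
    clean_hidden = model(ids, output_hidden_states=True).hidden_states

def corrupt(module, inputs, output):
    output[0, subj] += 3.0 * torch.randn_like(output[0, subj])  # noise the subject embeddings

def make_restorer(layer, pos):
    def restore(module, inputs, output):
        h = output[0]
        h[:, pos] = clean_hidden[layer + 1][:, pos]  # patch in the clean state
        return (h,) + output[1:]
    return restore

torch.manual_seed(0)
for layer in range(model.config.n_layer):
    for pos in range(ids.shape[1]):
        h1 = model.transformer.wte.register_forward_hook(corrupt)
        h2 = model.transformer.h[layer].register_forward_hook(make_restorer(layer, pos))
        with torch.no_grad():
            p = torch.softmax(model(ids).logits[0, -1], dim=-1)[target]
        h1.remove(); h2.remove()
        print(layer, pos, round(float(p), 4))  # high p => this (layer, pos) state carries the fact
```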


Knowledge in Transformer-based Model——Gradient Attribution👀

2024

  1. Identifying query-relevant neurons in large language models for long-form texts

    Lihu Chen, Adam Dejl, Francesca Toni. Preprint'24

  2. Revealing the parametric knowledge of language models: A unified framework for attribution methods

    Haeun Yu, Pepa Atanasova, Isabelle Augenstein. ACL'24

  3. Does Large Language Model Contain Task-Specific Neurons?

    Ran Song, Shizhu He, Shuting Jiang, Yantuan Xian, Shengxiang Gao, Kang Liu, Zhengtao Yu. EMNLP'24

  4. Journey to the center of the knowledge neurons: Discoveries of language-independent knowledge neurons and degenerate knowledge neurons

    Yuheng Chen, Pengfei Cao, Yubo Chen, Kang Liu, Jun Zhao. AAAI'24

2022

  1. Knowledge Neurons in Pretrained Transformers

    Damai Dai, Li Dong, Yaru Hao, Zhifang Sui, Baobao Chang, Furu Wei. ACL'22
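
A self-contained toy of integrated-gradients neuron attribution in the style of the knowledge-neuron line above (Dai et al., ACL'22): scale the FFN's intermediate activations from zero to their full value and accumulate gradients of the answer logit. The FFN and the "answer logit" head here are random stand-ins, so only the mechanics carry over:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
d, h = 16, 64
ffn = nn.Sequential(nn.Linear(d, h), nn.GELU(), nn.Linear(h, d))
answer_logit = nn.Linear(d, 1)       # stand-in for the correct answer's logit
x = torch.randn(1, d)

def neuron_attribution(steps: int = 20) -> torch.Tensor:
    """Integrated gradients of the answer logit w.r.t. FFN neuron activations,
    integrating from a zero baseline up to the full activation."""
    with torch.no_grad():
        act = ffn[1](ffn[0](x))      # (1, h) intermediate activations
    total = torch.zeros_like(act)
    for alpha in torch.linspace(1.0 / steps, 1.0, steps):
        scaled = (alpha * act).detach().requires_grad_(True)
        logit = answer_logit(ffn[2](scaled)).sum()
        total += torch.autograd.grad(logit, scaled)[0]
    return act * total / steps       # Riemann approximation of the IG integral

scores = neuron_attribution().squeeze()
print(scores.topk(5).indices)        # candidate "knowledge neurons"
```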

Knowledge in Transformer-based Model——Activation🫀

2024

  1. Separating tongue from thought: Activation patching reveals language-agnostic concept representations in transformers

    Clément Dumas, Chris Wendler, Veniamin Veselovsky, Giovanni Monea, Robert West. ICLR'24 Spotlight

  2. From Yes-Men to Truth-Tellers: Addressing Sycophancy in Large Language Models with Pinpoint Tuning

    Wei Chen, Zhen Huang, Liang Xie, Binbin Lin, Houqiang Li, Le Lu, Xinmei Tian, Deng Cai, Yonggang Zhang, Wenxiao Wang, Xu Shen, Jieping Ye. ICML'24

  3. Language-specific neurons: The key to multilingual capabilities in large language models.

    Tianyi Tang, Wenyang Luo, Haoyang Huang, Dongdong Zhang, Xiaolei Wang, Xin Zhao, Furu Wei, Ji-Rong Wen. ACL'24

  4. Multi-property Steering of Large Language Models with Dynamic Activation Composition

    Daniel Scalena, Gabriele Sarti, Malvina Nissim. ACL'24 BlackboxNLP Workshop

  5. Exploring the benefit of activation sparsity in pre-training

    [MoE, Activation Sparsity, Activation Pattern, Inference Speedup] Zhengyan Zhang, Chaojun Xiao, Qiujieli Qin, Yankai Lin, Zhiyuan Zeng, Xu Han, Zhiyuan Liu, Ruobing Xie, Maosong Sun, Jie Zhou. ICML'24

2023

  1. Activation Addition: Steering Language Models Without Optimization

    Alexander Matt Turner, Lisa Thiergart, Gavin Leech, David Udell, Juan J. Vazquez, Ulisse Mini, Monte MacDiarmid. Preprint'23

  2. Deja vu: Contextual sparsity for efficient LLMs at inference time

    [Sparsity, Inference Speedup] Zichang Liu, Jue Wang, Tri Dao, Tianyi Zhou, Binhang Yuan, Zhao Song, Anshumali Shrivastava, Ce Zhang, Yuandong Tian, Christopher Re, Beidi Chen. ICML'23

Parametric Knowledge Application

Knowledge Editing 🧑‍⚕️

2024

  1. A Comprehensive Study of Knowledge Editing for Large Language Models

    Ningyu Zhang, Yunzhi Yao, Bozhong Tian, Peng Wang, Shumin Deng, Mengru Wang, Zekun Xi, Shengyu Mao, Jintian Zhang, Yuansheng Ni, Siyuan Cheng, Ziwen Xu, Xin Xu, Jia-Chen Gu, Yong Jiang, Pengjun Xie, Fei Huang, Lei Liang, Zhiqiang Zhang, Xiaowei Zhu, Jun Zhou, Huajun Chen. Preprint'24

  2. FAME: Towards Factual Multi-Task Model Editing

    Li Zeng, Yingyu Shan, Zeming Liu, Jiashu Yao, Yuhang Guo. EMNLP'24

  3. To Forget or Not? Towards Practical Knowledge Unlearning for Large Language Models

    Bozhong Tian, Xiaozhuan Liang, Siyuan Cheng, Qingbin Liu, Mengru Wang, Dianbo Sui, Xi Chen, Huajun Chen, Ningyu Zhang. EMNLP'24 findings

  4. Understanding the Collapse of LLMs in Model Editing

    Wanli Yang, Fei Sun, Jiajun Tan, Xinyu Ma, Du Su, Dawei Yin, Huawei Shen. EMNLP'24 findings

  5. Is it possible to edit large language models robustly?

    Xinbei Ma, Tianjie Ju, Jiyang Qiu, Zhuosheng Zhang, Hai Zhao, Lifeng Liu, Yulong Wang. Preprint'24

  6. Retrieval-enhanced Knowledge Editing in Language Models for Multi-Hop Question Answering

    Yucheng Shi, Qiaoyu Tan, Xuansheng Wu, Shaochen Zhong, Kaixiong Zhou, Ninghao Liu. CIKM'24

  7. Latent paraphrasing: Perturbation on layers improves knowledge injection in language models

    Minki Kang, Sung Ju Hwang, Gibbeum Lee, Jaewoong Cho. NIPS'24

  8. Learning to edit: Aligning LLMs with knowledge editing

    Yuxin Jiang, Yufei Wang, Chuhan Wu, Wanjun Zhong, Xingshan Zeng, Jiahui Gao, Liangyou Li, Xin Jiang, Lifeng Shang, Ruiming Tang, Qun Liu, Wei Wang. ACL'24

  9. Inspecting and Editing Knowledge Representations in Language Models

    Evan Hernandez, Belinda Z. Li, Jacob Andreas. COLM'24

  10. Forgetting before learning: Utilizing parametric arithmetic for knowledge updating in large language models

    Shiwen Ni, Dingwei Chen, Chengming Li, Xiping Hu, Ruifeng Xu, Min Yang. ACL'24

  11. Ethos: Rectifying language models in orthogonal parameter space

    [Toxic/Bias Unlearning, SVD, Analysis of Parametric Knowledge, Task Vector]

    NAACL'24 findings

2023

  1. Editing Large Language Models: Problems, Methods, and Opportunities

    Yunzhi Yao, Peng Wang, Bozhong Tian, Siyuan Cheng, Zhoubo Li, Shumin Deng, Huajun Chen, Ningyu Zhang. EMNLP'23

2022

  1. Locating and Editing Factual Associations in GPT

    Kevin Meng, David Bau, Alex Andonian, Yonatan Belinkov. NIPS'22

  2. Memory-Based Model Editing at Scale

    Eric Mitchell, Charles Lin, Antoine Bosselut, Christopher D. Manning, Chelsea Finn. ICML'22

2021

  1. Editing Factual Knowledge in Language Models

    Nicola De Cao, Wilker Aziz, Ivan Titov. EMNLP'21

2020

  1. Editable neural networks.

    Anton Sinitsin, Vsevolod Plokhotnyuk, Dmitriy Pyrkin, Sergei Popov, Artem Babenko. ICLR'20
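
To make the "FFN as linear memory" editing idea concrete, here is a simplified rank-one edit in the spirit of "Locating and Editing Factual Associations in GPT" (NIPS'22, above). Real ROME derives the key k and value v* from the model and whitens by key covariance statistics; this toy just enforces W k = v* on random tensors:

```python
import torch

torch.manual_seed(0)
d_in, d_out = 64, 64
W = torch.randn(d_out, d_in)    # an FFN down-projection viewed as a linear memory
k = torch.randn(d_in)           # "key": representation of the edited subject
v_star = torch.randn(d_out)     # "value": representation encoding the new fact

# Rank-one update: the minimum-Frobenius-norm change such that W_new @ k == v_star
W_new = W + torch.outer(v_star - W @ k, k) / (k @ k)

assert torch.allclose(W_new @ k, v_star, atol=1e-4)
# Directions orthogonal to k are untouched; near-orthogonal ones move little,
# which is why such edits tend to stay local:
k_other = torch.randn(d_in)
print((W_new @ k_other - W @ k_other).norm() / (W @ k_other).norm())
```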

Knowledge Transfer🧚‍♀️

2024

  1. Seeking Neural Nuggets: Knowledge Transfer in Large Language Models from a Parametric Perspective

    Ming Zhong, Chenxin An, Weizhu Chen, Jiawei Han, Pengcheng He. ICLR'24

  2. Initializing models with larger ones

    Zhiqiu Xu, Yanjie Chen, Kirill Vishniakov, Yida Yin, Zhiqiang Shen, Trevor Darrell, Lingjie Liu, Zhuang Liu. ICLR'24 Spotlight

  3. Cross-model Control: Improving Multiple Large Language Models in One-time Training

    Jiayi Wu, Hao Sun, Hengyi Cai, Lixin Su, Shuaiqiang Wang, Dawei Yin, Xiang Li, Ming Gao. NIPS'24

  4. Knowledge fusion of large language models

    Fanqi Wan, Xinting Huang, Deng Cai, Xiaojun Quan, Wei Bi, Shuming Shi. ICLR'24

  5. Tuning language models by proxy

    Alisa Liu, Xiaochuang Han, Yizhong Wang, Yulia Tsvetkov, Yejin Choi, Noah A. Smith. COLM'24

  6. Chat vector: A simple approach to equip LLMs with instruction following and model alignment in new languages

    [Task Vector, Parametric Knowledge, Knowledge Transfer]

    ACL'24

  7. FedMKT: Federated Mutual Knowledge Transfer for Large and Small Language Models

    [Federated Learning, Knowledge Transfer, Heterogeneous Token Alignment]

    Coling'25

  8. Function vectors in large language models

    [Function Vector, Causal Mediation, Mechanism Interpretation]

    ICLR'24

  9. Refine large language model fine-tuning via instruction vector

    [Catastrophic Forgetting, Function Vector, Causal Mediation]

    Preprint'24

  10. KlF: Knowledge localization and fusion for language model continual learning

    [Catastrophic Forgetting, Continual Learning, Sensitivity-based Localization]

    ACL'24

  11. Language models are super mario: Absorbing abilities from homologous models as a free lunch

    [Knowledge Transfer, Model Merging, Efficient Skill] ICML'24

  12. Beyond task vectors: Selective task arithmetic based on importance metrics

    [Task Vector, Sensitivity-based Importance Score, Model Merging] Preprint'24

2023

  1. Mutual enhancement of large and small language models with cross-silo knowledge transfer

    Yongheng Deng, Ziqing Qiao, Ju Ren, Yang Liu, Yaoxue Zhang. Preprint'23

  2. Learning to grow pretrained models for efficient transformer training

    Peihao Wang, Rameswar Panda, Lucas Torroba Hennigen, Philip Greengard, Leonid Karlinsky, Rogerio Feris, David D. Cox, Zhangyang Wang, Yoon Kim. ICLR'23

  3. Retrieval-based knowledge transfer: An effective approach for extreme large language model compression

    Jiduan Liu, Jiahao Liu, Qifan Wang, Jingang Wang, Xunliang Cai, Dongyan Zhao, Ran Lucien Wang, Rui Yan. EMNLP'23 Findings

  4. Editing models with task arithmetic (see the sketch after this list)

    [Task Vector, Parametric Knowledge, Knowledge Transfer, Multi-task Learning]

    ICLR'23

  5. Task-Specific Skill Localization in Fine-tuned Language Models

    [Knowledge Transfer, Model Graft, Skill Parameter Localization]

    ICML'23

  6. Composing parameter-efficient modules with arithmetic operations

    [PEFT, Task Vector, Model Merge]

    NIPS'23

  7. Dataless knowledge fusion by merging weights of language models

    [Model Merge]

    ICLR'23
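
Several entries in this section build on task vectors ("Editing models with task arithmetic", ICLR'23 above). A toy sketch of the arithmetic, using random models as stand-ins for pretrained and finetuned checkpoints:

```python
import torch
import torch.nn as nn

def state(m: nn.Module) -> dict:
    """Copy a model's parameters as a plain dict of tensors."""
    return {k: v.detach().clone() for k, v in m.state_dict().items()}

torch.manual_seed(0)
pre = nn.Linear(8, 8)
ft_a, ft_b = nn.Linear(8, 8), nn.Linear(8, 8)  # pretend finetunes on tasks A and B

theta_pre, sa, sb = state(pre), state(ft_a), state(ft_b)
tau_a = {k: sa[k] - theta_pre[k] for k in theta_pre}  # task vector for A
tau_b = {k: sb[k] - theta_pre[k] for k in theta_pre}  # task vector for B

lam = 0.5  # merge coefficient, usually tuned on held-out data
merged = {k: theta_pre[k] + lam * (tau_a[k] + tau_b[k]) for k in theta_pre}
pre.load_state_dict(merged)  # pre now carries (an approximation of) both skills
```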

2021

  1. Weight distillation: Transferring the knowledge in neural network parameters

    Ye Lin, Yanyang Li, Ziyang Wang, Bei Li, Quan Du, Tong Xiao, Jingbo Zhu. ACL'21

Activation Steering

2024

  1. Multi-property Steering of Large Language Models with Dynamic Activation Composition

    Daniel Scalena, Gabriele Sarti, Malvina Nissim. ACL'24 BlackboxNLP Workshop

  2. Word embeddings are steers for language models

    [Word Embedding Steering, Generation Control] ACL'24

2023

  1. Activation Addition: Steering Language Models Without Optimization

    Alexander Matt Turner, Lisa Thiergart, Gavin Leech, David Udell, Juan J. Vazquez, Ulisse Mini, Monte MacDiarmid. Preprint'23
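
A minimal sketch of Activation Addition as described in the entry above: build a steering vector from a contrast pair of prompts and add it to the residual stream during generation. It assumes `transformers` and GPT-2; the contrast prompts, layer index, and coefficient are illustrative assumptions rather than the paper's settings:

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()
LAYER, COEFF = 6, 4.0  # illustrative choices

def last_token_state(text: str) -> torch.Tensor:
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        hs = model(ids, output_hidden_states=True).hidden_states
    return hs[LAYER + 1][0, -1]  # residual stream after block LAYER

# Steering vector from a contrast pair of prompts
steer = last_token_state(" Love") - last_token_state(" Hate")

def add_steering(module, inputs, output):
    return (output[0] + COEFF * steer,) + output[1:]  # add at every position

handle = model.transformer.h[LAYER].register_forward_hook(add_steering)
ids = tok("I think dogs are", return_tensors="pt").input_ids
with torch.no_grad():
    out = model.generate(ids, max_new_tokens=20, do_sample=False)
handle.remove()
print(tok.decode(out[0]))
```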

Knowledge Distillation

2024

  1. PromptKD: Distilling Student-Friendly Knowledge for Generative Language Models via Prompt Tuning (Note: not parametric)

    Gyeongman Kim, Doohyuk Jang, Eunho Yang. EMNLP'24 findings

  2. From Instance Training to Instruction Learning: Task Adapters Generation from Instructions

    Huanxuan Liao, Yao Xu, Shizhu He, Yuanzhe Zhang, Yanchao Hao, Shengping Liu, Kang Liu, Jun Zhao. NIPS'24

  3. When babies teach babies: Can student knowledge sharing outperform teacher-guided distillation on small datasets?

    Srikrishna Iyer. EMNLP'24 CoNLL Workshop
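
For context, the classic soft-label distillation objective that these works build on (standard Hinton-style KD, not any single paper's method):

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits: torch.Tensor, teacher_logits: torch.Tensor, T: float = 2.0):
    """KL(teacher || student) on temperature-softened distributions,
    scaled by T^2 to keep gradient magnitudes comparable across temperatures."""
    log_p_student = F.log_softmax(student_logits / T, dim=-1)
    p_teacher = F.softmax(teacher_logits / T, dim=-1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * (T * T)

student_logits = torch.randn(4, 100, requires_grad=True)  # toy vocabulary of 100
teacher_logits = torch.randn(4, 100)
loss = kd_loss(student_logits, teacher_logits)
loss.backward()
print(float(loss))
```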

Parametric Quantization

2024

  1. OneBit: Towards extremely low-bit large language models

    Yuzhuang Xu, Xu Han, Zonghan Yang, Shuo Wang, Qingfu Zhu, Zhiyuan Liu, Weidong Liu, Wanxiang Che. NIPS'24

2023

  1. The cost of compression: Investigating the impact of compression on parametric knowledge in language models

    Satya Sai Srinath Namburi, Makesh Sreedhar, Srinath Srinivasan, Frederic Sala. EMNLP'23 findings
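
A minimal illustration of the kind of weight rounding these papers study: symmetric per-tensor int8 quantization and the reconstruction error it introduces into stored parameters. This is a generic sketch, not OneBit's method (which goes to 1-bit with learned scaling):

```python
import torch

def quantize_int8(w: torch.Tensor):
    """Symmetric per-tensor int8 quantization: w is approximated by q * scale."""
    scale = w.abs().max() / 127.0
    q = torch.clamp((w / scale).round(), -127, 127).to(torch.int8)
    return q, scale

torch.manual_seed(0)
w = torch.randn(256, 256)               # a stand-in weight matrix
q, scale = quantize_int8(w)
w_hat = q.float() * scale
print(float((w - w_hat).abs().mean()))  # rounding noise that can erode stored knowledge
```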

Knowledge Injection

2024

  1. Awakening augmented generation: Learning to awaken internal knowledge of large language models for question answering

    [HyperNet, RAG, Context Compression]

    Huanxuan Liao, Shizhu He, Yao Xu, Yuanzhe Zhang, Kang Liu, Shengping Liu, Jun Zhao. AAAI'25

2023

  1. Memory injections: Correcting multi-hop reasoning failures during inference in transformer-based language models

    Mansi Sakarvadia, Aswathy Ajith, Arham Khan, Daniel Grzenda, Nathaniel Hudson, André Bauer, Kyle Chard, Ian Foster. Oral Presentation at BlackboxNLP Workshop at EMNLP'23

  2. Decouple knowledge from parameters for plug-and-play language modeling

    Xin Cheng, Yankai Lin, Xiuying Chen, Dongyan Zhao, Rui Yan. ACL'23 findings

  3. In-Parameter Knowledge Injection: Integrating Temporary Contextual Information into Model Parameters

    submitted to ICLR'25

2022

  1. Kformer: Knowledge injection in transformer feed-forward layers

    Yunzhi Yao, Shaohan Huang, Li Dong, Furu Wei, Huajun Chen, Ningyu Zhang. NLPCC'22
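
A toy of the Kformer-style injection above: knowledge embeddings are appended as extra key/value slots of an FFN layer, so the layer can "retrieve" them like its native key-value memories. Dimensions and embeddings here are random stand-ins:

```python
import torch

torch.manual_seed(0)
d, h, n_k = 64, 256, 8             # model dim, FFN hidden dim, knowledge slots

W_in = torch.randn(h, d)           # FFN "keys"
W_out = torch.randn(d, h)          # FFN "values"
know_keys = torch.randn(n_k, d)    # embeddings of injected knowledge sentences
know_vals = torch.randn(d, n_k)

W_in_aug = torch.cat([W_in, know_keys], dim=0)    # (h + n_k, d)
W_out_aug = torch.cat([W_out, know_vals], dim=1)  # (d, h + n_k)

x = torch.randn(d)
out = W_out_aug @ torch.relu(W_in_aug @ x)        # FFN now also reads the injected slots
print(out.shape)  # torch.Size([64])
```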

Parameter-Efficient Fine-tuning (PEFT)

2024

  1. KaSA: Knowledge-aware singular-value adaptation of large language models

    [Knowledge-aware LoRA, SVD]

    Fan Wang, Juyong Jiang, Chansung Park, Sunghun Kim, Jing Tang. Preprint'24

  2. CorDA: Context-Oriented Decomposition Adaptation of Large Language Models for Task-Aware Parameter-Efficient Fine-tuning

    [Knowledge-aware LoRA, SVD]

    Yibo Yang, Xiaojie Li, Zhongzhu Zhou, Shuaiwen Leon Song, Jianlong Wu, Liqiang Nie, Bernard Ghanem. NIPS'24

  3. DoRA: Weight-Decomposed Low-Rank Adaptation

    [Weight-Decomposed LoRA, SVD, Analysis of FT and LoRA] Shih-Yang Liu, Chien-Yi Wang, Hongxu Yin, Pavlo Molchanov, Yu-Chiang Frank Wang, Kwang-Ting Cheng, Min-Hung Chen. ICML'24 Oral

  4. Low-rank adaptation with task-relevant feature enhancement for fine-tuning language models

    [Task-aware LoRA, Hidden Representation Enhancement] AAAI'25 CoLoRAI Workshop
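
Most entries in this section extend LoRA, so a minimal LoRA layer is a useful baseline sketch (plain LoRA, not KaSA/CorDA/DoRA themselves):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base weight plus a trainable low-rank update: y = base(x) + (alpha/r) * x A^T B^T."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                 # only the adapter is trained
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: no change at start
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(nn.Linear(512, 512))
y = layer(torch.randn(2, 512))
print(y.shape)  # torch.Size([2, 512])
```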

Continual Learning

2024

  1. Learn more, but bother less: Parameter efficient continual learning

    [Continual Learning, Parameter Efficient, Knowledge Transfer] NIPS'24

  2. What will my model forget? Forecasting forgotten examples in language model refinement

    [Catastrophic Forgetting, Forecasting Forgetting, Analysis] ICML'24 Spotlight

RAG

2024

  1. xRAG: Extreme context compression for retrieval-augmented generation with one token

    [Context Compression, RAG, Multimodal Fusion] NIPS'24

Long Context Extension

2024

  1. LongEmbed: Extending embedding models for long context retrieval

    [Long Context, Embedding Model, Benchmark] EMNLP'24

  2. LLM maybe LongLM: Self-extend LLM context window without tuning

    [Long Context Extension, Plug-and-Play Method] ICML'24 Spotlight

  3. Two stones hit one bird: Bilevel positional encoding for better length extrapolation

    [Long Context Extension, Absolute PE + Relative PE, Plug-and-Play but Training-based Method] ICML'24

2023

  1. YaRN: Efficient context window extension of large language models [http://arxiv.org/abs/2309.00071]

    [Long Context Extension, Variation of RoPE] ICLR'24

2022

  1. Train short, test long: Attention with linear biases enables input length extrapolation

    [ALiBi, Long Context Extrapolation, Training-based Method] ICLR'22

2021

  1. RoFormer: Enhanced Transformer with Rotary Position Embedding.

    [Rotary Position Embedding, Classic]
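
Since RoPE underpins most of the extension methods above, here is a compact reference implementation of the rotation (rotate-half convention, as used in GPT-NeoX/LLaMA-style models):

```python
import torch

def rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Apply rotary position embedding to x of shape (seq_len, dim), dim even."""
    seq_len, dim = x.shape
    half = dim // 2
    inv_freq = base ** (-torch.arange(half, dtype=torch.float32) * 2.0 / dim)
    angles = torch.arange(seq_len, dtype=torch.float32)[:, None] * inv_freq[None, :]
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[:, :half], x[:, half:]
    # Rotate each (x1, x2) coordinate pair by a position-dependent angle
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

q = rope(torch.randn(16, 64))
k = rope(torch.randn(16, 64))
# After rotation, q·k depends only on the relative position offset
```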

Star History

Star History Chart
