Generalizing Influence to Demonstration Attribution in In-context Learning

Paper link (Our paper has been accepted to NeurIPS 2024!)

Abstract: In-context learning (ICL) allows transformer-based language models that are pre-trained on general text to quickly learn a specific task with a few "task demonstrations" without updating their parameters, significantly boosting their flexibility and generality. ICL possesses many distinct characteristics from conventional machine learning, thereby requiring new approaches to interpret this learning paradigm. Taking the viewpoint of recent works showing that transformers learn in context by formulating an internal optimizer, we propose an influence function-based attribution technique, DETAIL, that addresses the specific characteristics of ICL. We empirically verify the effectiveness of our approach for demonstration attribution while being computationally efficient. Leveraging the results, we then show how DETAIL can help improve model performance in real-world scenarios through demonstration reordering and curation. Finally, we experimentally prove the wide applicability of DETAIL by showing our attribution scores obtained on white-box models are transferable to black-box models in improving model performance.
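As a rough illustration of what influence function-based attribution means, the sketch below scores each training example of a plain ridge regression by its first-order effect on the loss of one test query. It is a generic textbook influence function, not the DETAIL implementation: DETAIL works with a transformer's in-context demonstrations, which this toy example does not model. The function name `influence_scores`, the ridge setup, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def influence_scores(X, y, x_test, y_test, lam=1.0):
    """Influence of each training example on the squared test loss.

    Hypothetical helper (not from the DETAIL code base). Uses the
    classic first-order formula
        I_i = -grad_i^T H^{-1} grad_test
    for ridge regression with objective
        sum_i 0.5 * (x_i^T theta - y_i)^2 + 0.5 * lam * ||theta||^2.
    A negative score means up-weighting example i lowers the test loss,
    i.e. the example is helpful for this query.
    """
    n, d = X.shape
    hessian = X.T @ X + lam * np.eye(d)           # Hessian of the objective
    theta = np.linalg.solve(hessian, X.T @ y)     # ridge solution
    grad_train = (X @ theta - y)[:, None] * X     # per-example gradients, shape (n, d)
    grad_test = (x_test @ theta - y_test) * x_test  # test-loss gradient, shape (d,)
    return -grad_train @ np.linalg.solve(hessian, grad_test)


# Toy usage on synthetic data (assumed values, for illustration only).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(20, 3))
w_true = np.array([1.0, -2.0, 0.5])
y_demo = X_demo @ w_true + 0.1 * rng.normal(size=20)
x_query = rng.normal(size=3)
y_query = x_query @ w_true

scores = influence_scores(X_demo, y_demo, x_query, y_query)
# Indices sorted from most helpful (most negative) to most harmful.
ranking = np.argsort(scores)
print(ranking)
```

In this toy setting, curation would amount to dropping the examples at the harmful end of `ranking`, and reordering would amount to arranging examples according to their scores. Whether and how that carries over to in-context demonstrations is what the paper studies.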

Usage

To run the MNIST experiment, create and activate the conda environment defined in jax_env.yml, then run: CUDA_VISIBLE_DEVICES=0 python mnist_data_attr.py
To run the LLM experiments, create and activate the conda environment defined in torch_env.yml, then run the commands listed in script.sh
To analyze the results, open analyzer.ipynb.

Acknowledgement

Parts of the code are referenced from the following repositories

Credit our work

If you find our work interesting, please star our repository. To cite our paper, you may use the following BibTeX entry:

@inproceedings{zhou2024detail,
      title={DETAIL: Task DEmonsTration Attribution for Interpretable In-context Learning}, 
      author={Zijian Zhou and Xiaoqiang Lin and Xinyi Xu and Alok Prakash and Daniela Rus and Bryan Kian Hsiang Low},
      year={2024},
      booktitle={Advances in Neural Information Processing Systems}
}

Folders and files

src
.gitignore
README.md
analyzer.ipynb
cls_llm.py
cls_llm_comp.py
cls_llm_layers.py
cls_llm_order.py
cls_llm_other.py
cls_llm_other_transfer.py
cls_llm_pos.py
cls_llm_pos_transfer.py
detect_llm.py
detect_llm_layers.py
detect_llm_proj.py
gpt_config.yaml
gpt_query.py
jax_env.yml
mnist_data_attr.ipynb
mnist_data_attr.py
script.sh
torch_env.yml
utils.py

BobbyZhouZijian/detail_release
