Large Language Models for Data Annotation and Synthesis: A Survey

This is a curated list of papers about LLM for Data Annotation and Synthesis maintained by Dawei Li (daweili5@asu.edu)
If you want to add new entries, please make PRs with the same format.
This list serves as a complement to our EMNLP 2024 oral survey: [Large Language Models for Data Annotation and Synthesis: A Survey]

🔔 News

2024-12 Check our new paper list and survey on LLM-as-a-judge!
2024-12 We update our paper list and include papers for LLM-based data annotation & synthesis in December 2024!

If you find this repo helpful, we would appreciate it if you could cite our survey.

@article{tan2024large,
  title={Large language models for data annotation: A survey},
  author={Tan, Zhen and Li, Dawei and Wang, Song and Beigi, Alimohammad and Jiang, Bohan and Bhattacharjee, Amrita and Karami, Mansooreh and Li, Jundong and Cheng, Lu and Liu, Huan},
  journal={arXiv preprint arXiv:2402.13446},
  year={2024}
}

Updated

Dec 2024

A text-to-tabular approach to generate synthetic patient data using LLMs Margaux Tornqvist, Jean-Daniel Zucker, Tristan Fauvel, Nicolas Lambert, Mathilde Berthelot, Antoine Movschin. arXiv preprint arXiv:2412.05153 (2024) [link]
Give me Some Hard Questions: Synthetic Data Generation for Clinical QA Fan Bai, Keith Harrigian, Joel Stremmel, Hamid Hassanzadeh, Ardavan Saeedi, Mark Dredze. arXiv preprint arXiv:2412.04573 (2024) [link]
Building a Family of Data Augmentation Models for Low-cost LLM Fine-tuning on the Cloud Yuanhao Yue, Chengyu Wang, Jun Huang, Peng Wang. arXiv preprint arXiv:2412.04871 (2024) [link]
Can Open-source LLMs Enhance Data Synthesis for Toxic Detection?: An Experimental Study Zheng Hui, Zhaoxiao Guo, Hang Zhao, Juanyong Duan, Lin Ai, Yinheng Li, Julia Hirschberg, Congrui Huang. arXiv preprint arXiv:2411.15175 (2024) [link]
Evaluating Large Language Model Capability in Vietnamese Fact-Checking Data Generation Long Truong To, Hung Tuan Le, Dat Van-Thanh Nguyen, Manh Trong Nguyen, Tri Thien Nguyen, Tin Van Huynh, Kiet Van Nguyen. arXiv preprint arXiv:2411.05641 (2024) [link]
CorrSynth -- A Correlated Sampling Method for Diverse Dataset Generation from LLMs Suhas S Kowshik, Abhishek Divekar, Vijit Malik. arXiv preprint arXiv:2411.08553 (2024) [link]
AgentTrek: Agent Trajectory Synthesis via Guiding Replay with Web Tutorials Yiheng Xu, Dunjie Lu, Zhennan Shen, Junli Wang, Zekun Wang, Yuchen Mao, Caiming Xiong, Tao Yu. arXiv preprint arXiv:2412.09605 (2024) [link]
Bootstrapping Language-Guided Navigation Learning with Self-Refining Data Flywheel Zun Wang, Jialu Li, Yicong Hong, Songze Li, Kunchang Li, Shoubin Yu, Yi Wang, Yu Qiao, Yali Wang, Mohit Bansal, Limin Wang. arXiv preprint arXiv:2412.08467 (2024) [link]
Filling Memory Gaps: Enhancing Continual Semantic Parsing via SQL Syntax Variance-Guided LLMs without Real Data Replay Ruiheng Liu, Jinyu Zhang, Yanqi Song, Yu Zhang, Bailong Yang. arXiv preprint arXiv:2412.07246 (2024) [link]
Language Models as Continuous Self-Evolving Data Engineers Peidong Wang, Ming Wang, Zhiming Ma, Xiaocui Yang, Shi Feng, Daling Wang, Yifei Zhang. arXiv preprint arXiv:2412.15151 (2024) [link]
From Human Annotation to LLMs: SILICON Annotation Workflow for Management Research Xiang Cheng, Raveesh Mayya, João Sedoc. arXiv preprint arXiv:2412.14461 (2024) [link]
Cognition Chain for Explainable Psychological Stress Detection on Social Media Xin Wang, Boyan Gao, Yi Dai, Lei Cao, Liang Zhao, Yibo Yang, David Clifton. arXiv preprint arXiv:2412.14009 (2024) [link]
OmniEval: An Omnidirectional and Automatic RAG Evaluation Benchmark in Financial Domain Shuting Wang, Jiejun Tan, Zhicheng Dou, Ji-Rong Wen. arXiv preprint arXiv:2412.13018 (2024) [link]
DS2-ABSA: Dual-Stream Data Synthesis with Label Refinement for Few-Shot Aspect-Based Sentiment Analysis Hongling Xu, Yice Zhang, Qianlong Wang, Ruifeng Xu. arXiv preprint arXiv:2412.14849 (2024) [link]
MegaPairs: Massive Data Synthesis For Universal Multimodal Retrieval Junjie Zhou, Zheng Liu, Ze Liu, Shitao Xiao, Yueze Wang, Bo Zhao, Chen Jason Zhang, Defu Lian, Yongping Xiong. arXiv preprint arXiv:2412.14475 (2024) [link]
Can LLMs Convert Graphs to Text-Attributed Graphs? Zehong Wang, Sidney Liu, Zheyuan Zhang, Tianyi Ma, Chuxu Zhang, Yanfang Ye. arXiv preprint arXiv:2412.10136 (2024) [link]
A Graph-Based Synthetic Data Pipeline for Scaling High-Quality Reasoning Instructions Jiankang Wang, Jianjun Xu, Xiaorui Wang, Yuxin Wang, Mengting Xing, Shancheng Fang, Zhineng Chen, Hongtao Xie, Yongdong Zhang. arXiv preprint arXiv:2412.08864 (2024) [link]
CoPrUS: Consistency Preserving Utterance Synthesis towards more realistic benchmark dialogues Sebastian Steindl, Ulrich Schäfer, Bernd Ludwig. arXiv preprint arXiv:2412.07515 (2024) [link]
FM2DS: Few-Shot Multimodal Multihop Data Synthesis with Knowledge Distillation for Question Answering Amirhossein Abaskohi, Spandana Gella, Giuseppe Carenini, Issam H. Laradji. arXiv preprint arXiv:2412.07030 (2024) [link]
AIDE: Task-Specific Fine Tuning with Attribute Guided Multi-Hop Data Expansion Jiayu Li, Xuan Zhu, Fang Liu, Yanjun Qi. arXiv preprint arXiv:2412.06136 (2024) [link]
Seed-CTS: Unleashing the Power of Tree Search for Superior Performance in Competitive Coding Tasks Hao Wang, Boyi Liu, Yufeng Zhang, Jie Chen. arXiv preprint arXiv:2412.12544 (2024) [link]
Text2Relight: Creative Portrait Relighting with Text Guidance Junuk Cha, Mengwei Ren, Krishna Kumar Singh, He Zhang, Yannick Hold-Geoffroy, Seunghyun Yoon, HyunJoon Jung, Jae Shin Yoon, Seungryul Baek. arXiv preprint arXiv:2412.13734 (2024) [link]
ResoFilter: Fine-grained Synthetic Data Filtering for Large Language Models through Data-Parameter Resonance Analysis Zeao Tu, Xiangdi Meng, Yu He, Zihan Yao, Tianyu Qi, Jun Liu, Ming Li. arXiv preprint arXiv:2412.14809 (2024) [link]
How to Synthesize Text Data without Model Collapse? Xuekai Zhu, Daixuan Cheng, Hengli Li, Kaiyan Zhang, Ermo Hua, Xingtai Lv, Ning Ding, Zhouhan Lin, Zilong Zheng, Bowen Zhou. arXiv preprint arXiv:2412.14689 (2024) [link]
Libri2Vox Dataset: Target Speaker Extraction with Diverse Speaker Conditions and Synthetic Data Yun Liu, Xuechen Liu, Xiaoxiao Miao, Junichi Yamagishi. arXiv preprint arXiv:2412.12512 (2024) [link]
ALMA: Alignment with Minimal Annotation Michihiro Yasunaga, Leonid Shamis, Chunting Zhou, Andrew Cohen, Jason Weston, Luke Zettlemoyer, Marjan Ghazvininejad. arXiv preprint arXiv:2412.04305 (2024) [link]
MAG-V: A Multi-Agent Framework for Synthetic Data Generation and Verification Saptarshi Sengupta, Kristal Curtis, Akshay Mallipeddi, Abhinav Mathur, Joseph Ross, Liang Gou. arXiv preprint arXiv:2412.04494 (2024) [link]
Piecing It All Together: Verifying Multi-Hop Multimodal Claims Haoran Wang, Aman Rangapur, Xiongxiao Xu, Yueqing Liang, Haroon Gharwi, Carl Yang, Kai Shu. arXiv preprint arXiv:2411.09547 (2024) [link]
Automated Collection of Evaluation Dataset for Semantic Search in Low-Resource Domain Language Anastasia Zhukova, Christian E. Matt, Bela Gipp. arXiv preprint arXiv:2412.10008 (2024) [link]
Argumentative Experience: Reducing Confirmation Bias on Controversial Issues through LLM-Generated Multi-Persona Debates Li Shi, Houjiang Liu, Yian Wong, Utkarsh Mujumdar, Dan Zhang, Jacek Gwizdka, Matthew Lease. arXiv preprint arXiv:2412.04629 (2024) [link]
A Rose by Any Other Name: LLM-Generated Explanations Are Good Proxies for Human Explanations to Collect Label Distributions on NLI Beiduo Chen, Siyao Peng, Anna Korhonen, Barbara Plank. arXiv preprint arXiv:2412.13942 (2024) [link]
On Limitations of LLM as Annotator for Low Resource Languages Suramya Jadhav, Abhay Shanbhag, Amogh Thakurdesai, Ridhima Sinare, Raviraj Joshi. arXiv preprint arXiv:2411.17637 (2024) [link]
LLM Teacher-Student Framework for Text Classification With No Manually Annotated Data: A Case Study in IPTC News Topic Classification Taja Kuzman, Nikola Ljubešić. arXiv preprint arXiv:2411.19638 (2024) [link]
DSAI: Unbiased and Interpretable Latent Feature Extraction for Data-Centric AI Hyowon Cho, Soonwon Ka, Daechul Park, Jaewook Kang, Minjoon Seo, Bokyung Son. arXiv preprint arXiv:2412.06303 (2024) [link]
Rethinking Emotion Annotations in the Era of Large Language Models Minxue Niu, Yara El-Tawil, Amrit Romana, Emily Mower Provost. arXiv preprint arXiv:2412.07906 (2024) [link]
DialogAgent: An Auto-engagement Agent for Code Question Answering Data Production Xiaoyun Liang, Jingyi Ren, Jiayi Qi, Chao Peng, Bo Jiang. arXiv preprint arXiv:2412.08069 (2024) [link]
Automated Collection of Evaluation Dataset for Semantic Search in Low-Resource Domain Language Anastasia Zhukova, Christian E. Matt, Bela Gipp. arXiv preprint arXiv:2412.10008 (2024) [link]
Bridging the Gap: Enhancing LLM Performance for Low-Resource African Languages with New Benchmarks, Fine-Tuning, and Cultural Adjustments Tuka Alhanai, Adam Kasumovic, Mohammad Ghassemi, Aven Zitzelberger, Jessica Lundin, Guillaume Chabot-Couture. arXiv preprint arXiv:2412.12417 (2024) [link]
Enhancing Persona Classification in Dialogue Systems: A Graph Neural Network Approach Konstantin Zaitsev. arXiv preprint arXiv:2412.13283 (2024) [link]
Fake News Detection: Comparative Evaluation of BERT-like Models and Large Language Models with Generative AI-Annotated Data Shaina Raza, Drai Paulen-Patterson, Chen Ding. arXiv preprint arXiv:2412.14276 (2024) [link]
VISTA: Visualized Text Embedding For Universal Multi-Modal Retrieval Junjie Zhou, Zheng Liu, Shitao Xiao, Bo Zhao, Yongping Xiong. arXiv preprint arXiv:2406.04292 (2024) [link]

LLM-Based Data Annotation

Instruction & Response

Generating training data with language models: Towards zero-shot language understanding. Meng, Yu, Huang, Jiaxin, Zhang, Yu, and Han, Jiawei. Advances in Neural Information Processing Systems (2022) [link]
ZeroGen: Efficient Zero-shot Learning via Dataset Generation. Ye, Jiacheng, Gao, Jiahui, Li, Qintong, Xu, Hang, Feng, Jiangtao, Wu, Zhiyong, Yu, Tao, and Kong, Lingpeng. Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing (2022) [link]
CodecLM: Aligning Language Models with Tailored Synthetic Data. Wang, Zifeng, Li, Chun-Liang, Perot, Vincent, Le, Long T, Miao, Jin, Zhang, Zizhao, Lee, Chen-Yu, and Pfister, Tomas. arXiv preprint arXiv:2404.05875 (2024) [link]
UniGen: A Unified Framework for Textual Dataset Generation Using Large Language Models. Wu, Siyuan, Huang, Yue, Gao, Chujie, Chen, Dongping, Zhang, Qihui, Wan, Yao, Zhou, Tianyi, Zhang, Xiangliang, Gao, Jianfeng, Xiao, Chaowei, and others. arXiv preprint arXiv:2406.18966 (2024) [link]
Best practices and lessons learned on synthetic data for language models. Liu, Ruibo, Wei, Jerry, Liu, Fangyu, Si, Chenglei, Zhang, Yanzhe, Rao, Jinmeng, Zheng, Steven, Peng, Daiyi, Yang, Diyi, Zhou, Denny, and others. arXiv preprint arXiv:2404.07503 (2024) [link]
Self-Alignment with Instruction Backtranslation. Li, Xian, Yu, Ping, Zhou, Chunting, Schick, Timo, Levy, Omer, Zettlemoyer, Luke, Weston, Jason E, and Lewis, Mike. The Twelfth International Conference on Learning Representations (2023) [link]
Preference ranking optimization for human alignment. Song, Feifan, Yu, Bowen, Li, Minghao, Yu, Haiyang, Huang, Fei, Li, Yongbin, and Wang, Houfeng. Proceedings of the AAAI Conference on Artificial Intelligence (2024) [link]
MathScale: Scaling Instruction Tuning for Mathematical Reasoning. Tang, Zhengyang, Zhang, Xingxing, Wang, Benyou, and Wei, Furu. Forty-first International Conference on Machine Learning (No Year) [link]
GPT3Mix: Leveraging Large-scale Language Models for Text Augmentation. Yoo, Kang Min, Park, Dongju, Kang, Jaewook, Lee, Sang-Woo, and Park, Woomyoung. Findings of the Association for Computational Linguistics: EMNLP 2021 (2021) [link]
Self-Consistency Improves Chain of Thought Reasoning in Language Models. Wang, Xuezhi, Wei, Jason, Schuurmans, Dale, Le, Quoc V, Chi, Ed H, Narang, Sharan, Chowdhery, Aakanksha, and Zhou, Denny. The Eleventh International Conference on Learning Representations (2022) [link]
Tuning language models as training data generators for augmentation-enhanced few-shot learning. Meng, Yu, Michalski, Martin, Huang, Jiaxin, Zhang, Yu, Abdelzaher, Tarek, and Han, Jiawei. International Conference on Machine Learning (2023) [link]
SASS: Self-Alignment with Semi-Supervised Instruction Data Generation. Wang, Yue, Zhang, Haoke, Li, Juntao, Chang, Jinxiong, Zhang, Qishen, Liu, Zhongyi, Zhang, Guannan, and Zhang, Min. No venue (2023) [link]
Targen: Targeted data generation with large language models. Gupta, Himanshu, Scaria, Kevin, Anantheswaran, Ujjwala, Verma, Shreyas, Parmar, Mihir, Sawant, Saurabh Arjun, Mishra, Swaroop, and Baral, Chitta. arXiv preprint arXiv:2310.17876 (2023) [link]
Let’s Synthesize Step by Step: Iterative Dataset Synthesis with Large Language Models by Extrapolating Errors from Small Models. Wang, Ruida, Zhou, Wangchunshu, and Sachan, Mrinmaya. Findings of the Association for Computational Linguistics: EMNLP 2023 (2023) [link]
Dail: Data augmentation for in-context learning via self-paraphrase. Li, Dawei, Li, Yaxuan, Mekala, Dheeraj, Li, Shuyao, Wang, Xueqi, Hogan, William, Shang, Jingbo, and others. arXiv preprint arXiv:2311.03319 (2023) [link]
LongForm: Effective Instruction Tuning with Reverse Instructions. K{"o}ksal, Abdullatif, Schick, Timo, Korhonen, Anna, and Schuetze, Hinrich. ICLR 2024 Workshop on Navigating and Addressing Data Problems for Foundation Models (No Year) [link]
Large language model as attributed training data generator: A tale of diversity and bias. Yu, Yue, Zhuang, Yuchen, Zhang, Jieyu, Meng, Yu, Ratner, Alexander J, Krishna, Ranjay, Shen, Jiaming, and Zhang, Chao. Advances in Neural Information Processing Systems (2024) [link]
Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing. Xu, Zhangchen, Jiang, Fengqing, Niu, Luyao, Deng, Yuntian, Poovendran, Radha, Choi, Yejin, and Lin, Bill Yuchen. arXiv preprint arXiv:2406.08464 (2024) [link]
Scaling synthetic data creation with 1,000,000,000 personas. Chan, Xin, Wang, Xiaoyang, Yu, Dian, Mi, Haitao, and Yu, Dong. arXiv preprint arXiv:2406.20094 (2024) [link]
FANNO: Augmenting High-Quality Instruction Data with Open-Sourced LLMs Only. Zhu, He, Su, Junyou, Lun, Tianle, Tao, Yicheng, Zhang, Wenjia, Fan, Zipei, and Chen, Guanhua. arXiv preprint arXiv:2408.01323 (2024) [link]
CorrSynth-A Correlated Sampling Method for Diverse Dataset Generation from LLMs. Kowshik, Suhas, Divekar, Abhishek, and Malik, Vijit. Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing (2024) [link]
SynthesizRR: Generating Diverse Datasets with Retrieval Augmentation. Divekar, Abhishek, and Durrett, Greg. arXiv preprint arXiv:2405.10040 (2024) [link]
Optimizing Instruction Synthesis: Effective Exploration of Evolutionary Space with Tree Search. Li, Chenglin, Chen, Qianglong, Li, Zhi, Tao, Feng, Li, Yicheng, Chen, Hao, Yu, Fei, and Zhang, Yin. arXiv preprint arXiv:2410.10392 (2024) [link]
Assessing Empathy in Large Language Models with Real-World Physician-Patient Interactions. Luo, Man, Warren, Christopher J, Cheng, Lu, Abdul-Muhsin, Haidar M, and Banerjee, Imon. arXiv preprint arXiv:2405.16402 (2024) [link]
Self-qa: Unsupervised knowledge guided language model alignment. Zhang, Xuanyu, and Yang, Qing. arXiv preprint arXiv:2305.11952 (2023) [link]
Large Language Models Can Self-Improve. Huang, Jiaxin, Gu, Shixiang, Hou, Le, Wu, Yuexin, Wang, Xuezhi, Yu, Hongkun, and Han, Jiawei. Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (2023) [link]
Self-Distillation Bridges Distribution Gap in Language Model Fine-Tuning. Yang, Zhaorui, Liu, Qian, Pang, Tianyu, Wang, Han, Feng, Haozhe, Zhu, Minfeng, and Chen, Wei. arXiv preprint arXiv:2402.13669 (2024) [link]
Self-Alignment of Large Language Models via Monopolylogue-based Social Scene Simulation. Pang, Xianghe, Tang, Shuo, Ye, Rui, Xiong, Yuxin, Zhang, Bolun, Wang, Yanfeng, and Chen, Siheng. arXiv preprint arXiv:2402.05699 (2024) [link]
Mixture of insighTful Experts (MoTE): The Synergy of Thought Chains and Expert Mixtures in Self-Alignment. Liu, Zhili, Gou, Yunhao, Chen, Kai, Hong, Lanqing, Gao, Jiahui, Mi, Fei, Zhang, Yu, Li, Zhenguo, Jiang, Xin, Liu, Qun, and others. arXiv preprint arXiv:2405.00557 (2024) [link]
Human-instruction-free llm self-alignment with limited samples. Guo, Hongyi, Yao, Yuanshun, Shen, Wei, Wei, Jiaheng, Zhang, Xiaoying, Wang, Zhaoran, and Liu, Yang. arXiv preprint arXiv:2401.06785 (2024) [link]
Principle-driven self-alignment of language models from scratch with minimal human supervision. Sun, Zhiqing, Shen, Yikang, Zhou, Qinhong, Zhang, Hongxin, Chen, Zhenfang, Cox, David, Yang, Yiming, and Gan, Chuang. Advances in Neural Information Processing Systems (2024) [link]
Step-On-Feet Tuning: Scaling Self-Alignment of LLMs via Bootstrapping. Wang, Haoyu, Ma, Guozheng, Meng, Ziqiao, Qin, Zeyu, Shen, Li, Zhang, Zhong, Wu, Bingzhe, Liu, Liu, Bian, Yatao, Xu, Tingyang, and others. arXiv preprint arXiv:2402.07610 (2024) [link]
Source2Synth: Synthetic Data Generation and Curation Grounded in Real Data Sources. Lupidi, Alisia, Gemmell, Carlos, Cancedda, Nicola, Dwivedi-Yu, Jane, Weston, Jason, Foerster, Jakob, Raileanu, Roberta, and Lomeli, Maria. arXiv preprint arXiv:2409.08239 (2024) [link]
ControlMath: Controllable Data Generation Promotes Math Generalist Models. Chen, Nuo, Wu, Ning, Chang, Jianhui, Shou, Linjun, and Li, Jia. Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing (2024) [link]

Label

AIDE: An Automatic Data Engine for Object Detection in Autonomous Driving. Liang, Mingfu, Su, Jong-Chyi, Schulter, Samuel, Garg, Sparsh, Zhao, Shiyu, Wu, Ying, Chandraker, Manmohan. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2024) [link]
Towards Automating Text Annotation: A Case Study on Semantic Proximity Annotation using GPT-4. Yadav, Sachin, Choppa, Tejaswi, and Schlechtweg, Dominik. arXiv preprint arXiv:2407.04130 (2024) [link]
Is a Large Language Model a Good Annotator for Event Extraction?. Chen, Ruirui, Qin, Chengwei, Jiang, Weifeng, and Choi, Dongkyu. Proceedings of the AAAI Conference on Artificial Intelligence (2024) [link]
Zero-Shot Topic Classification of Column Headers: Leveraging LLMs for Metadata Enrichment. Martorana, Margherita, Kuhn, Tobias, Stork, Lise, and van Ossenbruggen, Jacco. Knowledge Graphs in the Age of Language Models and Neuro-Symbolic AI (2024) [link]
Enhancing Text Annotation through Rationale-Driven Collaborative Few-Shot Prompting. Wu, Jianfei, Wang, Xubin, and Jia, Weijia. arXiv preprint arXiv:2409.09615 (2024) [link]
Can LLMs Replace Manual Annotation of Software Engineering Artifacts?. Ahmed, Toufique, Devanbu, Premkumar, Treude, Christoph, and Pradel, Michael. arXiv preprint arXiv:2408.05534 (2024) [link]
CoAnnotating: Uncertainty-Guided Work Allocation between Human and Large Language Models for Data Annotation. Li, Minzhi, Shi, Taiwei, Ziems, Caleb, Kan, Min-Yen, Chen, Nancy, Liu, Zhengyuan, and Yang, Diyi. Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (2023) [link]
Leveraging Large Language Models and Weak Supervision for Social Media data annotation: an evaluation using COVID-19 self-reported vaccination tweets. Tekumalla, Ramya, and Banda, Juan M. International Conference on Human-Computer Interaction (2023) [link]
Best Practices for Text Annotation with Large Language Models. T{"o}rnberg, Petter. arXiv preprint arXiv:2402.05129 (2024) [link]
Can large language models fix data annotation errors? an empirical study using debatepedia for query-focused text summarization. Laskar, Md Tahmid Rahman, Rahman, Mizanur, Jahan, Israt, Hoque, Enamul, and Huang, Jimmy. Findings of the Association for Computational Linguistics: EMNLP 2023 (2023) [link]
Large language models improve annotation of prokaryotic viral proteins. Flamholz, Zachary N, Biller, Steven J, and Kelly, Libusha. Nature Microbiology (2024) [link]
Prompting-based Synthetic Data Generation for Few-Shot Question Answering. Schmidt, Maximilian, Bartezzaghi, Andrea, and Vu, Ngoc Thang. Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024) (2024) [link]
UniGen: Universal Domain Generalization for Sentiment Classification via Zero-shot Dataset Generation. Choi, Juhwan, Kim, Yeonghwa, Yu, Seunguk, Yun, JungMin, and Kim, YoungBin. arXiv preprint arXiv:2405.01022 (2024) [link]
Optimizing Code Retrieval: High-Quality and Scalable Dataset Annotation through Large Language Models. Li, Rui, Liu, Qi, He, Liyang, Zhang, Zheng, Zhang, Hao, Ye, Shengyu, Lu, Junyu, and Huang, Zhenya. Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing (2024) [link]
Multi-News+: Cost-efficient Dataset Cleansing via LLM-based Data Annotation. Choi, Juhwan, Yun, Jungmin, Jin, Kyohoon, and Kim, YoungBin. arXiv preprint arXiv:2404.09682 (2024) [link]
Fill In The Gaps: Model Calibration and Generalization with Synthetic Data. Ba, Yang, Mancenido, Michelle, and Pan, Rong. Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing (2024) [link]
Are Expert-Level Language Models Expert-Level Annotators?. Yu-Min Tseng, Wei-Lin Chen, Chung-Chi Chen and Hsin-Hsi Chen arXiv preprint arXiv: 2410.03254 (2024) [link]

Rationale

Large language models are zero-shot reasoners. Kojima, Takeshi, Gu, Shixiang Shane, Reid, Machel, Matsuo, Yutaka, and Iwasawa, Yusuke. Advances in neural information processing systems (2022) [link]
Reasoning with Language Model is Planning with World Model. Hao, Shibo, Gu, Yi, Ma, Haodi, Hong, Joshua, Wang, Zhen, Wang, Daisy, and Hu, Zhiting. Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (2023) [link]
Tree of thoughts: Deliberate problem solving with large language models. Yao, Shunyu, Yu, Dian, Zhao, Jeffrey, Shafran, Izhak, Griffiths, Tom, Cao, Yuan, and Narasimhan, Karthik. Advances in Neural Information Processing Systems (2024) [link]
Graph of thoughts: Solving elaborate problems with large language models. Besta, Maciej, Blach, Nils, Kubicek, Ales, Gerstenberger, Robert, Podstawski, Michal, Gianinazzi, Lukas, Gajda, Joanna, Lehmann, Tomasz, Niewiadomski, Hubert, Nyczyk, Piotr, and others. Proceedings of the AAAI Conference on Artificial Intelligence (2024) [link]
Beyond chain-of-thought, effective graph-of-thought reasoning in large language models. Yao, Yao, Li, Zuchao, and Zhao, Hai. arXiv preprint arXiv:2305.16582 (2023) [link]
Chain-of-table: Evolving tables in the reasoning chain for table understanding. Wang, Zilong, Zhang, Hao, Li, Chun-Liang, Eisenschlos, Julian Martin, Perot, Vincent, Wang, Zifeng, Miculicich, Lesly, Fujii, Yasuhisa, Shang, Jingbo, Lee, Chen-Yu, and others. arXiv preprint arXiv:2401.04398 (2024) [link]
Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks. Chen, Wenhu, Ma, Xueguang, Wang, Xinyi, and Cohen, William W. Transactions on Machine Learning Research (2023) [link]
The art of SOCRATIC QUESTIONING: Recursive thinking with large language models. Qi, Jingyuan, Xu, Zhiyang, Shen, Ying, Liu, Minqian, Jin, Di, Wang, Qifan, and Huang, Lifu. Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (2023) [link]
Interpreting Pretrained Language Models via Concept Bottlenecks. Tan, Zhen, Cheng, Lu, Wang, Song, Bo, Yuan, Li, Jundong, and Liu, Huan. arXiv preprint arXiv:2311.05014 (2023) [link]
PINTO: Faithful Language Reasoning Using Prompt-Generated Rationales. Wang, PeiFeng, Chan, Aaron, Ilievski, Filip, Chen, Muhao, and Ren, Xiang. The Eleventh International Conference on Learning Representations (2022) [link]
LogiCoT: Logical Chain-of-Thought Instruction Tuning. Liu, Hanmeng, Teng, Zhiyang, Cui, Leyang, Zhang, Chaoli, Zhou, Qiji, and Zhang, Yue. The 2023 Conference on Empirical Methods in Natural Language Processing (2023) [link]
Distilling Reasoning Capabilities into Smaller Language Models. Shridhar, Kumar, Stolfo, Alessandro, and Sachan, Mrinmaya. Findings of the Association for Computational Linguistics: ACL 2023 (2023) [link]
Knowledge-augmented reasoning distillation for small language models in knowledge-intensive tasks. Kang, Minki, Lee, Seanie, Baek, Jinheon, Kawaguchi, Kenji, and Hwang, Sung Ju. Advances in Neural Information Processing Systems (2024) [link]
Enhancing Logical Reasoning in Large Language Models through Graph-based Synthetic Data. Zhou, Jiaming, Ghaddar, Abbas, Zhang, Ge, Ma, Liheng, Hu, Yaochen, Pal, Soumyasundar, Coates, Mark, Wang, Bin, Zhang, Yingxue, and Hao, Jianye. arXiv preprint arXiv:2409.12437 (2024) [link]
Making Pre-trained Language Models Better Few-shot Learners. Gao, Tianyu, Fisch, Adam, and Chen, Danqi. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) (2021) [link]
Self-Consistency Improves Chain of Thought Reasoning in Language Models. Wang, Xuezhi, Wei, Jason, Schuurmans, Dale, Le, Quoc V, Chi, Ed H, Narang, Sharan, Chowdhery, Aakanksha, and Zhou, Denny. The Eleventh International Conference on Learning Representations (2022) [link]
Universal self-consistency for large language model generation. Chen, Xinyun, Aksitov, Renat, Alon, Uri, Ren, Jie, Xiao, Kefan, Yin, Pengcheng, Prakash, Sushant, Sutton, Charles, Wang, Xuezhi, and Zhou, Denny. arXiv preprint arXiv:2311.17311 (2023) [link]
Plan, Verify and Switch: Integrated Reasoning with Diverse X-of-Thoughts. Liu, Tengxiao, Guo, Qipeng, Yang, Yuqing, Hu, Xiangkun, Zhang, Yue, Qiu, Xipeng, and Zhang, Zheng. Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (2023) [link]
Eliminating Reasoning via Inferring with Planning: A New Framework to Guide LLMs' Non-linear Thinking. Tong, Yongqi, Wang, Yifan, Li, Dawei, Wang, Sizhe, Lin, Zi, Han, Simeng, and Shang, Jingbo. arXiv preprint arXiv:2310.12342 (2023) [link]
It's Not Easy Being Wrong: Evaluating Process of Elimination Reasoning in Large Language Models. Balepur, Nishant, Palta, Shramay, and Rudinger, Rachel. arXiv preprint arXiv:2311.07532 (2023) [link]
POE: Process of Elimination for Multiple Choice Reasoning. Ma, Chenkai, and Du, Xinya. Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (2023) [link]
Exchange-of-thought: Enhancing large language model capabilities through cross-model communication. Yin, Zhangyue, Sun, Qiushi, Chang, Cheng, Guo, Qipeng, Dai, Junqi, Huang, Xuan-Jing, and Qiu, Xipeng. Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (2023) [link]
Encouraging divergent thinking in large language models through multi-agent debate. Liang, Tian, He, Zhiwei, Jiao, Wenxiang, Wang, Xing, Wang, Yan, Wang, Rui, Yang, Yujiu, Tu, Zhaopeng, and Shi, Shuming. arXiv preprint arXiv:2305.19118 (2023) [link]
Towards reasoning in large language models via multi-agent peer review collaboration. Xu, Zhenran, Shi, Senbao, Hu, Baotian, Yu, Jindi, Li, Dongfang, Zhang, Min, and Wu, Yuxiang. arXiv preprint arXiv:2311.08152 (2023) [link]
Dynamic llm-agent network: An llm-agent collaboration framework with agent team optimization. Liu, Zijun, Zhang, Yanzhe, Li, Peng, Liu, Yang, and Yang, Diyi. arXiv preprint arXiv:2310.02170 (2023) [link]
Large Language Models Can Learn Temporal Reasoning. Siheng Xiong, Ali Payani, Ramana Kompella and Faramarz Fekri arXiv preprint arXiv: 2401.06853 (2024) [link]
Large Language Models Can Self-Improve. Jiaxin Huang, Shixiang Shane Gu, Le Hou, Yuexin Wu, Xuezhi Wang, Hongkun Yu and Jiawei Han arXiv preprint arXiv: 2210.11610 (2022) [link]
Case2Code: Learning Inductive Reasoning with Synthetic Data. Yunfan Shao, Linyang Li, Yichuan Ma, Peiji Li, Demin Song, Qinyuan Cheng, Shimin Li, Xiaonan Li, Pengyu Wang, Qipeng Guo, Hang Yan, Xipeng Qiu, Xuanjing Huang and Dahua Lin arXiv preprint arXiv: 2407.12504 (2024) [link]
Can LLMs Reason in the Wild with Programs? Yuan Yang, Siheng Xiong, Ali Payani, Ehsan Shareghi and Faramarz Fekri arXiv preprint arXiv: 2406.13764 (2024) [link]
Advancing LLM Reasoning Generalists with Preference Trees. Lifan Yuan, Ganqu Cui, Hanbin Wang, Ning Ding, Xingyao Wang, Jia Deng, Boji Shan, Huimin Chen, Ruobing Xie, Yankai Lin, Zhenghao Liu, Bowen Zhou, Hao Peng, Zhiyuan Liu and Maosong Sun arXiv preprint arXiv: 2406.13764 (2024) [link]
T-SciQ: Teaching Multimodal Chain-of-Thought Reasoning via Large Language Model Signals for Science Question Answering. Lei Wang, Yi Hu, Jiabang He, Xing Xu, Ning Liu, Hui Liu and Heng Tao Shen *AAAI Conference on Artificial Intelligence (2024) [link]
Deliberate Reasoning for LLMs as Structure-aware Planning with Accurate World Model. Siheng Xiong, Ali Payani, Yuan Yang and Faramarz Fekri arXiv preprint arXiv: 2410.03136 (2024) [link]
LogicBench: Towards Systematic Evaluation of Logical Reasoning Ability of Large Language Models. Mihir Parmar, Nisarg Patel, Neeraj Varshney, Mutsumi Nakamura, Man Luo, Santosh Mashetty, Arindam Mitra and Chitta Baral *Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (2024) [link]
LogicPro: Improving Complex Logical Reasoning via Program-Guided Learning. Jin Jiang, Yuchen Yan, Yang Liu, Yonggang Jin, Shuai Peng, Mengdi Zhang, Xunliang Cai, Yixin Cao, Liangcai Gao and Zhi Tang arXiv preprint arXiv: 2409.12929 (2024) [link]
Large Language Models-guided Dynamic Adaptation for Temporal Knowledge Graph Reasoning. Jiapu Wang, Kai Sun, Linhao Luo, Wei Wei, Yongli Hu, Alan Wee-Chung Liew, Shirui Pan and Baocai Yin arXiv preprint arXiv: 2405.14170 (2024) [link]
Orca: Progressive Learning from Complex Explanation Traces of GPT-4. Subhabrata Mukherjee, Arindam Mitra, Ganesh Jawahar, Sahaj Agarwal, Hamid Palangi and Ahmed Awadallah arXiv preprint arXiv: 2306.02707 (2023) [link]
Orca 2: Teaching Small Language Models How to Reason. Arindam Mitra, Luciano Del Corro, Shweti Mahajan, Andres Codas, Clarisse Simoes, Sahaj Agarwal, Xuxi Chen, Anastasia Razdaibiedina, Erik Jones, Kriti Aggarwal, Hamid Palangi, Guoqing Zheng, Corby Rosset, Hamed Khanpour and Ahmed Awadallah arXiv preprint arXiv: 2311.11045 (2023) [link]
ReGenesis: LLMs can Grow into Reasoning Generalists via Self-Improvement. Xiangyu Peng, Congying Xia, Xinyi Yang, Caiming Xiong, Chien-Sheng Wu and Chen Xing arXiv preprint arXiv: 2410.02108 (2024) [link]
RL on Incorrect Synthetic Data Scales the Efficiency of LLM Math Reasoning by Eight-Fold. Amrith Setlur, Saurabh Garg, Xinyang Geng, Naman Garg, Virginia Smith and Aviral Kumar arXiv preprint arXiv: 2406.14532 (2024) [link]
STaR: Bootstrapping Reasoning With Reasoning. Eric Zelikman, Yuhuai Wu, Jesse Mu and Noah D. Goodman arXiv preprint arXiv: 2203.14465 (2022) [link]
Test of Time: A Benchmark for Evaluating LLMs on Temporal Reasoning. Bahare Fatemi, Mehran Kazemi, Anton Tsitsulin, Karishma Malkan, Jinyeong Yim, John Palowitch, Sungyong Seo, Jonathan Halcrow and Bryan Perozzi arXiv preprint arXiv: 2406.09170 (2024) [link]
Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing. Ye Tian, Baolin Peng, Linfeng Song, Lifeng Jin, Dian Yu, Haitao Mi and Dong Yu arXiv preprint arXiv: 2404.12253 (2024) [link]
Understanding Social Reasoning in Language Models with Language Models. Kanishk Gandhi, Jan-Philipp Fraenken, Tobias Gerstenberg and Noah Goodman Advances in Neural Information Processing Systems (2023) [link]
INSTRUCTRAG: Instructing Retrieval-Augmented Generation via Self-Synthesized Rationales. Zhepei Wei, Wei-Lin Chen and Yu Meng arXiv preprint arXiv: 2406.13629 (2024) [link]

Pairwise Feedback

Constitutional ai: Harmlessness from ai feedback. Bai, Yuntao, Kadavath, Saurav, Kundu, Sandipan, Askell, Amanda, Kernion, Jackson, Jones, Andy, Chen, Anna, Goldie, Anna, Mirhoseini, Azalia, McKinnon, Cameron, and others. arXiv preprint arXiv:2212.08073 (2022) [link]
Rlaif: Scaling reinforcement learning from human feedback with ai feedback. Lee, Harrison, Phatale, Samrat, Mansoor, Hassan, Lu, Kellie, Mesnard, Thomas, Bishop, Colton, Carbune, Victor, and Rastogi, Abhinav. arXiv preprint arXiv:2309.00267 (2023) [link]
Self-rewarding language models. Yuan, Weizhe, Pang, Richard Yuanzhe, Cho, Kyunghyun, Sukhbaatar, Sainbayar, Xu, Jing, and Weston, Jason. arXiv preprint arXiv:2401.10020 (2024) [link]
SALMON: Self-Alignment with Instructable Reward Models. Zhiqing Sun, Yikang Shen, Hongxin Zhang, Qinhong Zhou, Zhenfang Chen, David D. Cox, Yiming Yang, and Chuang Gan. No venue (2023) [link]
Principle-driven self-alignment of language models from scratch with minimal human supervision. Sun, Zhiqing, Shen, Yikang, Zhou, Qinhong, Zhang, Hongxin, Chen, Zhenfang, Cox, David, Yang, Yiming, and Gan, Chuang. Advances in Neural Information Processing Systems (2024) [link]
Self-Alignment for Factuality: Mitigating Hallucinations in LLMs via Self-Evaluation. Zhang, Xiaoying, Peng, Baolin, Tian, Ye, Zhou, Jingyan, Jin, Lifeng, Song, Linfeng, Mi, Haitao, and Meng, Helen. arXiv preprint arXiv:2402.09267 (2024) [link]
West-of-N: Synthetic Preference Generation for Improved Reward Modeling. Pace, Aliz{'e}e, Mallinson, Jonathan, Malmi, Eric, Krause, Sebastian, and Severyn, Aliaksei. arXiv preprint arXiv:2401.12086 (2024) [link]
Learning Reward for Robot Skills Using Large Language Models via Self-Alignment. Zeng, Yuwei, Mu, Yao, and Shao, Lin. arXiv preprint arXiv:2405.07162 (2024) [link]
Improving Language Model Reasoning with Self-motivated Learning. Feng, Yunlong, Xu, Yang, Qin, Libo, Wang, Yasheng, and Che, Wanxiang. arXiv preprint arXiv:2404.07017 (2024) [link]
Reinforcement Learning from Reflective Feedback (RLRF): Aligning and Improving LLMs via Fine-Grained Self-Reflection. Lee, Kyungjae, Hwang, Dasol, Park, Sunghyun, Jang, Youngsoo, and Lee, Moontae. arXiv preprint arXiv:2403.14238 (2024) [link]
Aligning Large Language Models through Synthetic Feedback. Kim, Sungdong, Bae, Sanghwan, Shin, Jamin, Kang, Soyoung, Kwak, Donghyun, Yoo, Kang, and Seo, Minjoon. Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (2023) [link]
Optimizing Language Model's Reasoning Abilities with Weak Supervision. Tong, Yongqi, Wang, Sizhe, Li, Dawei, Wang, Yifan, Han, Simeng, Lin, Zi, Huang, Chengsong, Huang, Jiaxin, and Shang, Jingbo. arXiv preprint arXiv:2405.04086 (2024) [link]
RLCD: Reinforcement learning from contrastive distillation for LM alignment. Yang, Kevin, Klein, Dan, Celikyilmaz, Asli, Peng, Nanyun, and Tian, Yuandong. The Twelfth International Conference on Learning Representations (2023) [link]
Contrastive post-training large language models on data curriculum. Xu, Canwen, Rosset, Corby, Del Corro, Luciano, Mahajan, Shweti, McAuley, Julian, Neville, Jennifer, Awadallah, Ahmed Hassan, and Rao, Nikhil. arXiv preprint arXiv:2310.02263 (2023) [link]

Textual Feedback

Automatically correcting large language models: Surveying the landscape of diverse automated correction strategies. Pan, Liangming, Saxon, Michael, Xu, Wenda, Nathani, Deepak, Wang, Xinyi, and Wang, William Yang. Transactions of the Association for Computational Linguistics (2024) [link]
Self-refine: Iterative refinement with self-feedback. Madaan, Aman, Tandon, Niket, Gupta, Prakhar, Hallinan, Skyler, Gao, Luyu, Wiegreffe, Sarah, Alon, Uri, Dziri, Nouha, Prabhumoye, Shrimai, Yang, Yiming, and others. Advances in Neural Information Processing Systems (2024) [link]
Reflexion: Language agents with verbal reinforcement learning. Shinn, Noah, Cassano, Federico, Gopinath, Ashwin, Narasimhan, Karthik, and Yao, Shunyu. Advances in Neural Information Processing Systems (2024) [link]
Do as i can, not as i say: Grounding language in robotic affordances. Brohan, Anthony, Chebotar, Yevgen, Finn, Chelsea, Hausman, Karol, Herzog, Alexander, Ho, Daniel, Ibarz, Julian, Irpan, Alex, Jang, Eric, Julian, Ryan, and others. Conference on robot learning (2023) [link]
Peer-review-in-LLMs: Automatic Evaluation Method for LLMs in Open-environment. Ning, Kun-Peng, Yang, Shuo, Liu, Yu-Yang, Yao, Jia-Yu, Liu, Zhen-Hui, Wang, Yu, Pang, Ming, and Yuan, Li. arXiv preprint arXiv:2402.01830 (2024) [link]
A New Benchmark and Reverse Validation Method for Passage-level Hallucination Detection. Yang, Shiping, Sun, Renliang, and Wan, Xiaojun. Findings of the Association for Computational Linguistics: EMNLP 2023 (2023) [link]
Selfcheckgpt: Zero-resource black-box hallucination detection for generative large language models. Manakul, Potsawee, Liusie, Adian, and Gales, Mark JF. arXiv preprint arXiv:2303.08896 (2023) [link]
Improving factuality and reasoning in language models through multiagent debate. Du, Yilun, Li, Shuang, Torralba, Antonio, Tenenbaum, Joshua B, and Mordatch, Igor. arXiv preprint arXiv:2305.14325 (2023) [link]
Towards reasoning in large language models via multi-agent peer review collaboration. Xu, Zhenran, Shi, Senbao, Hu, Baotian, Yu, Jindi, Li, Dongfang, Zhang, Min, and Wu, Yuxiang. arXiv preprint arXiv:2311.08152 (2023) [link]
Lm vs lm: Detecting factual errors via cross examination. Cohen, Roi, Hamri, May, Geva, Mor, and Globerson, Amir. arXiv preprint arXiv:2305.13281 (2023) [link]
Prd: Peer rank and discussion improve large language model based evaluations. Li, Ruosen, Patel, Teerth, and Du, Xinya. arXiv preprint arXiv:2307.02762 (2023) [link]
PRE: A Peer Review Based Large Language Model Evaluator. Chu, Zhumin, Ai, Qingyao, Tu, Yiteng, Li, Haitao, and Liu, Yiqun. arXiv preprint arXiv:2401.15641 (2024) [link]
Learning from mistakes via cooperative study assistant for large language models. Wang, Danqing, and Li, Lei. Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (2023) [link]
Learning from mistakes makes llm better reasoner. An, Shengnan, Ma, Zexiong, Lin, Zeqi, Zheng, Nanning, Lou, Jian-Guang, and Chen, Weizhu. arXiv preprint arXiv:2310.20689 (2023) [link]
Gaining wisdom from setbacks: Aligning large language models via mistake analysis. Chen, Kai, Wang, Chunwei, Yang, Kuo, Han, Jianhua, Hong, Lanqing, Mi, Fei, Xu, Hang, Liu, Zhengying, Huang, Wenyong, Li, Zhenguo, and others. arXiv preprint arXiv:2310.10477 (2023) [link]
Can LLMs Learn from Previous Mistakes? Investigating LLMs' Errors to Boost for Reasoning. Tong, Yongqi, Li, Dawei, Wang, Sizhe, Wang, Yujia, Teng, Fei, and Shang, Jingbo. arXiv preprint arXiv:2403.20046 (2024) [link]

Other Domain-Specific Data

AIDE: An Automatic Data Engine for Object Detection in Autonomous Driving. Liang, Mingfu, Su, Jong-Chyi, Schulter, Samuel, Garg, Sparsh, Zhao, Shiyu, Wu, Ying, Chandraker, Manmohan. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2024) [link]
SODA: Million-scale Dialogue Distillation with Social Commonsense Contextualization. Kim, Hyunwoo, Hessel, Jack, Jiang, Liwei, West, Peter, Lu, Ximing, Yu, Youngjae, Zhou, Pei, Bras, Ronan, Alikhani, Malihe, Kim, Gunhee, and others. Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (2023) [link]
Baize: An Open-Source Chat Model with Parameter-Efficient Tuning on Self-Chat Data. Xu, Canwen, Guo, Daya, Duan, Nan, and McAuley, Julian. Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (2023) [link]
PLACES: Prompting Language Models for Social Conversation Synthesis. Chen, Maximillian, Papangelis, Alexandros, Tao, Chenyang, Kim, Seokhwan, Rosenbaum, Andy, Liu, Yang, Yu, Zhou, and Hakkani-Tur, Dilek. Findings of the Association for Computational Linguistics: EACL 2023 (2023) [link]
Camel: Communicative agents for" mind" exploration of large language model society. Li, Guohao, Hammoud, Hasan, Itani, Hani, Khizbullin, Dmitrii, and Ghanem, Bernard. Advances in Neural Information Processing Systems (2024) [link]
CEB: Compositional Evaluation Benchmark for Fairness in Large Language Models. Wang, Song, Wang, Peng, Zhou, Tong, Dong, Yushun, Tan, Zhen, and Li, Jundong. arXiv preprint arXiv:2407.02408 (2024) [link]
Synth-Empathy: Towards High-Quality Synthetic Empathy Data. Liang, Hao, Sun, Linzhuang, Wei, Jingxuan, Huang, Xijie, Sun, Linkun, Yu, Bihui, He, Conghui, and Zhang, Wentao. arXiv preprint arXiv:2407.21669 (2024) [link]
AugESC: Dialogue Augmentation with Large Language Models for Emotional Support Conversation. Zheng, Chujie, Sabour, Sahand, Wen, Jiaxin, Zhang, Zheng, and Huang, Minlie. Findings of the Association for Computational Linguistics: ACL 2023 (2023) [link]
Weakly Supervised Data Augmentation Through Prompting for Dialogue Understanding. Chen, Maximillian, Papangelis, Alexandros, Tao, Chenyang, Rosenbaum, Andy, Kim, Seokhwan, Liu, Yang, Yu, Zhou, and Hakkani-Tur, Dilek. NeurIPS 2022 Workshop on Synthetic Data for Empowering ML Research (2022) [link]
Reflect, Not Reflex: Inference-Based Common Ground Improves Dialogue Response Quality. Zhou, Pei, Cho, Hyundong, Jandaghi, Pegah, Lee, Dong-Ho, Lin, Bill Yuchen, Pujara, Jay, and Ren, Xiang. Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing (2022) [link]
Fostering Natural Conversation in Large Language Models with NICO: a Natural Interactive COnversation dataset. Renliang Sun, Mengyuan Liu, Shiping Yang, Rui Wang, Junqing He, and Jiaxing Zhang. ArXiv (2024) [link]
ASDOT: Any-Shot Data-to-Text Generation with Pretrained Language Models. Xiang, Jiannan, Liu, Zhengzhong, Zhou, Yucheng, Xing, Eric, and Hu, Zhiting. Findings of the Association for Computational Linguistics: EMNLP 2022 (2022) [link]
Contextualization distillation from large language model for knowledge graph completion. Li, Dawei, Tan, Zhen, Chen, Tianlong, and Liu, Huan. arXiv preprint arXiv:2402.01729 (2024) [link]
Towards Ontology-Enhanced Representation Learning for Large Language Models. Ronzano, Francesco, and Nanavati, Jay. arXiv preprint arXiv:2405.20527 (2024) [link]
TILP}: Differentiable Learning of Temporal Logical Rules on Knowledge Graphs. Siheng Xiong, Yuan Yang, Faramarz Fekri, and James Clayton Kerce. *The Eleventh International Conference on Learning Representations * (2023) [link]
Teilp: Time prediction over knowledge graphs via logical reasoning. Xiong, Siheng, Yang, Yuan, Payani, Ali, Kerce, James C, and Fekri, Faramarz. Proceedings of the AAAI Conference on Artificial Intelligence (2024) [link]
Codekgc: Code language model for generative knowledge graph construction. Bi, Zhen, Chen, Jing, Jiang, Yinuo, Xiong, Feiyu, Guo, Wei, Chen, Huajun, and Zhang, Ningyu. ACM Transactions on Asian and Low-Resource Language Information Processing (2024) [link]
DALK: Dynamic Co-Augmentation of LLMs and KG to answer Alzheimer's Disease Questions with Scientific Literature. Li, Dawei, Yang, Shu, Tan, Zhen, Baik, Jae Young, Yun, Sunkwon, Lee, Joseph, Chacko, Aaron, Hou, Bojian, Duong-Tran, Duy, Ding, Ying, and others. arXiv preprint arXiv:2405.04819 (2024) [link]
Automated Construction of Theme-specific Knowledge Graphs. Ding, Linyi, Zhou, Sizhe, Xiao, Jinfeng, and Han, Jiawei. arXiv preprint arXiv:2404.19146 (2024) [link]
Large Language Models Can Learn Temporal Reasoning. Siheng Xiong, Ali Payani, Ramana Kompella and Faramarz Fekri arXiv preprint arXiv: 2401.06853 (2024) [link]
Moving from Tabular Knowledge Graph Quality Assessment to RDF Triples Leveraging ChatGPT. Tuozzo, Gabriele. No venue (2022) [link]
Language models as zero-shot planners: Extracting actionable knowledge for embodied agents. Huang, Wenlong, Abbeel, Pieter, Pathak, Deepak, and Mordatch, Igor. International Conference on Machine Learning (2022) [link]
Do as i can, not as i say: Grounding language in robotic affordances. Brohan, Anthony, Chebotar, Yevgen, Finn, Chelsea, Hausman, Karol, Herzog, Alexander, Ho, Daniel, Ibarz, Julian, Irpan, Alex, Jang, Eric, Julian, Ryan, and others. Conference on robot learning (2023) [link]
Sayplan: Grounding large language models using 3d scene graphs for scalable robot task planning. Rana, Krishan, Haviland, Jesse, Garg, Sourav, Abou-Chakra, Jad, Reid, Ian, and Suenderhauf, Niko. 7th Annual Conference on Robot Learning (2023) [link]
Progprompt: Generating situated robot task plans using large language models. Singh, Ishika, Blukis, Valts, Mousavian, Arsalan, Goyal, Ankit, Xu, Danfei, Tremblay, Jonathan, Fox, Dieter, Thomason, Jesse, and Garg, Animesh. 2023 IEEE International Conference on Robotics and Automation (ICRA) (2023) [link]
Text2motion: From natural language instructions to feasible plans. Lin, Kevin, Agia, Christopher, Migimatsu, Toki, Pavone, Marco, and Bohg, Jeannette. Autonomous Robots (2023) [link]
GenSim: Generating Robotic Simulation Tasks via Large Language Models. Wang, Lirui, Ling, Yiyang, Yuan, Zhecheng, Shridhar, Mohit, Bao, Chen, Qin, Yuzhe, Wang, Bailin, Xu, Huazhe, and Wang, Xiaolong. The Twelfth International Conference on Learning Representations (2023) [link]
Scaling up and distilling down: Language-guided robot skill acquisition. Ha, Huy, Florence, Pete, and Song, Shuran. Conference on Robot Learning (2023) [link]
Reward Design with Language Models. Kwon, Minae, Xie, Sang Michael, Bullard, Kalesha, and Sadigh, Dorsa. The Eleventh International Conference on Learning Representations (2022) [link]
Guiding pretraining in reinforcement learning with large language models. Du, Yuqing, Watkins, Olivia, Wang, Zihan, Colas, C{'e}dric, Darrell, Trevor, Abbeel, Pieter, Gupta, Abhishek, and Andreas, Jacob. International Conference on Machine Learning (2023) [link]
Stablellava: Enhanced visual instruction tuning with synthesized image-dialogue data. Li, Yanda, Zhang, Chi, Yu, Gang, Wang, Zhibin, Fu, Bin, Lin, Guosheng, Shen, Chunhua, Chen, Ling, and Wei, Yunchao. arXiv preprint arXiv:2308.10253 (2023) [link]
Lamm: Language-assisted multi-modal instruction-tuning dataset, framework, and benchmark. Yin, Zhenfei, Wang, Jiong, Cao, Jianjian, Shi, Zhelun, Liu, Dingning, Li, Mukai, Huang, Xiaoshui, Wang, Zhiyong, Sheng, Lu, Bai, Lei, and others. Advances in Neural Information Processing Systems (2024) [link]
TOMGPT: Reliable Text-Only Training Approach for Cost-Effective Multi-modal Large Language Model. Chen, Yunkai, Wang, Qimeng, Wu, Shiwei, Gao, Yan, Xu, Tong, and Hu, Yao. ACM Transactions on Knowledge Discovery from Data (2024) [link]
MMEvol: Empowering Multimodal Large Language Models with Evol-Instruct. Luo, Run, Zhang, Haonan, Chen, Longze, Lin, Ting-En, Liu, Xiong, Wu, Yuchuan, Yang, Min, Wang, Minzheng, Zeng, Pengpeng, Gao, Lianli, and others. arXiv preprint arXiv:2409.05840 (2024) [link]
SynthVLM: High-Efficiency and High-Quality Synthetic Data for Vision Language Models. Liu, Zheng, Liang, Hao, Xiong, Wentao, Yu, Qinhan, He, Conghui, Cui, Bin, and Zhang, Wentao. arXiv preprint arXiv:2407.20756 (2024) [link]
World to Code: Multi-modal Data Generation via Self-Instructed Compositional Captioning and Filtering. Wang, Jiacong, Wu, Bohong, Jiang, Haiyong, Xun, Zhou, Xiao, Xin, Guo, Haoyuan, and Xiao, Jun. Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing (2024) [link]
From the Least to the Most: Building a Plug-and-Play Visual Reasoner via Data Synthesis. Cheng, Chuanqi, Guan, Jian, Wu, Wei, and Yan, Rui. Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing (2024) [link]
Llm based generation of item-description for recommendation system. Acharya, Arkadeep, Singh, Brijraj, and Onoe, Naoyuki. Proceedings of the 17th ACM Conference on Recommender Systems (2023) [link]
PMG: Personalized Multimodal Generation with Large Language Models. Shen, Xiaoteng, Zhang, Rui, Zhao, Xiaoyan, Zhu, Jieming, and Xiao, Xi. Proceedings of the ACM on Web Conference 2024 (2024) [link]
Llmrec: Large language models with graph augmentation for recommendation. Wei, Wei, Ren, Xubin, Tang, Jiabin, Wang, Qinyong, Su, Lixin, Cheng, Suqi, Wang, Junfeng, Yin, Dawei, and Huang, Chao. Proceedings of the 17th ACM International Conference on Web Search and Data Mining (2024) [link]
Large Language Models as Evaluators for Recommendation Explanations. Zhang, Xiaoyu, Li, Yishan, Wang, Jiayin, Sun, Bowen, Ma, Weizhi, Sun, Peijie, and Zhang, Min. arXiv preprint arXiv:2406.03248 (2024) [link]
Exploiting Asymmetry for Synthetic Training Data Generation: SynthIE and the Case of Information Extraction. Josifoski, Martin, Sakota, Marija, Peyrard, Maxime, and West, Robert. Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (2023) [link]
Inpars-v2: Large language models as efficient dataset generators for information retrieval. Jeronymo, Vitor, Bonifacio, Luiz, Abonizio, Hugo, Fadaee, Marzieh, Lotufo, Roberto, Zavrel, Jakub, and Nogueira, Rodrigo. arXiv preprint arXiv:2301.01820 (2023) [link]
READ: Improving Relation Extraction from an ADversarial Perspective. Li, Dawei, Hogan, William, and Shang, Jingbo. arXiv preprint arXiv:2404.02931 (2024) [link]
STAR: Boosting Low-Resource Information Extraction by Structure-to-Text Data Generation with Large Language Models. Ma, Mingyu Derek, Wang, Xiaoxuan, Kung, Po-Nien, Brantingham, P Jeffrey, Peng, Nanyun, and Wang, Wei. Proceedings of the AAAI Conference on Artificial Intelligence (2024) [link]
Adjudicating LLMs as PropBank Annotators. Bonn, Julia, Madabushi, Harish Tayyar, Hwang, Jena D, and Bonial, Claire. LREC-COLING 2024 (2024) [link]
Annotated dataset creation through large language models for non-english medical NLP. Frei, Johann, and Kramer, Frank. Journal of Biomedical Informatics (2023) [link]
ChatGPT as Your n-th Annotator: Experiments in Leveraging Large Language Models for Social Science Text Annotation in Slovak Language. Hamerlik, Endre, {\v{S}}uppa, Marek, Bl{\v{s}}t{'a}k, Miroslav, Kub{'\i}k, Jozef, Tak{'a}{\v{c}}, Martin, {\v{S}}imko, Mari{'a}n, and Findor, Andrej. Proceedings of the 4th Workshop on Computational Linguistics for the Political and Social Sciences: Long and short papers (2024) [link]
Zero-shot Cross-Lingual Transfer for Synthetic Data Generation in Grammatical Error Detection. Latouche, Gaetan Lopez, Carbonneau, Marc-Andr{'e}, and Swanson, Ben. arXiv preprint arXiv:2407.11854 (2024) [link]
A Causal Explainable Guardrails for Large Language Models. Chu, Zhixuan, Wang, Yan, Li, Longfei, Wang, Zhibo, Qin, Zhan, and Ren, Kui. arXiv preprint arXiv:2405.04160 (2024) [link]
Zero-shot LLM-guided Counterfactual Generation for Text. Bhattacharjee, Amrita, Moraffah, Raha, Garland, Joshua, and Liu, Huan. arXiv preprint arXiv:2405.04793 (2024) [link]
Text classification of column headers with a controlled vocabulary: leveraging LLMs for metadata enrichment. Martorana, Margherita, Kuhn, Tobias, Stork, Lise, and van Ossenbruggen, Jacco. arXiv preprint arXiv:2403.00884 (2024) [link]
Self-Guide: Better Task-Specific Instruction Following via Self-Synthetic Finetuning. Zhao, Chenyang, Jia, Xueying, Viswanathan, Vijay, Neubig, Graham, and Wu, Tongshuang. First Conference on Language Modeling (No Year) [link]

Assessing LLM-Generated Annotations

Evaluating LLM-Generated Annotations

The turking test: Can language models understand instructions?. Efrat, Avia, and Levy, Omer. arXiv preprint arXiv:2010.11982 (2020) [link]
Unnatural instructions: Tuning language models with (almost) no human labor. Honovich, Or, Scialom, Thomas, Levy, Omer, and Schick, Timo. arXiv preprint arXiv:2212.09689 (2022) [link]
Open-Source Large Language Models Outperform Crowd Workers and Approach ChatGPT in Text-Annotation Tasks. Alizadeh, Meysam, Kubli, Ma{"e}l, Samei, Zeynab, Dehghani, Shirin, Bermeo, Juan Diego, Korobeynikova, Maria, and Gilardi, Fabrizio. arXiv preprint arXiv:2307.02179 (2023) [link]
DISCO: Distilling Counterfactuals with Large Language Models. Chen, Zeming, Gao, Qiyue, Bosselut, Antoine, Sabharwal, Ashish, and Richardson, Kyle. Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (2023) [link]
Codegen: An open large language model for code with multi-turn program synthesis. Nijkamp, Erik, Pang, Bo, Hayashi, Hiroaki, Tu, Lifu, Wang, Huan, Zhou, Yingbo, Savarese, Silvio, and Xiong, Caiming. arXiv preprint arXiv:2203.13474 (2022) [link]
LMTurk: Few-shot learners as crowdsourcing workers in a language-model-as-a-service framework. Zhao, Mengjie, Mi, Fei, Wang, Yasheng, Li, Minglei, Jiang, Xin, Liu, Qun, and Sch{"u}tze, Hinrich. arXiv preprint arXiv:2112.07522 (2021) [link]
Large language models are zero-shot clinical information extractors. Agrawal, Monica, Hegselmann, Stefan, Lang, Hunter, Kim, Yoon, and Sontag, David. arXiv preprint arXiv:2205.12689 (2022) [link]
Annollm: Making large language models to be better crowdsourced annotators. He, Xingwei, Lin, Zhenghao, Gong, Yeyun, Jin, Alex, Zhang, Hang, Lin, Chen, Jiao, Jian, Yiu, Siu Ming, Duan, Nan, Chen, Weizhu, and others. arXiv preprint arXiv:2303.16854 (2023) [link]
Meta-rewarding language models: Self-improving alignment with llm-as-a-meta-judge. Wu, Tianhao, Yuan, Weizhe, Golovneva, Olga, Xu, Jing, Tian, Yuandong, Jiao, Jiantao, Weston, Jason, and Sukhbaatar, Sainbayar. arXiv preprint arXiv:2407.19594 (2024) [link]
Judging llm-as-a-judge with mt-bench and chatbot arena. Zheng, Lianmin, Chiang, Wei-Lin, Sheng, Ying, Zhuang, Siyuan, Wu, Zhanghao, Zhuang, Yonghao, Lin, Zi, Li, Zhuohan, Li, Dacheng, Xing, Eric, and others. Advances in Neural Information Processing Systems (2023) [link]
From Generation to Judgment: Opportunities and Challenges of LLM-as-a-judge. Dawei Li, Bohan Jiang, Liangjie Huang, Alimohammad Beigi, Chengshuai Zhao, Zhen Tan, Amrita Bhattacharjee, Yuxuan Jiang, Canyu Chen, Tianhao Wu, Kai Shu, Lu Cheng and Huan Liu arXiv preprint arXiv: 2411.16594 (2024) [link]
CoEvol: Constructing Better Responses for Instruction Finetuning through Multi-Agent Cooperation. Li, Renhao, Tan, Minghuan, Wong, Derek F, and Yang, Min. arXiv preprint arXiv:2406.07054 (2024) [link]
I-SHEEP: Self-Alignment of LLM from Scratch through an Iterative Self-Enhancement Paradigm. Liang, Yiming, Zhang, Ge, Qu, Xingwei, Zheng, Tianyu, Guo, Jiawei, Du, Xinrun, Yang, Zhenzhu, Liu, Jiaheng, Lin, Chenghua, Ma, Lei, and others. arXiv preprint arXiv:2408.08072 (2024) [link]

Filtering & Selection

AIDE: An Automatic Data Engine for Object Detection in Autonomous Driving. Liang, Mingfu, Su, Jong-Chyi, Schulter, Samuel, Garg, Sparsh, Zhao, Shiyu, Wu, Ying, Chandraker, Manmohan. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2024) [link]
Stablellava: Enhanced visual instruction tuning with synthesized image-dialogue data. Li, Yanda, Zhang, Chi, Yu, Gang, Wang, Zhibin, Fu, Bin, Lin, Guosheng, Shen, Chunhua, Chen, Ling, and Wei, Yunchao. arXiv preprint arXiv:2308.10253 (2023) [link]
SODA: Million-scale Dialogue Distillation with Social Commonsense Contextualization. Kim, Hyunwoo, Hessel, Jack, Jiang, Liwei, West, Peter, Lu, Ximing, Yu, Youngjae, Zhou, Pei, Bras, Ronan, Alikhani, Malihe, Kim, Gunhee, and others. Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (2023) [link]
Aligning Large Language Models through Synthetic Feedback. Kim, Sungdong, Bae, Sanghwan, Shin, Jamin, Kang, Soyoung, Kwak, Donghyun, Yoo, Kang, and Seo, Minjoon. Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (2023) [link]
AugESC: Dialogue Augmentation with Large Language Models for Emotional Support Conversation. Zheng, Chujie, Sabour, Sahand, Wen, Jiaxin, Zhang, Zheng, and Huang, Minlie. Findings of the Association for Computational Linguistics: ACL 2023 (2023) [link]
Self-qa: Unsupervised knowledge guided language model alignment. Zhang, Xuanyu, and Yang, Qing. arXiv preprint arXiv:2305.11952 (2023) [link]
Human-instruction-free llm self-alignment with limited samples. Guo, Hongyi, Yao, Yuanshun, Shen, Wei, Wei, Jiaheng, Zhang, Xiaoying, Wang, Zhaoran, and Liu, Yang. arXiv preprint arXiv:2401.06785 (2024) [link]
Automated Construction of Theme-specific Knowledge Graphs. Ding, Linyi, Zhou, Sizhe, Xiao, Jinfeng, and Han, Jiawei. arXiv preprint arXiv:2404.19146 (2024) [link]
Large Language Models Are Reasoning Teachers. Ho, Namgyu, Schmid, Laura, and Yun, Se-Young. Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (2023) [link]
Knowledge-augmented reasoning distillation for small language models in knowledge-intensive tasks. Kang, Minki, Lee, Seanie, Baek, Jinheon, Kawaguchi, Kenji, and Hwang, Sung Ju. Advances in Neural Information Processing Systems (2024) [link]
Self-Consistency Improves Chain of Thought Reasoning in Language Models. Wang, Xuezhi, Wei, Jason, Schuurmans, Dale, Le, Quoc V, Chi, Ed H, Narang, Sharan, Chowdhery, Aakanksha, and Zhou, Denny. The Eleventh International Conference on Learning Representations (2022) [link]
Making Large Language Models Better Data Creators. Lee, Dong-Ho, Pujara, Jay, Sewak, Mohit, White, Ryen, and Jauhar, Sujay. Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (2023) [link]
Reinforced self-training (rest) for language modeling. Gulcehre, Caglar, Paine, Tom Le, Srinivasan, Srivatsan, Konyushkova, Ksenia, Weerts, Lotte, Sharma, Abhishek, Siddhant, Aditya, Ahern, Alex, Wang, Miaosen, Gu, Chenjie, and others. arXiv preprint arXiv:2308.08998 (2023) [link]
Raft: Reward ranked finetuning for generative foundation model alignment. Dong, Hanze, Xiong, Wei, Goyal, Deepanshu, Pan, Rui, Diao, Shizhe, Zhang, Jipeng, Shum, Kashun, and Zhang, Tong. arXiv preprint arXiv:2304.06767 (2023) [link]
Selective In-Context Data Augmentation for Intent Detection using Pointwise V-Information. Lin, Yen-Ting, Papangelis, Alexandros, Kim, Seokhwan, Lee, Sungjin, Hazarika, Devamanyu, Namazifar, Mahdi, Jin, Di, Liu, Yang, and Hakkani-Tur, Dilek. Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics (2023) [link]
GenSim: Generating Robotic Simulation Tasks via Large Language Models. Wang, Lirui, Ling, Yiyang, Yuan, Zhecheng, Shridhar, Mohit, Bao, Chen, Qin, Yuzhe, Wang, Bailin, Xu, Huazhe, and Wang, Xiaolong. The Twelfth International Conference on Learning Representations (2023) [link]
DISCO: Distilling Counterfactuals with Large Language Models. Chen, Zeming, Gao, Qiyue, Bosselut, Antoine, Sabharwal, Ashish, and Richardson, Kyle. Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (2023) [link]
SASS: Self-Alignment with Semi-Supervised Instruction Data Generation. Wang, Yue, Zhang, Haoke, Li, Juntao, Chang, Jinxiong, Zhang, Qishen, Liu, Zhongyi, Zhang, Guannan, and Zhang, Min. No venue (2023) [link]
Large Language Models Can Self-Improve. Huang, Jiaxin, Gu, Shixiang, Hou, Le, Wu, Yuexin, Wang, Xuezhi, Yu, Hongkun, and Han, Jiawei. Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (2023) [link]
West-of-N: Synthetic Preference Generation for Improved Reward Modeling. Pace, Aliz{'e}e, Mallinson, Jonathan, Malmi, Eric, Krause, Sebastian, and Severyn, Aliaksei. arXiv preprint arXiv:2401.12086 (2024) [link]
Self: Language-driven self-evolution for large language model. Lu, Jianqiao, Zhong, Wanjun, Huang, Wenyong, Wang, Yufei, Mi, Fei, Wang, Baojun, Wang, Weichao, Shang, Lifeng, and Liu, Qun. arXiv preprint arXiv:2310.00533 (2023) [link]
Inpars-v2: Large language models as efficient dataset generators for information retrieval. Jeronymo, Vitor, Bonifacio, Luiz, Abonizio, Hugo, Fadaee, Marzieh, Lotufo, Roberto, Zavrel, Jakub, and Nogueira, Rodrigo. arXiv preprint arXiv:2301.01820 (2023) [link]
DALK: Dynamic Co-Augmentation of LLMs and KG to answer Alzheimer's Disease Questions with Scientific Literature. Li, Dawei, Yang, Shu, Tan, Zhen, Baik, Jae Young, Yun, Sunkwon, Lee, Joseph, Chacko, Aaron, Hou, Bojian, Duong-Tran, Duy, Ding, Ying, and others. arXiv preprint arXiv:2405.04819 (2024) [link]
Optimizing Language Model's Reasoning Abilities with Weak Supervision. Tong, Yongqi, Wang, Sizhe, Li, Dawei, Wang, Yifan, Han, Simeng, Lin, Zi, Huang, Chengsong, Huang, Jiaxin, and Shang, Jingbo. arXiv preprint arXiv:2405.04086 (2024) [link]
Importance Weighting Can Help Large Language Models Self-Improve. Jiang, Chunyang, Chan, Chi-min, Xue, Wei, Liu, Qifeng, and Guo, Yike. arXiv preprint arXiv:2408.09849 (2024) [link]

LLM-Generated Annotations Utilization

Supervised Fine-Tuning

AIDE: An Automatic Data Engine for Object Detection in Autonomous Driving. Liang, Mingfu, Su, Jong-Chyi, Schulter, Samuel, Garg, Sparsh, Zhao, Shiyu, Wu, Ying, Chandraker, Manmohan. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2024) [link]
Large Language Models Can Self-Improve. Huang, Jiaxin, Gu, Shixiang, Hou, Le, Wu, Yuexin, Wang, Xuezhi, Yu, Hongkun, and Han, Jiawei. Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (2023) [link]
Self-Instruct: Aligning Language Models with Self-Generated Instructions. Wang, Yizhong, Kordi, Yeganeh, Mishra, Swaroop, Liu, Alisa, Smith, Noah A, Khashabi, Daniel, and Hajishirzi, Hannaneh. Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (2023) [link]
Self: Language-driven self-evolution for large language model. Lu, Jianqiao, Zhong, Wanjun, Huang, Wenyong, Wang, Yufei, Mi, Fei, Wang, Baojun, Wang, Weichao, Shang, Lifeng, and Liu, Qun. arXiv preprint arXiv:2310.00533 (2023) [link]
Self-Distillation Bridges Distribution Gap in Language Model Fine-Tuning. Yang, Zhaorui, Liu, Qian, Pang, Tianyu, Wang, Han, Feng, Haozhe, Zhu, Minfeng, and Chen, Wei. arXiv preprint arXiv:2402.13669 (2024) [link]
Self-play fine-tuning converts weak language models to strong language models. Chen, Zixiang, Deng, Yihe, Yuan, Huizhuo, Ji, Kaixuan, and Gu, Quanquan. arXiv preprint arXiv:2401.01335 (2024) [link]
Self-playing Adversarial Language Game Enhances LLM Reasoning. Cheng, Pengyu, Hu, Tianhao, Xu, Han, Zhang, Zhisong, Dai, Yong, Han, Lei, and Du, Nan. arXiv preprint arXiv:2404.10642 (2024) [link]
Self-DC: When to retrieve and When to generate? Self Divide-and-Conquer for Compositional Unknown Questions. Wang, Hongru, Xue, Boyang, Zhou, Baohang, Zhang, Tianhua, Wang, Cunxiang, Chen, Guanhua, Wang, Huimin, and Wong, Kam-fai. arXiv preprint arXiv:2402.13514 (2024) [link]
Stanford alpaca: An instruction-following llama model. Taori, Rohan, Gulrajani, Ishaan, Zhang, Tianyi, Dubois, Yann, Li, Xuechen, Guestrin, Carlos, Liang, Percy, and Hashimoto, Tatsunori B. No venue (2023) [[link]](No link found)
Vicuna: An Open-Source Chatbot Impressing {GPT-4} with 90% ChatGPT Quality.* Chiang, Wei-Lin, Li, Zhuohan, Lin, Zi, Sheng, Ying, Wu, Zhanghao, Zhang, Hao, Zheng, Lianmin, Zhuang, Siyuan, Zhuang, Yonghao, Gonzalez, Joseph E., Stoica, Ion, and Xing, Eric P.. No venue (2023) [[link]](No link found)
Wizardlm: Empowering large language models to follow complex instructions. Xu, Can, Sun, Qingfeng, Zheng, Kai, Geng, Xiubo, Zhao, Pu, Feng, Jiazhan, Tao, Chongyang, and Jiang, Daxin. arXiv preprint arXiv:2304.12244 (2023) [link]
Generating training data with language models: Towards zero-shot language understanding. Meng, Yu, Huang, Jiaxin, Zhang, Yu, and Han, Jiawei. Advances in Neural Information Processing Systems (2022) [link]
Noise-Robust Fine-Tuning of Pretrained Language Models via External Guidance. Wang, Song, Tan, Zhen, Guo, Ruocheng, and Li, Jundong. Findings of the Association for Computational Linguistics: EMNLP 2023 (2023) [link]
PINTO: Faithful Language Reasoning Using Prompt-Generated Rationales. Wang, PeiFeng, Chan, Aaron, Ilievski, Filip, Chen, Muhao, and Ren, Xiang. The Eleventh International Conference on Learning Representations (2022) [link]
Distilling Reasoning Capabilities into Smaller Language Models. Shridhar, Kumar, Stolfo, Alessandro, and Sachan, Mrinmaya. Findings of the Association for Computational Linguistics: ACL 2023 (2023) [link]
LogiCoT: Logical Chain-of-Thought Instruction Tuning. Liu, Hanmeng, Teng, Zhiyang, Cui, Leyang, Zhang, Chaoli, Zhou, Qiji, and Zhang, Yue. The 2023 Conference on Empirical Methods in Natural Language Processing (2023) [link]
Knowledge-augmented reasoning distillation for small language models in knowledge-intensive tasks. Kang, Minki, Lee, Seanie, Baek, Jinheon, Kawaguchi, Kenji, and Hwang, Sung Ju. Advances in Neural Information Processing Systems (2024) [link]
Baize: An Open-Source Chat Model with Parameter-Efficient Tuning on Self-Chat Data. Xu, Canwen, Guo, Daya, Duan, Nan, and McAuley, Julian. Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (2023) [link]
Exploiting Asymmetry for Synthetic Training Data Generation: SynthIE and the Case of Information Extraction. Josifoski, Martin, Sakota, Marija, Peyrard, Maxime, and West, Robert. Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (2023) [link]
Inpars-v2: Large language models as efficient dataset generators for information retrieval. Jeronymo, Vitor, Bonifacio, Luiz, Abonizio, Hugo, Fadaee, Marzieh, Lotufo, Roberto, Zavrel, Jakub, and Nogueira, Rodrigo. arXiv preprint arXiv:2301.01820 (2023) [link]
Code alpaca: An instruction-following llama model for code generation. Chaudhary, Sahil. Code alpaca: An instruction-following llama model for code generation (2023) [[link]](No link found)
Code llama: Open foundation models for code. Roziere, Baptiste, Gehring, Jonas, Gloeckle, Fabian, Sootla, Sten, Gat, Itai, Tan, Xiaoqing Ellen, Adi, Yossi, Liu, Jingyu, Remez, Tal, Rapin, J{'e}r{'e}my, and others. arXiv preprint arXiv:2308.12950 (2023) [link]
HuatuoGPT, Towards Taming Language Model to Be a Doctor. Zhang, Hongbo, Chen, Junying, Jiang, Feng, Yu, Fei, Chen, Zhihong, Chen, Guiming, Li, Jianquan, Wu, Xiangbo, Zhiyi, Zhang, Xiao, Qingying, and others. Findings of the Association for Computational Linguistics: EMNLP 2023 (2023) [link]
Doctorglm: Fine-tuning your chinese doctor is not a herculean task. Xiong, Honglin, Wang, Sheng, Zhu, Yitao, Zhao, Zihao, Liu, Yuxiao, Huang, Linlin, Wang, Qian, and Shen, Dinggang. arXiv preprint arXiv:2304.01097 (2023) [link]
Xuanyuan 2.0: A large chinese financial chat model with hundreds of billions parameters. Zhang, Xuanyu, and Yang, Qing. Proceedings of the 32nd ACM International Conference on Information and Knowledge Management (2023) [link]
Wizardmath: Empowering mathematical reasoning for large language models via reinforced evol-instruct. Luo, Haipeng, Sun, Qingfeng, Xu, Can, Zhao, Pu, Lou, Jianguang, Tao, Chongyang, Geng, Xiubo, Lin, Qingwei, Chen, Shifeng, and Zhang, Dongmei. arXiv preprint arXiv:2308.09583 (2023) [link]
Gimlet: A unified graph-text model for instruction-based molecule zero-shot learning. Zhao, Haiteng, Liu, Shengchao, Chang, Ma, Xu, Hannan, Fu, Jie, Deng, Zhihong, Kong, Lingpeng, and Liu, Qi. Advances in Neural Information Processing Systems (2024) [link]
Beyond Answers: Transferring Reasoning Capabilities to Smaller LLMsUsing Multi-Teacher Knowledge Distillation. Yijun Tian, Yikun Han, Xiusi Chen, Wei Wang and Nitesh V Chawla arXiv preprint arXiv: 2402.04616 (2024) [link]

Alignment Tuning

Contrastive post-training large language models on data curriculum. Xu, Canwen, Rosset, Corby, Del Corro, Luciano, Mahajan, Shweti, McAuley, Julian, Neville, Jennifer, Awadallah, Ahmed Hassan, and Rao, Nikhil. arXiv preprint arXiv:2310.02263 (2023) [link]
Aligning Large Language Models through Synthetic Feedback. Kim, Sungdong, Bae, Sanghwan, Shin, Jamin, Kang, Soyoung, Kwak, Donghyun, Yoo, Kang, and Seo, Minjoon. Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (2023) [link]
West-of-N: Synthetic Preference Generation for Improved Reward Modeling. Pace, Aliz{'e}e, Mallinson, Jonathan, Malmi, Eric, Krause, Sebastian, and Severyn, Aliaksei. arXiv preprint arXiv:2401.12086 (2024) [link]
Learning Reward for Robot Skills Using Large Language Models via Self-Alignment. Zeng, Yuwei, Mu, Yao, and Shao, Lin. arXiv preprint arXiv:2405.07162 (2024) [link]
SALMON: Self-Alignment with Instructable Reward Models. Zhiqing Sun, Yikang Shen, Hongxin Zhang, Qinhong Zhou, Zhenfang Chen, David D. Cox, Yiming Yang, and Chuang Gan. No venue (2023) [link]
Self-rewarding language models. Yuan, Weizhe, Pang, Richard Yuanzhe, Cho, Kyunghyun, Sukhbaatar, Sainbayar, Xu, Jing, and Weston, Jason. arXiv preprint arXiv:2401.10020 (2024) [link]
Self-Alignment for Factuality: Mitigating Hallucinations in LLMs via Self-Evaluation. Zhang, Xiaoying, Peng, Baolin, Tian, Ye, Zhou, Jingyan, Jin, Lifeng, Song, Linfeng, Mi, Haitao, and Meng, Helen. arXiv preprint arXiv:2402.09267 (2024) [link]
Aligning Large Language Models by On-Policy Self-Judgment. Lee, Sangkyu, Kim, Sungdong, Yousefpour, Ashkan, Seo, Minjoon, Yoo, Kang Min, and Yu, Youngjae. arXiv preprint arXiv:2402.11253 (2024) [link]
Optimizing Language Model's Reasoning Abilities with Weak Supervision. Tong, Yongqi, Wang, Sizhe, Li, Dawei, Wang, Yifan, Han, Simeng, Lin, Zi, Huang, Chengsong, Huang, Jiaxin, and Shang, Jingbo. arXiv preprint arXiv:2405.04086 (2024) [link]
Reinforcement Learning from Reflective Feedback (RLRF): Aligning and Improving LLMs via Fine-Grained Self-Reflection. Lee, Kyungjae, Hwang, Dasol, Park, Sunghyun, Jang, Youngsoo, and Lee, Moontae. arXiv preprint arXiv:2403.14238 (2024) [link]
Direct language model alignment from online ai feedback. Guo, Shangmin, Zhang, Biao, Liu, Tianlin, Liu, Tianqi, Khalman, Misha, Llinares, Felipe, Rame, Alexandre, Mesnard, Thomas, Zhao, Yao, Piot, Bilal, and others. arXiv preprint arXiv:2402.04792 (2024) [link]
Reinforced self-training (rest) for language modeling. Gulcehre, Caglar, Paine, Tom Le, Srinivasan, Srivatsan, Konyushkova, Ksenia, Weerts, Lotte, Sharma, Abhishek, Siddhant, Aditya, Ahern, Alex, Wang, Miaosen, Gu, Chenjie, and others. arXiv preprint arXiv:2308.08998 (2023) [link]
Raft: Reward ranked finetuning for generative foundation model alignment. Dong, Hanze, Xiong, Wei, Goyal, Deepanshu, Pan, Rui, Diao, Shizhe, Zhang, Jipeng, Shum, Kashun, and Zhang, Tong. arXiv preprint arXiv:2304.06767 (2023) [link]
Step-On-Feet Tuning: Scaling Self-Alignment of LLMs via Bootstrapping. Wang, Haoyu, Ma, Guozheng, Meng, Ziqiao, Qin, Zeyu, Shen, Li, Zhang, Zhong, Wu, Bingzhe, Liu, Liu, Bian, Yatao, Xu, Tingyang, and others. arXiv preprint arXiv:2402.07610 (2024) [link]
Mixture of insighTful Experts (MoTE): The Synergy of Thought Chains and Expert Mixtures in Self-Alignment. Liu, Zhili, Gou, Yunhao, Chen, Kai, Hong, Lanqing, Gao, Jiahui, Mi, Fei, Zhang, Yu, Li, Zhenguo, Jiang, Xin, Liu, Qun, and others. arXiv preprint arXiv:2405.00557 (2024) [link]
Iterative reasoning preference optimization. Pang, Richard Yuanzhe, Yuan, Weizhe, Cho, Kyunghyun, He, He, Sukhbaatar, Sainbayar, and Weston, Jason. arXiv preprint arXiv:2404.19733 (2024) [link]

Inference

Large Language Models are Human-Level Prompt Engineers. Zhou, Yongchao, Muresanu, Andrei Ioan, Han, Ziwen, Paster, Keiran, Pitis, Silviu, Chan, Harris, and Ba, Jimmy. The Eleventh International Conference on Learning Representations (2022) [link]
Auto-ICL: In-Context Learning without Human Supervision. Yang, Jinghan, Ma, Shuming, and Wei, Furu. arXiv preprint arXiv:2311.09263 (2023) [link]
Empowering Large Language Models for Textual Data Augmentation. Li, Yichuan, Ding, Kaize, Wang, Jianling, and Lee, Kyumin. No venue (No Year) [link]
Self-generated in-context learning: Leveraging auto-regressive language models as a demonstration generator. Kim, Hyuhng Joon, Cho, Hyunsoo, Kim, Junyeob, Kim, Taeuk, Yoo, Kang Min, and Lee, Sang-goo. arXiv preprint arXiv:2206.08082 (2022) [link]
Are Human-generated Demonstrations Necessary for In-context Learning?. Li, Rui, Wang, Guoyin, and Li, Jiwei. arXiv preprint arXiv:2309.14681 (2023) [link]
Self-ICL: Zero-Shot In-Context Learning with Self-Generated Demonstrations. Chen, Wei-Lin, Wu, Cheng-Kuang, Chen, Yun-Nung, and Chen, Hsin-Hsi. Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (2023) [link]
Self-Demos: Eliciting Out-of-Demonstration Generalizability in Large Language Models. He, Wei, Liu, Shichun, Zhao, Jun, Ding, Yiwen, Lu, Yi, Xi, Zhiheng, Gui, Tao, Zhang, Qi, and Huang, Xuanjing. arXiv preprint arXiv:2404.00884 (2024) [link]
Rephrase and respond: Let large language models ask better questions for themselves. Deng, Yihe, Zhang, Weitong, Chen, Zixiang, and Gu, Quanquan. arXiv preprint arXiv:2311.04205 (2023) [link]
Dail: Data augmentation for in-context learning via self-paraphrase. Li, Dawei, Li, Yaxuan, Mekala, Dheeraj, Li, Shuyao, Wang, Xueqi, Hogan, William, Shang, Jingbo, and others. arXiv preprint arXiv:2311.03319 (2023) [link]
Just rephrase it! Uncertainty estimation in closed-source language models via multiple rephrased queries. Yang, Adam, Chen, Chen, and Pitas, Konstantinos. arXiv preprint arXiv:2405.13907 (2024) [link]
Self-Polish: Enhance Reasoning in Large Language Models via Problem Refinement. Xi, Zhiheng, Jin, Senjie, Zhou, Yuhao, Zheng, Rui, Gao, Songyang, Liu, Jia, Gui, Tao, Zhang, Qi, and Huang, Xuan-Jing. Findings of the Association for Computational Linguistics: EMNLP 2023 (2023) [link]
Self-DC: When to retrieve and When to generate? Self Divide-and-Conquer for Compositional Unknown Questions. Wang, Hongru, Xue, Boyang, Zhou, Baohang, Zhang, Tianhua, Wang, Cunxiang, Chen, Guanhua, Wang, Huimin, and Wong, Kam-fai. arXiv preprint arXiv:2402.13514 (2024) [link]
Large language models are zero-shot reasoners. Kojima, Takeshi, Gu, Shixiang Shane, Reid, Machel, Matsuo, Yutaka, and Iwasawa, Yusuke. Advances in neural information processing systems (2022) [link]
Self-Consistency Improves Chain of Thought Reasoning in Language Models. Wang, Xuezhi, Wei, Jason, Schuurmans, Dale, Le, Quoc V, Chi, Ed H, Narang, Sharan, Chowdhery, Aakanksha, and Zhou, Denny. The Eleventh International Conference on Learning Representations (2022) [link]
Universal self-consistency for large language model generation. Chen, Xinyun, Aksitov, Renat, Alon, Uri, Ren, Jie, Xiao, Kefan, Yin, Pengcheng, Prakash, Sushant, Sutton, Charles, Wang, Xuezhi, and Zhou, Denny. arXiv preprint arXiv:2311.17311 (2023) [link]
Eliminating Reasoning via Inferring with Planning: A New Framework to Guide LLMs' Non-linear Thinking. Tong, Yongqi, Wang, Yifan, Li, Dawei, Wang, Sizhe, Lin, Zi, Han, Simeng, and Shang, Jingbo. arXiv preprint arXiv:2310.12342 (2023) [link]
It's Not Easy Being Wrong: Evaluating Process of Elimination Reasoning in Large Language Models. Balepur, Nishant, Palta, Shramay, and Rudinger, Rachel. arXiv preprint arXiv:2311.07532 (2023) [link]
POE: Process of Elimination for Multiple Choice Reasoning. Ma, Chenkai, and Du, Xinya. Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (2023) [link]
Self-refine: Iterative refinement with self-feedback. Madaan, Aman, Tandon, Niket, Gupta, Prakhar, Hallinan, Skyler, Gao, Luyu, Wiegreffe, Sarah, Alon, Uri, Dziri, Nouha, Prabhumoye, Shrimai, Yang, Yiming, and others. Advances in Neural Information Processing Systems (2024) [link]
Can LLMs Learn from Previous Mistakes? Investigating LLMs' Errors to Boost for Reasoning. Tong, Yongqi, Li, Dawei, Wang, Sizhe, Wang, Yujia, Teng, Fei, and Shang, Jingbo. arXiv preprint arXiv:2403.20046 (2024) [link]
Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks. Chen, Wenhu, Ma, Xueguang, Wang, Xinyi, and Cohen, William W. Transactions on Machine Learning Research (2023) [link]
Graph of thoughts: Solving elaborate problems with large language models. Besta, Maciej, Blach, Nils, Kubicek, Ales, Gerstenberger, Robert, Podstawski, Michal, Gianinazzi, Lukas, Gajda, Joanna, Lehmann, Tomasz, Niewiadomski, Hubert, Nyczyk, Piotr, and others. Proceedings of the AAAI Conference on Artificial Intelligence (2024) [link]
Reasoning with Language Model is Planning with World Model. Hao, Shibo, Gu, Yi, Ma, Haodi, Hong, Joshua, Wang, Zhen, Wang, Daisy, and Hu, Zhiting. Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (2023) [link]
Tree of thoughts: Deliberate problem solving with large language models. Yao, Shunyu, Yu, Dian, Zhao, Jeffrey, Shafran, Izhak, Griffiths, Tom, Cao, Yuan, and Narasimhan, Karthik. Advances in Neural Information Processing Systems (2024) [link]

Name		Name	Last commit message	Last commit date
Latest commit History 29 Commits
figure		figure
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Large Language Models for Data Annotation and Synthesis: A Survey

🔔 News

Updated

Dec 2024

LLM-Based Data Annotation

Instruction & Response

Label

Rationale

Pairwise Feedback

Textual Feedback

Other Domain-Specific Data

Assessing LLM-Generated Annotations

Evaluating LLM-Generated Annotations

Filtering & Selection

LLM-Generated Annotations Utilization

Supervised Fine-Tuning

Alignment Tuning

Inference

About

Releases

Packages

Contributors 5

Zhen-Tan-dmml/LLM4Annotation

Folders and files

Latest commit

History

Repository files navigation

Large Language Models for Data Annotation and Synthesis: A Survey

🔔 News

Updated

Dec 2024

LLM-Based Data Annotation

Instruction & Response

Label

Rationale

Pairwise Feedback

Textual Feedback

Other Domain-Specific Data

Assessing LLM-Generated Annotations

Evaluating LLM-Generated Annotations

Filtering & Selection

LLM-Generated Annotations Utilization

Supervised Fine-Tuning

Alignment Tuning

Inference

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 5

Packages