AutoSurvey: Large Language Models Can Automatically Write Surveys
Yidong Wang1,2∗, Qi Guo2,3∗, Wenjin Yao2, Hongbo Zhang1, Xin Zhang4, Zhen Wu3, Meishan Zhang4, Xinyu Dai3, Min Zhang4, Qingsong Wen5, Wei Ye2†, Shikun Zhang2†, Yue Zhang1†
1Westlake University, 2Peking University, 3Nanjing University, 4Harbin Institute of Technology, Shenzhen, 5Squirrel AI
AutoSurvey is a speedy and well-organized framework for automating the creation of comprehensive literature surveys.
Extensive experimental results across different survey lengths (8k, 16k, 32k, and 64k tokens) demonstrate that AutoSurvey consistently achieves high citation and content quality scores.
- Python 3.10.x
- Required Python packages listed in requirements.txt
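A quick way to confirm the interpreter matches the stated requirement before installing anything is sketched below. The helper name `is_supported` is hypothetical and not part of the repository; it only compares the major and minor version against 3.10.

```python
import sys

# Required major.minor version, taken from the requirement list above (Python 3.10.x).
REQUIRED = (3, 10)


def is_supported(version_info=sys.version_info) -> bool:
    """Return True when the major.minor part of version_info equals REQUIRED."""
    return tuple(version_info[:2]) == REQUIRED


found = ".".join(str(part) for part in sys.version_info[:3])
status = "ok" if is_supported() else "unsupported, expected 3.10.x"
print(f"Python {found}: {status}")
```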
- Clone the repository:
    git clone https://github.com/AutoSurveys/AutoSurvey.git
    cd AutoSurvey
- Install the required packages:
    pip install -r requirements.txt
- Download the database and unzip it: https://1drv.ms/u/c/8761b6d10f143944/EaqWZ4_YMLJIjGsEB_qtoHsBoExJ8bdppyBc1uxgijfZBw?e=2EIzti
  The provided database contains 530,000 arXiv paper abstracts, all under the CS category. You can contact us to obtain the database containing the full content of the papers.
    unzip database.zip -d ./database/
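On systems without the `unzip` tool, the same extraction step can be approximated with Python's standard library. This is a minimal sketch, assuming the downloaded archive is named database.zip and sits in the repository root; `extract_database` is a hypothetical helper, not something the repository provides.

```python
import zipfile
from pathlib import Path


def extract_database(archive="database.zip", target="./database/"):
    """Extract archive into target, mirroring `unzip database.zip -d ./database/`."""
    target_dir = Path(target)
    # Create the destination directory if it does not exist yet.
    target_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(target_dir)
        # Return the archive's entry names so the caller can inspect what was unpacked.
        return zf.namelist()
```

Calling `extract_database()` from the repository root should leave the unpacked files under ./database/, which is the path passed to --db_path in the commands that follow.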
Here is an example command to generate a survey on the topic "LLMs for education":
python main.py --topic "LLMs for education" \
    --gpu 0 \
    --saving_path ./output/ \
    --model gpt-4o-2024-05-13 \
    --section_num 7 \
    --subsection_len 700 \
    --rag_num 60 \
    --outline_reference_num 1500 \
    --db_path ./database \
    --embedding_model nomic-ai/nomic-embed-text-v1 \
    --api_url https://api.openai.com/v1/chat/completions \
    --api_key sk-xxxxxx
The generated content will be saved in the ./output/ directory.
- --gpu: Specify the GPU to use.
- --saving_path: Directory to save the output survey.
- --model: Model to use.
- --topic: Topic to generate content for.
- --section_num: Number of sections in the outline.
- --subsection_len: Length of each subsection.
- --rag_num: Number of references to use for RAG.
- --outline_reference_num: Number of references for outline generation.
- --db_path: Directory of the database.
- --embedding_model: Embedding model for retrieval.
- --api_key: API key for the model.
- --api_url: URL for API requests.
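When generating several surveys, it can be convenient to assemble the command from a dictionary instead of retyping the flags. The sketch below uses only the flags and values from the example command above. `build_command` and `GENERATION_OPTIONS` are hypothetical names, and the API key is the same placeholder used in the example.

```python
import shlex
import sys


def build_command(script, options):
    """Turn a mapping of flag names to values into an argv list for the given script."""
    argv = [sys.executable, script]
    for flag, value in options.items():
        # Every documented flag takes exactly one value, so emit "--flag value" pairs.
        argv += [f"--{flag}", str(value)]
    return argv


# Values copied from the example command; replace the placeholder API key with a real one.
GENERATION_OPTIONS = {
    "topic": "LLMs for education",
    "gpu": 0,
    "saving_path": "./output/",
    "model": "gpt-4o-2024-05-13",
    "section_num": 7,
    "subsection_len": 700,
    "rag_num": 60,
    "outline_reference_num": 1500,
    "db_path": "./database",
    "embedding_model": "nomic-ai/nomic-embed-text-v1",
    "api_url": "https://api.openai.com/v1/chat/completions",
    "api_key": "sk-xxxxxx",
}

argv = build_command("main.py", GENERATION_OPTIONS)
# Print a shell-quoted form of the command for inspection; nothing is executed here.
print(shlex.join(argv))
```

The resulting list can be passed to `subprocess.run(argv, check=True)` from the repository root. Passing a list rather than a single string avoids shell quoting problems with topics that contain spaces.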
Here is an example command to evaluate the generated survey on the topic "LLMs for education":
python evaluation.py --topic "LLMs for education" \
    --gpu 0 \
    --saving_path ./output/ \
    --model gpt-4o-2024-05-13 \
    --db_path ./database \
    --embedding_model nomic-ai/nomic-embed-text-v1 \
    --api_url https://api.openai.com/v1/chat/completions \
    --api_key sk-xxxxxx
Make sure the generated survey is in the ./output/ directory.
The evaluation result will be saved in the ./output/ directory.
- --gpu: Specify the GPU to use (default: '0').
- --saving_path: Directory to save the evaluation results (default: './output/').
- --model: Model for evaluation.
- --topic: Topic of generated survey.
- --db_path: Directory of the database.
- --embedding_model: Embedding model for retrieval.
- --api_key: API key for the model.
- --api_url: URL for API requests.
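To illustrate how the documented defaults behave, here is a small argparse sketch that mirrors the evaluation flags listed above. It is not the parser used in evaluation.py. `build_eval_parser` is a hypothetical name, all values are assumed to be strings, and only the two defaults stated above ('0' and './output/') are filled in.

```python
import argparse


def build_eval_parser():
    """Hypothetical parser mirroring the documented evaluation flags."""
    parser = argparse.ArgumentParser(description="Evaluate a generated survey (sketch).")
    parser.add_argument("--gpu", default="0", help="GPU to use")
    parser.add_argument("--saving_path", default="./output/",
                        help="Directory to save the evaluation results")
    # No defaults are documented for the remaining flags, so they stay None if omitted.
    parser.add_argument("--model", help="Model for evaluation")
    parser.add_argument("--topic", help="Topic of the generated survey")
    parser.add_argument("--db_path", help="Directory of the database")
    parser.add_argument("--embedding_model", help="Embedding model for retrieval")
    parser.add_argument("--api_key", help="API key for the model")
    parser.add_argument("--api_url", help="URL for API requests")
    return parser


# Parse an explicit list so the sketch runs without real command-line input.
args = build_eval_parser().parse_args(["--topic", "LLMs for education"])
print(args.gpu, args.saving_path, args.topic)
```

With only --topic supplied, --gpu and --saving_path fall back to the documented defaults, which is why the shorter evaluation command can omit them when the survey was written to ./output/.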
Please cite us if you find this project helpful for your project/paper:
@inproceedings{2024autosurvey,
title={AutoSurvey: Large Language Models Can Automatically Write Surveys},
author = {Wang, Yidong and Guo, Qi and Yao, Wenjin and Zhang, Hongbo and Zhang, Xin and Wu, Zhen and Zhang, Meishan and Dai, Xinyu and Zhang, Min and Wen, Qingsong and Ye, Wei and Zhang, Shikun and Zhang, Yue},
booktitle={The Thirty-eighth Annual Conference on Neural Information Processing Systems},
year={2024}
}
Contributions are welcome! Please open an issue to discuss what you would like to change.
This project is licensed under the MIT License.