diff --git a/README.md b/README.md
index 58d74c8..4a970fa 100644
--- a/README.md
+++ b/README.md
@@ -18,16 +18,14 @@
-## 🤔 Am I in RepoBench?
+## 🔥 News
-We are always working on the next generation of RepoBench by crawling the most recent GitHub repositories! 🚀
+- *Feb 5th, 2024*: **RepoBench v1.1** (with newest code data) is now available on the 🤗 HuggingFace Hub. You can access the datasets for Python and Java using the following links:
+ - For Python: [🤗 Repobench Python V1.1](https://huggingface.co/datasets/tianyang/repobench_python_v1.1)
+ - For Java: [🤗 Repobench Java V1.1](https://huggingface.co/datasets/tianyang/repobench_java_v1.1)
+ > **For more details on RepoBench v1.1, please refer to the [data directory](./data/README.md).**
-> [!IMPORTANT]
-> We are very open to any collaborations! If you want to test your model on the data with customised cut-off date or date range, please feel free to [drop us an email](mailto:til040@ucsd.edu?subject=[RepoBench]%20Collaborations) or raise an issue. We will try our best to help you out!
-
-If you would like to have your code excluded from RepoBench, you can check if your data is in RepoBench and follow the link to **opt-out**:
-
-[🤗 Am I in RepoBech 🤗](https://huggingface.co/spaces/tianyang/in-the-repobench)
+- *Jan 16th, 2024*: RepoBench is accepted to ICLR 2024! 🎉
## 🛠️ Installation
@@ -135,10 +133,9 @@ If you use RepoBench in your research, please consider citing us:
@misc{liu2023repobench,
title={RepoBench: Benchmarking Repository-Level Code Auto-Completion Systems},
author={Tianyang Liu and Canwen Xu and Julian McAuley},
- year={2023},
- eprint={2306.03091},
- archivePrefix={arXiv},
- primaryClass={cs.CL}
+ year={2024},
+ url={https://arxiv.org/abs/2306.03091},
+ booktitle={International Conference on Learning Representations}
}
```
diff --git a/data/README.md b/data/README.md
new file mode 100644
index 0000000..74d59fc
--- /dev/null
+++ b/data/README.md
@@ -0,0 +1,79 @@
+
+
+
+
+
+
+
+
+
+
+ RepoBench: Benchmarking Repository-Level Code Auto-Completion Systems
+
+
+
+ ICLR 2024
+
+
+
+
+
+This directory hosts the datasets for subsequent versions of RepoBench. We are committed to updating RepoBench regularly, with updates scheduled **every 3 months**.
+
+## 🌇 Overview
+
+- Our primary focus is on **next-line prediction** tasks to aid in code auto-completion. If your research requires retrieval data, please don't hesitate to reach out to us for collaboration.
+- Our datasets will be hosted on 🤗 HuggingFace, making them easily accessible for everyone.
+- Each data point in our datasets is categorized by prompt length (number of tokens), measured with OpenAI's GPT-4 tokenizer via [tiktoken](https://github.com/openai/tiktoken). The table below lists the levels we've defined:
+
+ | Level | Prompt Length (Number of Tokens) |
+ |-------|------------------------|
+ | 2k | 640 - 1,600 |
+ | 4k | 1,600 - 3,600 |
+ | 8k | 3,600 - 7,200 |
+ | 12k | 7,200 - 10,800 |
+ | 16k | 10,800 - 14,400 |
+ | 24k | 14,400 - 21,600 |
+ | 32k | 21,600 - 28,800 |
+ | 64k | 28,800 - 57,600 |
+ | 128k | 57,600 - 100,000 |
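The level table above can be read as a simple lookup from token count to level name. The following is a minimal sketch, not the authors' implementation: the names `LEVEL_BOUNDS`, `LEVEL_NAMES`, `level_for`, and `count_tokens` are hypothetical, and because adjacent rows share a boundary value (for example, 1,600 appears in both 2k and 4k), the sketch assumes half-open ranges `[low, high)`. Token counting assumes tiktoken's `encoding_for_model("gpt-4")` is the tokenizer meant by "GPT-4 tokenizer"; the import is deferred so the lookup works without tiktoken installed.

```python
from bisect import bisect_right
from typing import Optional

# Range boundaries copied from the level table; the last value is the
# upper bound of the 128k level.
LEVEL_BOUNDS = [640, 1_600, 3_600, 7_200, 10_800, 14_400, 21_600, 28_800, 57_600, 100_000]
LEVEL_NAMES = ["2k", "4k", "8k", "12k", "16k", "24k", "32k", "64k", "128k"]


def level_for(token_count: int) -> Optional[str]:
    """Map a prompt length to its level, or None if it is outside the table.

    Assumption: each range is half-open, [low, high), so a prompt of exactly
    1,600 tokens is treated as 4k rather than 2k.
    """
    idx = bisect_right(LEVEL_BOUNDS, token_count) - 1
    if idx < 0 or idx >= len(LEVEL_NAMES):
        return None
    return LEVEL_NAMES[idx]


def count_tokens(prompt: str) -> int:
    """Count tokens with the GPT-4 encoding from tiktoken (third-party package)."""
    import tiktoken  # imported lazily so level_for() has no external dependency

    encoding = tiktoken.encoding_for_model("gpt-4")
    return len(encoding.encode(prompt))
```

With tiktoken installed, `level_for(count_tokens(prompt))` gives the level for a prompt string under these assumptions; prompts shorter than 640 tokens or at 100,000 tokens and above fall outside the table and yield `None`.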
+
+## 📚 Versions
+
+### RepoBench v1.1
+
+RepoBench v1.1 includes data collected from GitHub between **October 6, 2023**, and **November 31, 2023**. To mitigate data leakage and memorization, we deduplicated the data against the Stack v2 (coming soon) based on file content.
+
+You can access RepoBench v1.1 at the following links:
+- For Python: [🤗 Repobench Python V1.1](https://huggingface.co/datasets/tianyang/repobench_python_v1.1)
+- For Java: [🤗 Repobench Java V1.1](https://huggingface.co/datasets/tianyang/repobench_java_v1.1)
+
+Alternatively, you can load the data directly from the HuggingFace Hub with the `datasets` library:
+
+```python
+from datasets import load_dataset
+
+# Load the Python dataset
+python_dataset = load_dataset("tianyang/repobench_python_v1.1")
+
+# Load the Java dataset
+java_dataset = load_dataset("tianyang/repobench_java_v1.1")
+```
+
+### RepoBench v1.2
+
+*Coming soon...*
+
+## 📝 Citation
+
+If you use RepoBench in your research, please cite the following paper:
+
+```bibtex
+@inproceedings{liu2023repobench,
+ title={RepoBench: Benchmarking Repository-Level Code Auto-Completion Systems},
+ author={Tianyang Liu and Canwen Xu and Julian McAuley},
+ year={2024},
+ url={https://arxiv.org/abs/2306.03091},
+ booktitle={International Conference on Learning Representations}
+}
+```
\ No newline at end of file