diff --git a/README.md b/README.md
index 3e972c2..b39f59d 100644
--- a/README.md
+++ b/README.md
@@ -159,11 +159,11 @@ We hope this repository can help researchers and practitioners to get a better u
 - Did Aristotle Use a Laptop? A Question Answering Benchmark with Implicit Reasoning Strategies [[Paper]](https://direct.mit.edu/tacl/article/doi/10.1162/tacl_a_00370/100680/Did-Aristotle-Use-a-Laptop-A-Question-Answering)
 - Chain-of-Thought Prompting Elicits Reasoning in Large Language Models [[Paper]](https://openreview.net/forum?id=_VjQlMeSB_J)
 - Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them [[Paper]](https://arxiv.org/abs/2210.09261)
-##### Coding
 - Program Synthesis with Large Language Models [[Paper]](https://arxiv.org/abs/2108.07732)
 - DS-1000: A Natural and Reliable Benchmark for Data Science Code Generation [[Paper]](https://arxiv.org/abs/2211.11501)
 - Evaluating Large Language Models Trained on Code [[Paper]](https://arxiv.org/abs/2107.03374)
 - Is Your Code Generated by ChatGPT Really Correct? Rigorous Evaluation of Large Language Models for Code Generation [[Paper]](https://arxiv.org/abs/2305.01210)
+
 ##### Safety
 - Safety Assessment of Chinese Large Language Models [[Paper]](https://arxiv.org/abs/2304.10436)
 - CValues: Measuring the Values of Chinese Large Language Models from Safety to Responsibility [[Paper]](https://arxiv.org/abs/2307.09705)