From 45dad465b7fe65bc9c9038522b35ad3207fcb7dc Mon Sep 17 00:00:00 2001
From: Yufei Wang
Date: Thu, 27 Jul 2023 14:15:40 +0800
Subject: [PATCH] Update README.md

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 3e972c2..b39f59d 100644
--- a/README.md
+++ b/README.md
@@ -159,11 +159,11 @@ We hope this repository can help researchers and practitioners to get a better u
 - Did Aristotle Use a Laptop? A Question Answering Benchmark with Implicit Reasoning Strategies [[Paper]](https://direct.mit.edu/tacl/article/doi/10.1162/tacl_a_00370/100680/Did-Aristotle-Use-a-Laptop-A-Question-Answering)
 - Chain-of-Thought Prompting Elicits Reasoning in Large Language Models [[Paper]](https://openreview.net/forum?id=_VjQlMeSB_J)
 - Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them [[Paper]](https://arxiv.org/abs/2210.09261)
-##### Coding
 - Program Synthesis with Large Language Models [[Paper]](https://arxiv.org/abs/2108.07732)
 - DS-1000: A Natural and Reliable Benchmark for Data Science Code Generation [[Paper]](https://arxiv.org/abs/2211.11501)
 - Evaluating Large Language Models Trained on Code [[Paper]](https://arxiv.org/abs/2107.03374)
 - Is Your Code Generated by ChatGPT Really Correct? Rigorous Evaluation of Large Language Models for Code Generation [[Paper]](https://arxiv.org/abs/2305.01210)
+
 ##### Safety
 - Safety Assessment of Chinese Large Language Models [[Paper]](https://arxiv.org/abs/2304.10436)
 - CValues: Measuring the Values of Chinese Large Language Models from Safety to Responsibility [[Paper]](https://arxiv.org/abs/2307.09705)