diff --git a/README.md b/README.md
index 296b613b3..11fd294d2 100644
--- a/README.md
+++ b/README.md
@@ -5,8 +5,6 @@
 
 # LMFlow
 
-
-
 [![Code License](https://img.shields.io/badge/Code%20License-Apache_2.0-green.svg)](https://github.com/OptimalScale/LMFlow/blob/main/LICENSE)
 [![Python 3.9+](https://img.shields.io/badge/python-3.9+-blue.svg)](https://www.python.org/downloads/release/python-390/)
 [![Doc](https://img.shields.io/badge/Website-Doc-orange.svg)](https://optimalscale.github.io/LMFlow/)
@@ -21,7 +19,7 @@ An extensible, convenient, and efficient toolbox for finetuning large machine le
 
 ## Model Performance
 
-| | PubMedQA | MedQA-USMLE | MedMCQA | Average |
+| | PubMedQA (ID) | MedQA-USMLE (OOD) | MedMCQA (ID) | Average |
 |:---------:|:--------:|:-----------:|:-------:|:----:|
 | Human (pass) | 60.0 | 50.0 | | |
 | Human (expert) | 78.0 | 87.0 | 90.0 | 85.0 |
@@ -32,12 +30,10 @@ An extensible, convenient, and efficient toolbox for finetuning large machine le
 | LLaMA 30B | 1.8 | 43.4 | 30.3 | 25.2 |
 | | | | | |
 | Task-tuned LLaMA 7B (Full) | **75.1** | 44.5 | 49.9 | 56.5 |
-| Task-tuned LLaMA 30B (LoRA) | 74 | 51.3 | **50.2**|**58.5**|
+| Task-tuned LLaMA 30B (LoRA) | 74.0 | 51.3 | **50.2**|**58.5**|
 
-The LLaMA 30B (LoRA) performance is achieved with only **~16h** finetuning in a
-single 8 \* A100 server. For more performance, including instruction tuning
-results, please refer to our
-[Documentation](https://optimalscale.github.io/LMFlow/).
+The LLaMA 30B (LoRA) performance is achieved with only **~16h** finetuning on the training split of PubMedQA and MedMCQA with a single 8 \* A100 server.
+For more performance, including instruction tuning results, please refer to our [Documentation](https://optimalscale.github.io/LMFlow/).
 
 ## Supported Pipelines
 