We provide checkpoints of OFA-CN, the Chinese version of OFA, in Base and Large sizes. The release includes pretrained checkpoints as well as checkpoints finetuned on image captioning and referring expression comprehension. For referring expression comprehension, we translated the texts of the RefCOCO, RefCOCO+, and RefCOCOg datasets into Chinese and finetuned OFA-CN on the translations. We plan to release these translated datasets in the near future.
Below we provide the links for downloading the Chinese OFA checkpoints.
- Pretrained checkpoint (OFA-CN-Large) (~443M parameters)
- Pretrained checkpoint (OFA-CN-Base) (~160M parameters)
- Finetuned checkpoint for MUGE Caption (Stage 1)
- Finetuned checkpoint for RefCOCO-CN
- Finetuned checkpoint for RefCOCO+-CN
- Finetuned checkpoint for RefCOCOg-CN
- Finetuned checkpoint for Chinese OCR (multitask finetuned)
- Finetuned checkpoint for MUGE Caption (Stage 1)
- Finetuned checkpoint for RefCOCO-CN
- Finetuned checkpoint for RefCOCO+-CN
- Finetuned checkpoint for RefCOCOg-CN
- Finetuned checkpoint for Chinese OCR (multitask finetuned)
The table below lists the basic configuration of the Base-size and Large-size OFA-CN models.
Model | #Params | Backbone | Hidden Size | Intermediate Size | #Heads | #Enc. Layers | #Dec. Layers |
---|---|---|---|---|---|---|---|
OFA-CN-Base | 160M | ResNet101 | 768 | 3072 | 12 | 6 | 6 |
OFA-CN-Large | 443M | ResNet152 | 1024 | 4096 | 16 | 12 | 12 |
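
To make the two configurations easier to compare, the rows of the table can be written as a small config object. This is only an illustrative sketch: `OFACNConfig`, its field names, and the derived `head_dim` property are hypothetical and are not taken from the OFA codebase. Only the numeric values come from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OFACNConfig:
    """Hypothetical container for one row of the configuration table."""

    name: str
    params_millions: int    # approximate parameter count, in millions
    backbone: str           # visual backbone listed in the table
    hidden_size: int
    intermediate_size: int  # feed-forward inner dimension
    num_heads: int
    encoder_layers: int
    decoder_layers: int

    @property
    def head_dim(self) -> int:
        # Assumes the hidden size is split evenly across attention heads.
        return self.hidden_size // self.num_heads


# Values copied from the table; the variable names are illustrative.
OFA_CN_BASE = OFACNConfig("OFA-CN-Base", 160, "ResNet101", 768, 3072, 12, 6, 6)
OFA_CN_LARGE = OFACNConfig("OFA-CN-Large", 443, "ResNet152", 1024, 4096, 16, 12, 12)

for cfg in (OFA_CN_BASE, OFA_CN_LARGE):
    ffn_ratio = cfg.intermediate_size // cfg.hidden_size
    print(f"{cfg.name}: head_dim={cfg.head_dim}, ffn_ratio={ffn_ratio}")
```

Under the even-split assumption, both sizes work out to a per-head dimension of 64 and an intermediate size four times the hidden size. The Large model differs in width, depth, and backbone rather than in these ratios.
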
Below we provide the results of OFA-CN and the baselines for comparison.

Image captioning (MUGE Caption):

Model | BLEU@4 | ROUGE-L | CIDEr-D |
---|---|---|---|
Trm | 7.33 | 51.51 | 11.00 |
M6 | 16.19 | 55.06 | 30.75 |
OFA-CN-Base | 26.23 | 58.95 | 50.70 |
OFA-CN-Large | 27.32 | 59.20 | 53.51 |

Referring expression comprehension on the translated RefCOCO, RefCOCO+, and RefCOCOg datasets:

Model | RefCOCO (val/testA/testB) | RefCOCO+ (val/testA/testB) | RefCOCOg (val/test-u) |
---|---|---|---|
OFA-CN-Base (random-init) | 30.13/35.07/25.03 | 17.89/20.90/15.83 | 20.30/20.45 |
OFA-CN-Base | 82.18/86.07/76.68 | 69.38/77.26/60.14 | 73.57/72.53 |
OFA-CN-Large | 82.84/86.54/76.50 | 71.30/78.56/61.85 | 71.96/71.30 |
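
Because each cell packs several splits into one slash-separated string, a per-split comparison is easier to read after parsing. The snippet below is a minimal sketch for doing so. The helper names (`parse_scores`, `deltas`) and the dictionary layout are hypothetical, and the scores are copied from the pretrained Base and Large rows of the table.

```python
from typing import Dict, List, Tuple

# Split names per dataset, in the order used by the table header.
SPLITS: Dict[str, List[str]] = {
    "RefCOCO": ["val", "testA", "testB"],
    "RefCOCO+": ["val", "testA", "testB"],
    "RefCOCOg": ["val", "test-u"],
}

# Rows copied from the table (pretrained Base and Large models).
BASE = {"RefCOCO": "82.18/86.07/76.68", "RefCOCO+": "69.38/77.26/60.14", "RefCOCOg": "73.57/72.53"}
LARGE = {"RefCOCO": "82.84/86.54/76.50", "RefCOCO+": "71.30/78.56/61.85", "RefCOCOg": "71.96/71.30"}


def parse_scores(cell: str) -> List[float]:
    """Split a cell such as '82.18/86.07/76.68' into a list of floats."""
    return [float(part) for part in cell.split("/")]


def deltas(base: Dict[str, str], large: Dict[str, str]) -> Dict[Tuple[str, str], float]:
    """Return (large - base) per dataset split, rounded to two decimals."""
    out: Dict[Tuple[str, str], float] = {}
    for dataset, splits in SPLITS.items():
        base_scores = parse_scores(base[dataset])
        large_scores = parse_scores(large[dataset])
        for split, b, l in zip(splits, base_scores, large_scores):
            out[(dataset, split)] = round(l - b, 2)
    return out


DELTAS = deltas(BASE, LARGE)
for (dataset, split), diff in DELTAS.items():
    print(f"{dataset:9s} {split:7s} {diff:+.2f}")
```

On these numbers, the Large model is ahead of Base on every RefCOCO+ split and on RefCOCO val and testA, but slightly behind on RefCOCO testB and on both RefCOCOg splits.
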