Update README.md

OFA-Sys · Jul 15, 2022 · 0566212
1 parent 3b4952a
Showing 1 changed file with 44 additions and 29 deletions.
diff --git a/README.md b/README.md
@@ -159,7 +159,50 @@ pip install -r requirements.txt
 See [datasets.md](datasets.md) and [checkpoints.md](checkpoints.md).
 <br></br>
 
-# Pretraining
+# Training & Inference
+Below we provide methods for training and inference on different tasks. Pretrained OFA-Large and OFA-Base checkpoints are both available in [checkpoints.md](checkpoints.md). The scripts mentioned in this section are written for OFA-Large. To reproduce the downstream results of OFA-Base, use the corresponding OFA-Base finetuning and inference scripts, which we also provide in the `run_scripts/` folder.
+
+We recommend organizing your workspace directory as follows:
+```
+OFA/
+├── checkpoints/
+│   ├── ofa_base.pt
+│   ├── ofa_large.pt
+│   ├── caption_large_best_clean.pt
+│   └── ...
+├── criterions/
+├── data/
+├── dataset/
+│   ├── caption_data/
+│   ├── gigaword_data/
+│   └── ...
+├── fairseq/
+├── models/
+├── run_scripts/
+├── tasks/
+├── train.py
+├── trainer.py
+└── utils/
+```
+
+
+## Image Processing
+For efficient data processing, we do not store images as individual small files; instead, we encode them as base64 strings.
+Converting an image file to a base64 string is straightforward with the following code:
+```python
+from PIL import Image
+from io import BytesIO
+import base64
+
+img = Image.open(file_name)  # file_name: path to the image file
+img_buffer = BytesIO()
+img.save(img_buffer, format=img.format)  # re-save in the image's original format
+byte_data = img_buffer.getvalue()
+base64_str = base64.b64encode(byte_data)  # bytes
+base64_str = base64_str.decode("utf-8")  # str
+```
+
+## Pretraining
 Below we provide methods for pretraining OFA.
 
 <details>
@@ -194,34 +237,6 @@ INFO: Loaded checkpoint ../../checkpoints/ofa_large.pt
 </pre>
 </details>
 
-<br></br>
-
-# Finetuning & Inference
-Below we provide methods for finetuning and inference on different downstream tasks. We provide both pretrained OFA-Large and OFA-Base in [checkpoints.md](checkpoints.md). The scripts mentioned in this section are prepared for OFA-Large. For reproducing the downstreaming results of OFA-Base, we have also provided the corresponding finetuning and inference scripts for OFA-Base in the `run_scripts/` folder.
-
-We recommend that your workspace directory should be organized like this: 
-```
-OFA/
-├── checkpoints/
-│   ├── ofa_base.pt
-│   ├── ofa_large.pt
-│   ├── caption_large_best_clean.pt
-│   └── ...
-├── criterions/
-├── data/
-├── dataset/
-│   ├── caption_data/
-│   ├── gigaword_data/
-│   └── ...
-├── fairseq/
-├── models/
-├── run_scripts/
-├── tasks/
-├── train.py
-├── trainer.py
-└── utils/
-```
-
 ## Image Captioning
 Below we provide the procedure for reproducing the image captioning results reported in our paper.
 <details>