update: finetune models and logs
wondervictor committed Mar 21, 2024
1 parent e03714d commit 3264b61
Showing 2 changed files with 23 additions and 22 deletions.
22 changes: 0 additions & 22 deletions README.md
@@ -117,28 +117,6 @@ We provide the pre-training logs of `YOLO-World-v2`. Due to the unexpected error
| YOLO-World-v2-M | [Part-1](https://drive.google.com/file/d/1E6vYSS8kBipGc8oQnsjAfeUAx8I9yOX7/view?usp=drive_link), [Part-2](https://drive.google.com/file/d/1fbM7vt2tgSeB8o_7tUDofWvpPNSViNj5/view?usp=drive_link) |
| YOLO-World-v2-X | [Final part](https://drive.google.com/file/d/1aEUA_EPQbXOrpxHTQYB6ieGXudb1PLpd/view?usp=drive_link) |

### YOLO-World-Seg: Open-Vocabulary Instance Segmentation

We fine-tune YOLO-World on LVIS (`LVIS-Base`) with mask annotations for open-vocabulary (zero-shot) instance segmentation.

We provide two strategies for fine-tuning YOLO-World towards open-vocabulary instance segmentation:

* fine-tuning `all modules`: yields better LVIS segmentation accuracy at the cost of zero-shot performance.

* fine-tuning the `segmentation head` only: maintains zero-shot performance but lowers LVIS segmentation accuracy.

| Model | Fine-tuning Data | Fine-tuning Modules | AP<sup>mask</sup> | AP<sub>r</sub> | AP<sub>c</sub> | AP<sub>f</sub> | Weights |
| :---- | :--------------- | :----------------: | :--------------: | :------------: | :------------: | :------------: | :-----: |
| [YOLO-World-Seg-M](./configs/segmentation/yolo_world_seg_m_dual_vlpan_2e-4_80e_8gpus_allmodules_finetune_lvis.py) | `LVIS-Base` | `all modules` | 25.9 | 13.4 | 24.9 | 32.6 | [HF Checkpoints 🤗](https://huggingface.co/wondervictor/YOLO-World/blob/main/yolo_world_seg_m_dual_vlpan_2e-4_80e_8gpus_allmodules_finetune_lvis-ca465825.pth) |
| [YOLO-World-Seg-L](./configs/segmentation/yolo_world_seg_l_dual_vlpan_2e-4_80e_8gpus_allmodules_finetune_lvis.py) | `LVIS-Base` | `all modules` | 28.7 | 15.0 | 28.3 | 35.2| [HF Checkpoints 🤗](https://huggingface.co/wondervictor/YOLO-World/blob/main/yolo_world_seg_l_dual_vlpan_2e-4_80e_8gpus_allmodules_finetune_lvis-8c58c916.pth) |
| [YOLO-World-Seg-M](./configs/segmentation/yolo_seg_world_m_dual_vlpan_2e-4_80e_8gpus_seghead_finetune_lvis.py) | `LVIS-Base` | `seg head` | 16.7 | 12.6 | 14.6 | 20.8 | [HF Checkpoints 🤗](https://huggingface.co/wondervictor/YOLO-World/blob/main/yolo_world_seg_m_dual_vlpan_2e-4_80e_8gpus_seghead_finetune_lvis-7bca59a7.pth) |
| [YOLO-World-Seg-L](./configs/segmentation/yolo_seg_world_l_dual_vlpan_2e-4_80e_8gpus_seghead_finetune_lvis.py) | `LVIS-Base` | `seg head` | 19.1 | 14.2 | 17.2 | 23.5 | [HF Checkpoints 🤗](https://huggingface.co/wondervictor/YOLO-World/blob/main/yolo_world_seg_l_dual_vlpan_2e-4_80e_8gpus_seghead_finetune_lvis-5a642d30.pth) |

**NOTE:**
1. Mask AP is evaluated on LVIS `val 1.0`.
2. All models are fine-tuned for 80 epochs on `LVIS-Base` (866 categories, `common + frequent`).
3. YOLO-World-Seg models with only the `seg head` fine-tuned retain the original zero-shot detection capability while also segmenting objects.

## Getting started

### 1. Installation
23 changes: 23 additions & 0 deletions configs/segmentation/README.md
@@ -1 +1,24 @@
## Fine-tuning YOLO-World for Instance Segmentation


### Models

We fine-tune YOLO-World on LVIS (`LVIS-Base`) with mask annotations for open-vocabulary (zero-shot) instance segmentation.

We provide two strategies for fine-tuning YOLO-World towards open-vocabulary instance segmentation:

* fine-tuning `all modules`: yields better LVIS segmentation accuracy at the cost of zero-shot performance.

* fine-tuning the `segmentation head` only: maintains zero-shot performance but lowers LVIS segmentation accuracy.
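
The difference between the two strategies comes down to which parameters receive gradients. The following is a minimal sketch of that idea, not the mechanism the provided configs use: `freeze_except`, `ToyModel`, and the `seg_head.` prefix are hypothetical names introduced for illustration. The demo uses a small stand-in object so it runs without PyTorch; the function only relies on `named_parameters()` and `requires_grad`, which is the interface a `torch.nn.Module` exposes.

```python
from types import SimpleNamespace


def freeze_except(model, trainable_prefixes):
    """Leave only parameters whose name starts with one of the prefixes trainable.

    Works on any object exposing ``named_parameters()`` that yields
    ``(name, param)`` pairs where ``param`` has a ``requires_grad`` flag.
    Returns the names of the parameters that remain trainable.
    """
    prefixes = tuple(trainable_prefixes)
    trainable = []
    for name, param in model.named_parameters():
        param.requires_grad = name.startswith(prefixes)
        if param.requires_grad:
            trainable.append(name)
    return trainable


class ToyModel:
    """Stand-in for a detector; all parameter names here are hypothetical."""

    def __init__(self):
        self._params = {
            "backbone.conv.weight": SimpleNamespace(requires_grad=True),
            "neck.fuse.weight": SimpleNamespace(requires_grad=True),
            "seg_head.proto.weight": SimpleNamespace(requires_grad=True),
            "seg_head.mask_coeff.weight": SimpleNamespace(requires_grad=True),
        }

    def named_parameters(self):
        return list(self._params.items())


model = ToyModel()
# Second strategy: train only the (hypothetically named) segmentation head.
seg_only = freeze_except(model, ["seg_head."])
```

Passing a prefix that matches every name (for example the empty string) corresponds to the `all modules` strategy. This sketch only toggles gradient flags; it does not address layers such as batch normalization, whose running statistics are updated independently of `requires_grad`.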

| Model | Fine-tuning Data | Fine-tuning Modules | AP<sup>mask</sup> | AP<sub>r</sub> | AP<sub>c</sub> | AP<sub>f</sub> | Weights |
| :---- | :--------------- | :----------------: | :--------------: | :------------: | :------------: | :------------: | :-----: |
| [YOLO-World-Seg-M](./configs/segmentation/yolo_world_seg_m_dual_vlpan_2e-4_80e_8gpus_allmodules_finetune_lvis.py) | `LVIS-Base` | `all modules` | 25.9 | 13.4 | 24.9 | 32.6 | [HF Checkpoints 🤗](https://huggingface.co/wondervictor/YOLO-World/blob/main/yolo_world_seg_m_dual_vlpan_2e-4_80e_8gpus_allmodules_finetune_lvis-ca465825.pth) |
| [YOLO-World-Seg-L](./configs/segmentation/yolo_world_seg_l_dual_vlpan_2e-4_80e_8gpus_allmodules_finetune_lvis.py) | `LVIS-Base` | `all modules` | 28.7 | 15.0 | 28.3 | 35.2| [HF Checkpoints 🤗](https://huggingface.co/wondervictor/YOLO-World/blob/main/yolo_world_seg_l_dual_vlpan_2e-4_80e_8gpus_allmodules_finetune_lvis-8c58c916.pth) |
| [YOLO-World-Seg-M](./configs/segmentation/yolo_seg_world_m_dual_vlpan_2e-4_80e_8gpus_seghead_finetune_lvis.py) | `LVIS-Base` | `seg head` | 16.7 | 12.6 | 14.6 | 20.8 | [HF Checkpoints 🤗](https://huggingface.co/wondervictor/YOLO-World/blob/main/yolo_world_seg_m_dual_vlpan_2e-4_80e_8gpus_seghead_finetune_lvis-7bca59a7.pth) |
| [YOLO-World-Seg-L](./configs/segmentation/yolo_seg_world_l_dual_vlpan_2e-4_80e_8gpus_seghead_finetune_lvis.py) | `LVIS-Base` | `seg head` | 19.1 | 14.2 | 17.2 | 23.5 | [HF Checkpoints 🤗](https://huggingface.co/wondervictor/YOLO-World/blob/main/yolo_world_seg_l_dual_vlpan_2e-4_80e_8gpus_seghead_finetune_lvis-5a642d30.pth) |
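
The weight links in the table point to Hugging Face `/blob/` pages, which show the file in the browser. For scripted downloads, the corresponding `/resolve/` URL serves the file itself. A small sketch of that conversion follows; `blob_to_resolve` is a hypothetical helper written for this example, and it assumes the model-repository path layout `/<user>/<repo>/blob/<revision>/<file>`.

```python
from urllib.parse import urlsplit, urlunsplit


def blob_to_resolve(url):
    """Turn a Hugging Face `/blob/` page URL into a `/resolve/` download URL."""
    parts = urlsplit(url)
    segments = parts.path.split("/")
    # Expected path layout: /<user>/<repo>/blob/<revision>/<file path...>
    if len(segments) < 6 or segments[3] != "blob":
        raise ValueError(f"not a Hugging Face blob URL: {url}")
    segments[3] = "resolve"
    return urlunsplit(parts._replace(path="/".join(segments)))


ckpt_page = (
    "https://huggingface.co/wondervictor/YOLO-World/blob/main/"
    "yolo_world_seg_m_dual_vlpan_2e-4_80e_8gpus_seghead_finetune_lvis-7bca59a7.pth"
)
ckpt_url = blob_to_resolve(ckpt_page)
```

The resulting `ckpt_url` can be passed to any HTTP download tool.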

**NOTE:**
1. Mask AP is evaluated on LVIS `val 1.0`.
2. All models are fine-tuned for 80 epochs on `LVIS-Base` (866 categories, `common + frequent`).
3. YOLO-World-Seg models with only the `seg head` fine-tuned retain the original zero-shot detection capability while also segmenting objects.
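
To make the `LVIS-Base` definition in note 2 concrete, here is a short sketch of selecting the `common + frequent` categories. It assumes LVIS-style category records carrying a `frequency` field with values `"r"`, `"c"`, or `"f"`; the toy category names and the `lvis_base_categories` helper are illustrative, not part of this repository. The counts used for the arithmetic check are the LVIS v1.0 split sizes.

```python
def lvis_base_categories(categories):
    """Keep the common ("c") and frequent ("f") categories, dropping rare ("r")."""
    return [cat for cat in categories if cat["frequency"] in ("c", "f")]


# Toy category list in the LVIS annotation style (names are illustrative).
toy_categories = [
    {"id": 1, "name": "rare_thing", "frequency": "r"},
    {"id": 2, "name": "common_thing", "frequency": "c"},
    {"id": 3, "name": "frequent_thing", "frequency": "f"},
]
base = lvis_base_categories(toy_categories)

# LVIS v1.0 category counts per frequency group.
LVIS_V1_COUNTS = {"r": 337, "c": 461, "f": 405}
# common + frequent gives the 866 categories of LVIS-Base.
num_base = LVIS_V1_COUNTS["c"] + LVIS_V1_COUNTS["f"]
```

The rare categories excluded here are the ones reported under AP<sub>r</sub>, which is why that column reflects zero-shot behaviour.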
