Tianfei Zhou, Wang Xia, Fei Zhang, Boyu Chang, Wenguan Wang, Ye Yuan, Ender Konukoglu, Daniel Cremers
This repository compiles a collection of resources on image segmentation in the foundation model era, and will be continuously updated to track developments in the field. Please feel free to submit a pull request if you find any work missing.
Image segmentation is a long-standing challenge in computer vision, studied continuously over several decades, as evidenced by seminal algorithms such as N-Cut, FCN, and MaskFormer. With the advent of foundation models (FMs), contemporary segmentation methodologies have entered a new epoch by either adapting FMs (e.g., CLIP, Stable Diffusion, DINO) to image segmentation or developing dedicated segmentation foundation models (e.g., SAM, SAM2). These approaches not only deliver superior segmentation performance, but also herald new segmentation capabilities previously unseen in the deep learning era. However, current research in image segmentation lacks a detailed analysis of the distinct characteristics, challenges, and solutions associated with these advancements. This survey seeks to fill this gap by providing a thorough review of cutting-edge research centered around FM-driven image segmentation. We investigate two basic lines of research (as shown in the following figure) – generic image segmentation (i.e., semantic segmentation, instance segmentation, panoptic segmentation) and promptable image segmentation (i.e., interactive segmentation, referring segmentation, few-shot segmentation) – by delineating their respective task settings, background concepts, and key challenges. Furthermore, we provide insights into the emergence of segmentation knowledge from FMs such as CLIP, Stable Diffusion, and DINO. An exhaustive overview of over 300 segmentation approaches is provided to encapsulate the breadth of current research efforts. Finally, we discuss open issues and potential avenues for future research.
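To make the promptable setting concrete, below is a minimal sketch of point-prompted segmentation with SAM via Meta's `segment-anything` package. The checkpoint path, input image file, and click coordinates are illustrative placeholders, not values prescribed by the survey.

```python
# Minimal sketch: point-prompted segmentation with SAM (segment-anything package).
# Assumptions: a SAM ViT-B checkpoint has been downloaded locally, and
# "example.jpg" / the click coordinates are placeholders for illustration.
import numpy as np
import cv2
from segment_anything import sam_model_registry, SamPredictor

# Load SAM with a ViT-B image encoder (checkpoint path is a placeholder).
sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b.pth")
predictor = SamPredictor(sam)

# Read an RGB image and compute its embedding once.
image = cv2.cvtColor(cv2.imread("example.jpg"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)

# A single positive click (x, y) serves as the prompt.
point = np.array([[320, 240]])
label = np.array([1])  # 1 = foreground click, 0 = background click

# SAM returns several candidate masks with predicted quality scores.
masks, scores, _ = predictor.predict(
    point_coords=point,
    point_labels=label,
    multimask_output=True,
)
best_mask = masks[np.argmax(scores)]  # boolean array of shape (H, W)
```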
Given the emergent capabilities of LLMs, a natural question arises: do segmentation properties also emerge from FMs? The answer is yes, even for FMs not explicitly designed for segmentation, such as CLIP, DINO, and diffusion models. This unlocks a new frontier in image segmentation, i.e., acquiring segmentation without any training. The following figure illustrates how to approach this and shows some examples; a minimal code sketch of the DINO case is given after the list below:
- 2.1 Segmentation Emerges from CLIP
- 2.2 Segmentation Emerges from DMs
- 2.3 Segmentation Emerges from DINO
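As a concrete illustration of the DINO case, here is a minimal sketch that treats the self-attention of the [CLS] token in a DINO-pretrained ViT as a coarse foreground mask, without any segmentation-specific training. It assumes the Hugging Face `transformers` library and the public `facebook/dino-vits8` checkpoint; the image path and the 0.6 threshold are arbitrary choices for illustration.

```python
# Minimal sketch: segmentation-like masks emerging from DINO self-attention.
# Assumes the Hugging Face `transformers` library and the public
# `facebook/dino-vits8` checkpoint; no segmentation-specific training is used.
import torch
from PIL import Image
from transformers import ViTImageProcessor, ViTModel

processor = ViTImageProcessor.from_pretrained("facebook/dino-vits8")
model = ViTModel.from_pretrained("facebook/dino-vits8", add_pooling_layer=False)
model.eval()

image = Image.open("example.jpg").convert("RGB")  # placeholder image path
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs, output_attentions=True)

# Attention of the [CLS] token over all patch tokens in the last layer,
# averaged across heads: shape (num_patches,).
attn = outputs.attentions[-1]             # (1, heads, tokens, tokens)
cls_attn = attn[0, :, 0, 1:].mean(dim=0)  # drop the [CLS]-to-[CLS] entry

# Reshape to the patch grid (224 / 8 = 28 for dino-vits8) and threshold.
side = int(cls_attn.numel() ** 0.5)
attn_map = cls_attn.reshape(side, side)
mask = attn_map > 0.6 * attn_map.max()    # arbitrary threshold, for illustration
```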
If you find our survey and repository useful for your research, please consider citing our paper:
```bibtex
@article{zhou2024SegFMSurvey,
  title={Image Segmentation in Foundation Model Era: A Survey},
  author={Zhou, Tianfei and Xia, Wang and Zhang, Fei and Chang, Boyu and Wang, Wenguan and Yuan, Ye and Konukoglu, Ender and Cremers, Daniel},
  journal={arXiv preprint arXiv:2408.12957},
  year={2024}
}
```