add tutorial for Img-Seq type input
yamy-cheng committed Apr 26, 2023
1 parent 847ff9a commit 34e51f3
Showing 9 changed files with 39 additions and 7 deletions.
7 changes: 4 additions & 3 deletions README.md
@@ -8,9 +8,10 @@
**Segment and Track Anything** is an open-source project that focuses on the segmentation and tracking of any objects in videos, utilizing both automatic and interactive methods. The primary algorithms utilized include the [**SAM** (Segment Anything Models)](https://github.com/facebookresearch/segment-anything) for automatic/interactive key-frame segmentation and the [**DeAOT** (Decoupling features in Associating Objects with Transformers)](https://github.com/yoxu515/aot-benchmark) (NeurIPS2022) for efficient multi-object tracking and propagation. The SAM-Track pipeline enables dynamic and automatic detection and segmentation of new objects by SAM, while DeAOT is responsible for tracking all identified objects.

## :loudspeaker:New Features
- [2023/4/26] **Image-Sequence input**: The WebUI now has a new feature that allows for input of image sequences, which can be used to test video segmentation datasets. Get started with the [tutorial](./tutorial/tutorial%20for%20Image-Sequence%20input.md) for Image-Sequence input.
- [2023/4/25] **Online Demo:** You can easily use SAMTrack in [Colab](https://colab.research.google.com/drive/1R10N70AJaslzADFqb-a5OihYkllWEVxB?usp=sharing) for visual tracking tasks.

- [2023/4/23] **Interactive WebUI:** We have introduced a new WebUI that allows interactive user segmentation through strokes and clicks. Feel free to explore and have fun with the [tutorial](./tutorial/1.0-Version.md)!
- [2023/4/23] **Interactive WebUI:** We have introduced a new WebUI that allows interactive user segmentation through strokes and clicks. Feel free to explore and have fun with the [tutorial](./tutorial/tutorial%20for%20WebUI-1.0-Version.md)!
- [2023/4/24] **Tutorial V1.0:** Check out our new video tutorials!
  - YouTube links: [Tutorial for Interactively modify single-object mask for first frame of video](https://www.youtube.com/watch?v=DF0iFSsX8KY), [Tutorial for Interactively add object by click](https://www.youtube.com/watch?v=UJvKPng9_DA), [Tutorial for Interactively add object by stroke](https://www.youtube.com/watch?v=m1oFavjIaCM).
  - Bilibili video links: [Tutorial for Interactively modify single-object mask for first frame of video](https://www.bilibili.com/video/BV1tM4115791/?spm_id_from=333.999.0.0), [Tutorial for Interactively add object by click](https://www.bilibili.com/video/BV1Qs4y1A7d1/), [Tutorial for Interactively add object by stroke](https://www.bilibili.com/video/BV1Lm4y117J4/?spm_id_from=333.999.0.0).
@@ -87,15 +88,15 @@ python app.py
```
Users can upload the video directly on the UI and use SegTracker to automatically/interactively track objects within that video. We use a video of a man playing basketball as an example.

![Interactive WebUI](./assets/interactive_weiui.jpg)
![Interactive WebUI](./assets/interactive_webui.jpg)

SegTracker-Parameters:
- **aot_model**: selects which version of DeAOT/AOT to use for tracking and propagation.
- **sam_gap**: controls how often SAM is run to add newly appearing objects, measured in frames. Increasing it reduces how frequently new objects are discovered but significantly speeds up inference.
- **points_per_side**: controls the number of points per side of the grid sampled over the image when generating masks. Increasing it improves the detection of small objects, but larger objects may be split into finer-grained segments.
- **max_obj_num**: limits the maximum number of objects that SAM-Track can detect and track. More objects require more memory; approximately 16 GB can handle up to 255 objects.
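To make the trade-offs above concrete, here is a hedged sketch: the dictionary collects the four parameters using the names from this list (the actual constructor signature in SAM-Track may differ), and the helper illustrates how `sam_gap` trades new-object discovery frequency against speed.

```python
# Illustrative only: parameter names mirror the list above; the real
# SegTracker constructor in SAM-Track may take them differently.
segtracker_args = {
    "aot_model": "deaotb",   # which DeAOT/AOT variant to use for propagation
    "sam_gap": 100,          # run SAM every 100 frames to find new objects
    "points_per_side": 16,   # 16x16 prompt grid for SAM mask generation
    "max_obj_num": 255,      # upper bound on tracked objects (~16 GB memory)
}

def estimate_sam_calls(num_frames: int, sam_gap: int) -> int:
    """Count the frames on which SAM runs: frame 0, then every `sam_gap` frames."""
    return 1 + (num_frames - 1) // sam_gap

# For a 300-frame video, sam_gap=100 means SAM runs on frames 0, 100, 200.
print(estimate_sam_calls(300, segtracker_args["sam_gap"]))  # → 3
```

A larger `sam_gap` shrinks this count, which is where the inference speed-up in the description comes from.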

Usage: To see the details, please refer to the [tutorial for 1.0-Version WebUI](./tutorial/1.0-Version.md).
Usage: To see the details, please refer to the [tutorial for 1.0-Version WebUI](./tutorial/tutorial%20for%20WebUI-1.0-Version.md).

### :full_moon_with_face:Credits
Licenses for borrowed code can be found in `licenses.md` file.
6 changes: 3 additions & 3 deletions app.py
@@ -272,7 +272,7 @@ def seg_track_app():
with gr.Row():
input_img_seq = gr.File(label='Input Image-Seq').style(height=550)
with gr.Column(scale=0.25):
unzip_button = gr.Button(value="unzip")
extract_button = gr.Button(value="extract")
fps = gr.Slider(label='fps', minimum=5, maximum=50, value=30, step=1)

input_first_frame = gr.Image(label='Segment result of first frame',interactive=True).style(height=550)
@@ -330,7 +330,7 @@ def seg_track_app():
)

with gr.Row():
with gr.Tab(label="SegTracker Args", scale=0.5):
with gr.Tab(label="SegTracker Args"):
with gr.Row():
# args for segment-everything tracking in the video
with gr.Column(scale=0.5):
@@ -414,7 +414,7 @@ def seg_track_app():
]
)

unzip_button.click(
extract_button.click(
fn=get_meta_from_img_seq,
inputs=[
input_img_seq
File renamed without changes
Binary file added tutorial/img/select_fps.jpg
Binary file added tutorial/img/switch2ImgSeq.jpg
Binary file added tutorial/img/upload_Image_seq.jpg
Binary file added tutorial/img/use_exa4ImgSeq.jpg
31 changes: 31 additions & 0 deletions tutorial/tutorial for Image-Sequence input.md
@@ -0,0 +1,31 @@
# Tutorial for Image-Sequence input

## Zip the Image-Sequence as input for the WebUI.
**The structure of `test-data-seq.zip` must look like this. Make sure the image names sort in ascending order.**
```
- test-data-seq
- 0.png
- 1.png
- 2.png
- 3.png
....
- x.png
```
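If your frames already sit in a folder, a short script can pack them into a zip with the required layout. This is a convenience sketch, not part of SAM-Track; the sequence name `test-data-seq` follows the example above.

```python
import os
import zipfile

def make_img_seq_zip(frame_dir: str, zip_path: str, seq_name: str = "test-data-seq") -> None:
    """Pack the PNG frames in frame_dir into a zip laid out as <seq_name>/<n>.png."""
    # Sort numerically so 10.png follows 9.png rather than 1.png.
    frames = sorted(
        (f for f in os.listdir(frame_dir) if f.endswith(".png")),
        key=lambda f: int(os.path.splitext(f)[0]),
    )
    with zipfile.ZipFile(zip_path, "w") as zf:
        for f in frames:
            zf.write(os.path.join(frame_dir, f), arcname=f"{seq_name}/{f}")
```

Note the numeric sort key: a plain lexicographic sort would order `10.png` before `2.png`, which breaks the ascending-order requirement.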

## Use the WebUI to test Image-Sequence data
### 1. Switch to the `Image-Seq type input` tab.

<p align="center"><img src="./img/switch2ImgSeq.jpg" width = "600" height = "300" alt="switch2ImgSeq"/> </p>

### 2. Upload the test dataset or use the provided examples directly.
- Once the test dataset has finished uploading, the WebUI will automatically extract the first frame and display it in the `Segment result of first frame` component.
- If you use the provided examples, you may need to manually extract the results by clicking the `extract` button.
- Below are examples of how to upload Image-Sequence data.

<p align="center"><img src="./img/upload_Image_seq.jpg" width = "600" height = "300"> <img src="./img/use_exa4ImgSeq.jpg" width = "600"></p>
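In `app.py` the `extract` button is wired to `get_meta_from_img_seq`; as a rough illustration of the first-frame lookup that step implies (this sketch is not the project's actual implementation), the WebUI needs to find the lowest-numbered frame in the uploaded archive:

```python
import os
import zipfile

def first_frame_name(zip_path: str) -> str:
    """Illustrative sketch: return the archive name of the lowest-numbered frame."""
    with zipfile.ZipFile(zip_path) as zf:
        frames = sorted(
            (n for n in zf.namelist() if n.endswith(".png")),
            # Sort numerically on the basename, e.g. "test-data-seq/0.png" -> 0.
            key=lambda n: int(os.path.splitext(os.path.basename(n))[0]),
        )
    return frames[0]
```

This is also why ascending numeric names matter: the first frame shown in `Segment result of first frame` is whichever file sorts lowest.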

### 3. Select fps for the output video

<p align="center"><img src="./img/select_fps.jpg" width = "600" height = "300"> </p>
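Since the input frames are fixed, the fps slider only determines how long the assembled output video runs; a quick sanity check of the relationship:

```python
def output_duration_seconds(num_frames: int, fps: int) -> float:
    """Duration of the assembled output video: frame count divided by chosen fps."""
    return num_frames / fps

# The same 150-frame sequence plays for 5 s at 30 fps, 30 s at 5 fps.
print(output_duration_seconds(150, 30))  # → 5.0
print(output_duration_seconds(150, 5))   # → 30.0
```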

### 4. You can follow the [tutorial for WebUI-1.0-Version](./tutorial%20for%20WebUI-1.0-Version.md) to obtain your result.
@@ -1,4 +1,4 @@
# 1.0-Version WebUI tutorial
# Tutorial for WebUI 1.0 Version

## Note:
- We recommend reinitializing SegTracker by clicking the `Reset button` after processing each video to avoid encountering bugs.
