Segment Anything Fast example (#2802)

agunapal · web-flow · commit f3a2267e7c34 · 2023-12-02T05:12:03.000Z
* Segment Anything Fast example

* Segment Anything Fast example

* Changes to make model inference faster

* addressed review comments

* code cleanup

* review comments

* added missing instruction

* added python 3.10 dependency
diff --git a/examples/large_models/segment_anything_fast/README.md b/examples/large_models/segment_anything_fast/README.md
@@ -0,0 +1,77 @@
+
+## Segment Anything Fast
+
+[Segment Anything Fast](https://github.com/pytorch-labs/segment-anything-fast) is an optimized version of [Segment Anything](https://github.com/facebookresearch/segment-anything) with an 8x performance improvement over the original implementation. The improvements were achieved using native PyTorch.
+
+The speedup is achieved using:
+- `torch.compile`: A compiler for PyTorch models
+- GPU quantization: Accelerate models with reduced precision operations
+- Scaled Dot Product Attention (SDPA): Memory efficient attention implementations
+- Semi-Structured (2:4) Sparsity: A GPU optimized sparse memory format
+- Nested Tensor: Batch together non-uniformly sized data into a single Tensor, such as images of different sizes.
+- Custom operators with Triton: Write GPU operations using the Triton Python DSL and easily integrate them into PyTorch’s various components with custom operator registration.
+
+Details on how this is achieved can be found in this [blog post](https://pytorch.org/blog/accelerating-generative-ai/).
+
+#### Pre-requisites
+
+Requires Python 3.10.
+
+`cd` to the example folder `examples/large_models/segment_anything_fast`
+
+Install `Segment Anything Fast` by running
+```
+chmod +x install_segment_anything_fast.sh
+source install_segment_anything_fast.sh
+```
+Segment Anything Fast needs the nightly version of PyTorch, so the script uninstalls PyTorch and its domain libraries and then installs the nightly build.
+
+
+### Step 1: Download the weights
+
+```
+wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth
+```
+
+If you are not using an A100 for inference, turn off the A100-specific optimization using
+```
+export SEGMENT_ANYTHING_FAST_USE_FLASH_4=0
+```
+
+Depending on the available GPU memory, you may need to edit the value of `process_batch_size` in `model-config.yaml`.
+`process_batch_size` is the batch size for the decoding step. A smaller value lowers the memory footprint;
+a higher value results in faster inference. The following values were tested.
+
+Example:
+  - For `A10G` : `process_batch_size=8`
+  - For `A100` : `process_batch_size=16`
+
+
+### Step 2: Generate mar or tgz file
+
+```
+torch-model-archiver --model-name sam-fast --version 1.0 --handler custom_handler.py --config-file model-config.yaml --archive-format tgz
+```
+
+### Step 3: Add the tgz file to model store
+
+```
+mkdir model_store
+mv sam-fast.tar.gz model_store
+```
+
+### Step 4: Start TorchServe
+
+```
+torchserve --start --ncs --model-store model_store --models sam-fast.tar.gz
+```
+
+### Step 5: Run inference
+
+```
+python inference.py
+```
+
+This produces the following output:
+
+![kitten_mask_sam_fast](./kitten_mask_fast.png)
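
Note on choosing `process_batch_size`: the README lists two tested values (`A10G`: 8, `A100`: 16). A minimal sketch of how a deployment script might pick between them is shown below. The helper name `pick_process_batch_size` and the fallback to 8 for unlisted GPUs are assumptions for illustration and are not part of this commit.

```python
# Hypothetical helper, not part of the example: map a GPU name to the
# process_batch_size values that the README reports as tested.
TESTED_BATCH_SIZES = {"A10G": 8, "A100": 16}


def pick_process_batch_size(gpu_name: str, default: int = 8) -> int:
    """Return the tested batch size whose key appears in gpu_name."""
    upper = gpu_name.upper()
    for key, size in TESTED_BATCH_SIZES.items():
        if key in upper:
            return size
    # Assumption: for an unlisted GPU, fall back to the smaller tested value
    # to keep the memory footprint low.
    return default


print(pick_process_batch_size("NVIDIA A100-SXM4-40GB"))  # 16
print(pick_process_batch_size("NVIDIA A10G"))            # 8
```

The returned value would then be written into `model-config.yaml` before running `torch-model-archiver`.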
diff --git a/examples/large_models/segment_anything_fast/custom_handler.py b/examples/large_models/segment_anything_fast/custom_handler.py
@@ -0,0 +1,85 @@
+import base64
+import io
+import logging
+import pickle
+
+import cv2
+import numpy as np
+import torch
+from PIL import Image
+from segment_anything_fast import SamAutomaticMaskGenerator, sam_model_fast_registry
+
+from ts.handler_utils.timer import timed
+from ts.torch_handler.base_handler import BaseHandler
+
+logger = logging.getLogger(__name__)
+
+
+class SegmentAnythingFastHandler(BaseHandler):
+    def __init__(self):
+        super().__init__()
+        self.mask_generator = None
+        self.initialized = False
+
+    def initialize(self, ctx):
+        properties = ctx.system_properties
+        self.device = "cpu"
+        if torch.cuda.is_available() and properties.get("gpu_id") is not None:
+            self.map_location = "cuda"
+            self.device = torch.device(
+                self.map_location + ":" + str(properties.get("gpu_id"))
+            )
+
+        model_type = ctx.model_yaml_config["handler"]["model_type"]
+        sam_checkpoint = ctx.model_yaml_config["handler"]["sam_checkpoint"]
+        process_batch_size = ctx.model_yaml_config["handler"]["process_batch_size"]
+
+        self.model = sam_model_fast_registry[model_type](checkpoint=sam_checkpoint)
+        self.model.to(self.device)
+
+        self.mask_generator = SamAutomaticMaskGenerator(
+            self.model, process_batch_size=process_batch_size, output_mode="coco_rle"
+        )
+
+        logger.info(
+            f"Model weights {sam_checkpoint} for {model_type} loaded successfully with process batch size {process_batch_size}"
+        )
+        self.initialized = True
+
+    @timed
+    def preprocess(self, data):
+        images = []
+        for row in data:
+            image = row.get("data") or row.get("body")
+            if isinstance(image, str):
+                # If the image is a Base64-encoded string, decode it to bytes
+                image = base64.b64decode(image)
+
+            # If the image was sent as bytes or a bytearray, open it with PIL
+            if isinstance(image, (bytearray, bytes)):
+                image = Image.open(io.BytesIO(image))
+            else:
+                # Otherwise assume the image is a nested list of pixel values
+                image = torch.FloatTensor(image)
+
+            image = cv2.cvtColor(np.array(image), cv2.COLOR_RGB2BGR)
+            images.append(image)
+
+        return images
+
+    @timed
+    def inference(self, data):
+        assert (
+            len(data) == 1
+        ), "SamAutomaticMaskGenerator currently supports a batch size of 1"
+        return self.mask_generator.generate(data[0])
+
+    @timed
+    def postprocess(self, data):
+        # Serialize the output using Pickle
+        serialized_data = pickle.dumps(data)
+
+        # Encode the serialized data as Base64
+        base64_encoded_data = base64.b64encode(serialized_data).decode("utf-8")
+
+        return [base64_encoded_data]
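
Note on the response format: `postprocess` pickles the mask list and Base64-encodes it, and `inference.py` reverses both steps. The round trip can be sketched in isolation as follows. The function names and the sample annotation are made up for illustration. Because `pickle.loads` can execute arbitrary code, this scheme is only appropriate when the client trusts the server.

```python
import base64
import pickle


def encode_masks(masks):
    """Server side: pickle the result, then Base64-encode it as text."""
    return base64.b64encode(pickle.dumps(masks)).decode("utf-8")


def decode_masks(payload):
    """Client side: reverse both steps. Only unpickle trusted responses."""
    return pickle.loads(base64.b64decode(payload))


# Made-up annotation standing in for one entry of the generator output.
sample = [{"area": 12, "segmentation": {"size": [4, 4], "counts": "dummy"}}]

payload = encode_masks(sample)
print(type(payload).__name__)           # str, safe to return in an HTTP body
print(decode_masks(payload) == sample)  # True
```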
diff --git a/examples/large_models/segment_anything_fast/inference.py b/examples/large_models/segment_anything_fast/inference.py
@@ -0,0 +1,57 @@
+import base64
+import pickle
+
+import cv2
+import matplotlib.pyplot as plt
+import numpy as np
+import requests
+from pycocotools import mask as coco_mask
+
+url = "http://localhost:8080/predictions/sam-fast"
+image_path = "./kitten.jpg"
+
+
+def show_anns(anns):
+    if len(anns) == 0:
+        return
+    for i in range(len(anns)):
+        anns[i]["segmentation"] = coco_mask.decode(anns[i]["segmentation"])
+    sorted_anns = sorted(anns, key=(lambda x: x["area"]), reverse=True)
+    ax = plt.gca()
+    ax.set_autoscale_on(False)
+
+    img = np.ones(
+        (
+            sorted_anns[0]["segmentation"].shape[0],
+            sorted_anns[0]["segmentation"].shape[1],
+            4,
+        )
+    )
+    img[:, :, 3] = 0
+    for ann in sorted_anns:
+        m = ann["segmentation"].astype(bool)
+        color_mask = np.concatenate([np.random.random(3), [0.35]])
+        img[m] = color_mask
+    ax.imshow(img)
+
+
+# Send Inference request to TorchServe
+with open(image_path, "rb") as image_file:
+    res = requests.post(url, files={"body": image_file})
+
+# Decode the Base64-encoded response body
+decoded_data = base64.b64decode(res.text)
+
+# Deserialize the data using Pickle
+masks = pickle.loads(decoded_data)
+
+
+# Plot the segmentation mask on the image
+image = cv2.imread(image_path)
+image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
+plt.figure(figsize=(image.shape[1] / 100.0, image.shape[0] / 100.0), dpi=100)
+plt.imshow(image)
+show_anns(masks)
+plt.axis("off")
+plt.tight_layout()
+plt.savefig("kitten_mask_fast.png", format="png")
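
Note on the mask encoding: the client relies on `coco_mask.decode` from `pycocotools` to turn each `segmentation` entry back into a binary mask. For intuition only, here is a pure-Python sketch of decoding the *uncompressed* COCO RLE form, where `counts` is a list of run lengths that alternate between 0s and 1s, start with a run of 0s, and traverse the mask in column-major order. The handler's `coco_rle` output uses the compressed string form, so this sketch is not a drop-in replacement for `pycocotools`. The function name is hypothetical.

```python
def decode_uncompressed_rle(rle):
    """Decode an uncompressed COCO RLE dict into a list of rows of 0/1."""
    height, width = rle["size"]
    flat = []
    value = 0  # runs alternate 0, 1, 0, ... and always start with 0
    for run in rle["counts"]:
        flat.extend([value] * run)
        value = 1 - value
    if len(flat) != height * width:
        raise ValueError("run lengths do not match the mask size")
    # flat is column-major: pixel (row, col) lives at index col * height + row
    return [
        [flat[col * height + row] for col in range(width)]
        for row in range(height)
    ]


mask = decode_uncompressed_rle({"size": [2, 3], "counts": [1, 3, 2]})
for row in mask:
    print(row)
# [0, 1, 0]
# [1, 1, 0]
```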
diff --git a/examples/large_models/segment_anything_fast/install_segment_anything_fast.sh b/examples/large_models/segment_anything_fast/install_segment_anything_fast.sh
@@ -0,0 +1,21 @@
+#!/bin/bash
+
+# Uninstall torchtext, torchdata, torch, torchvision, and torchaudio
+pip uninstall torchtext torchdata torch torchvision torchaudio -y
+
+# Install nightly PyTorch, torchvision, and torchaudio from the specified index URL
+pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu121 --ignore-installed
+
+# Optional: Display the installed PyTorch and torchvision versions
+python -c "import torch; print('PyTorch version:', torch.__version__)"
+python -c "import torchvision; print('torchvision version:', torchvision.__version__)"
+
+echo "PyTorch and torchvision updated successfully!"
+
+# Install the segment-anything-fast package from GitHub
+pip install git+https://github.com/pytorch-labs/segment-anything-fast.git
+
+echo "Segment Anything Fast installed successfully!"
+
+echo "Installing other dependencies"
+pip install opencv-python matplotlib pycocotools
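
Note on the Python requirement: the README states that Python 3.10 is needed, but the install script does not check it. A minimal preflight sketch follows. It assumes the requirement means exactly 3.10; if newer versions also work, the comparison could be relaxed to `>=`. The function name is hypothetical.

```python
import sys

REQUIRED = (3, 10)  # from the README; exact-match semantics are an assumption


def is_supported_python(version_info=sys.version_info):
    """Return True when the major.minor version equals the required one."""
    return tuple(version_info[:2]) == REQUIRED


if not is_supported_python():
    print(
        "Warning: this example was documented for Python 3.10, "
        f"found {sys.version_info.major}.{sys.version_info.minor}"
    )
```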
diff --git a/examples/large_models/segment_anything_fast/kitten.jpg b/examples/large_models/segment_anything_fast/kitten.jpg
diff --git a/examples/large_models/segment_anything_fast/kitten_mask_fast.png b/examples/large_models/segment_anything_fast/kitten_mask_fast.png
diff --git a/examples/large_models/segment_anything_fast/model-config.yaml b/examples/large_models/segment_anything_fast/model-config.yaml
@@ -0,0 +1,6 @@
+responseTimeout: 300
+handler:
+    profile: true
+    model_type: "vit_h"
+    sam_checkpoint: "/home/ubuntu/serve/examples/large_models/segment_anything_fast/sam_vit_h_4b8939.pth"
+    process_batch_size: 8
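
Note on `sam_checkpoint`: the path above is absolute and specific to one machine. One way to avoid editing it by hand is to render the config from the example folder before archiving. This is a sketch that assumes the checkpoint was downloaded into that folder as in Step 1. The template mirrors the keys in this file, and `render_model_config` is a hypothetical helper.

```python
from pathlib import Path

CONFIG_TEMPLATE = """\
responseTimeout: 300
handler:
    profile: true
    model_type: "vit_h"
    sam_checkpoint: "{checkpoint}"
    process_batch_size: {batch_size}
"""


def render_model_config(example_dir, batch_size=8,
                        checkpoint_name="sam_vit_h_4b8939.pth"):
    """Return model-config.yaml text with an absolute checkpoint path."""
    checkpoint = Path(example_dir).resolve() / checkpoint_name
    return CONFIG_TEMPLATE.format(
        checkpoint=checkpoint.as_posix(), batch_size=batch_size
    )


# Render for the current directory; write the text to model-config.yaml
# before running torch-model-archiver.
print(render_model_config(".", batch_size=16))
```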
diff --git a/ts_scripts/spellcheck_conf/wordlist.txt b/ts_scripts/spellcheck_conf/wordlist.txt
@@ -1138,3 +1138,8 @@ FlashAttention
 GenAI
 prem
 CachingMetric
+DSL
+SDPA
+sam
+zlib
+
