API
At this moment, all single-image features are accessible via the API. I am not planning to support batch processing. However, if you really need it, you can submit an issue or pull request.
You can use the /sam/heartbeat API to check whether this extension is working.
Example:
import requests
url = "http://localhost:7861/sam/heartbeat"
response = requests.get(url)
reply = response.json()
print(reply["msg"])
If this extension is working, you should get Success!
You can use the /sam/sam-model API to get the currently available SAM models.
Example:
import requests
url = "http://localhost:7861/sam/sam-model"
response = requests.get(url)
reply = response.json()
print(reply)
# Example Output:
# ["sam_vit_b_01ec64.pth", "sam_vit_h_4b8939.pth", "sam_vit_l_0b3195.pth"]
You will receive a list of the available SAM models, which you can then use to set the sam_model_name parameter for the /sam/sam-predict API.
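For example, a minimal sketch (assuming at least one SAM model has been downloaded) that picks the first returned model and uses it in a later /sam/sam-predict payload:
import requests
# Query the available SAM models.
models = requests.get("http://localhost:7861/sam/sam-model").json()
# Use one of them (here simply the first entry) in a /sam/sam-predict payload.
payload = {"sam_model_name": models[0], "input_image": "<base64 image>"}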
You can use the /sam/sam-predict API to get masks from SAM.
Parameters:
- sam_model_name: str = "sam_vit_h_4b8939.pth"
  [Optional] SAM model name. You should manually download models before using them; do not rename the model files. See here for how to pick the model you want.
- input_image: str
  [Required] base64 image in string format.
- sam_positive_points: List[List[float]] = []
  [Optional] Positive point prompts in N * 2 python list.
- sam_negative_points: List[List[float]] = []
  [Optional] Negative point prompts in N * 2 python list.
- dino_enabled: bool = False
  [Optional] Whether to use GroundingDINO to generate bounding boxes from text to guide SAM to generate masks.
- dino_model_name: Optional[str] = "GroundingDINO_SwinT_OGC (694MB)"
  [Optional] Choose one of "GroundingDINO_SwinT_OGC (694MB)" and "GroundingDINO_SwinB (938MB)" as your desired GroundingDINO model.
- dino_text_prompt: Optional[str] = None
  [Optional] Text prompt for GroundingDINO to generate bounding boxes. Separate different categories with a period (.).
- dino_box_threshold: Optional[float] = 0.3
  [Optional] Threshold for selecting bounding boxes. Do not use a very high value, otherwise you may get no box.
- dino_preview_checkbox: bool = False
  [Optional] Whether to enable the GroundingDINO preview. If you have already called the /sam/dino-predict API, you can enable preview to select the boxes you want.
- dino_preview_boxes_selection: Optional[List[int]] = None
  [Optional] Choose the boxes you want. Indices start from 0.
At least one of point prompts or GroundingDINO text prompts must be provided to generate masks.
Returns:
- msg
  Message of the execution information. May contain some common error messages.
- blended_images
  List of 3 base64 images with masks and bounding boxes.
- masks
  List of 3 base64 masks generated from SAM.
- masked_images
  List of 3 base64 masked images. The unmasked region becomes transparent.
Example:
import base64
import requests
from PIL import Image
from io import BytesIO

def filename_to_base64(filename):
    with open(filename, "rb") as fh:
        return base64.b64encode(fh.read())

img_filename = "<something>.png"
url = "http://localhost:7861/sam/sam-predict"
payload = {
    "input_image": filename_to_base64(img_filename).decode(),
    "dino_enabled": True,
    "dino_text_prompt": "the girl with blue hair",
    "dino_preview_checkbox": False,
}
response = requests.post(url, json=payload)
reply = response.json()
print(reply["msg"])
grid = Image.new('RGBA', (3 * 512, 3 * 512))

def paste(imgs, row):
    for idx, img in enumerate(imgs):
        img_pil = Image.open(BytesIO(base64.b64decode(img))).resize((512, 512))
        grid.paste(img_pil, (idx * 512, row * 512))

paste(reply["blended_images"], 0)
paste(reply["masks"], 1)
paste(reply["masked_images"], 2)
grid.show()
The output should be very similar to the demo.
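If you prefer point prompts over text prompts, the payload might instead look like the following sketch. The coordinates below are hypothetical pixel positions; positive points should lie on the object you want to mask and negative points on regions to exclude.
url = "http://localhost:7861/sam/sam-predict"
payload = {
    "input_image": filename_to_base64(img_filename).decode(),
    "sam_positive_points": [[250.0, 200.0], [300.0, 260.0]],  # hypothetical points on the object
    "sam_negative_points": [[50.0, 50.0]],                    # hypothetical point to exclude
}
reply = requests.post(url, json=payload).json()
print(reply["msg"])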
You can get GroundingDINO bounding boxes with the /sam/dino-predict API.
Parameters:
- input_image: str
  [Required] base64 image in string format.
- dino_model_name: str = "GroundingDINO_SwinT_OGC (694MB)"
  [Optional] Choose one of "GroundingDINO_SwinT_OGC (694MB)" and "GroundingDINO_SwinB (938MB)" as your desired GroundingDINO model.
- text_prompt: str
  [Required] Text prompt for GroundingDINO to generate bounding boxes. Separate different categories with a period (.).
- box_threshold: float = 0.3
  [Optional] Threshold for selecting bounding boxes. Do not use a very high value, otherwise you may get no box.
Returns:
- msg
  Message of the execution information. May contain some common error messages.
- image_with_box
  base64 image string with bounding boxes. Each bounding box is associated with an index shown at the top-left corner of the box.
Example:
url = "http://localhost:7861/sam/dino-predict"
payload = {
"dino_model_name": "GroundingDINO_SwinT_OGC (694MB)",
"input_image": filename_to_base64(img_filename).decode(),
"text_prompt": "the girl with red hair",
}
response = requests.post(url, json=payload)
reply = response.json()
print(reply["msg"])
grid = Image.new('RGBA', (512, 512))
paste([reply["image_with_box"]], 0)
grid.show()
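After inspecting the indices drawn on image_with_box, you can feed your selection back into /sam/sam-predict via dino_preview_checkbox and dino_preview_boxes_selection. A minimal sketch, assuming the box with index 0 is the one you want:
url = "http://localhost:7861/sam/sam-predict"
payload = {
    "input_image": filename_to_base64(img_filename).decode(),
    "dino_enabled": True,
    "dino_text_prompt": "the girl with red hair",
    "dino_preview_checkbox": True,        # boxes have already been previewed via /sam/dino-predict
    "dino_preview_boxes_selection": [0],  # keep only the box with index 0
}
reply = requests.post(url, json=payload).json()
print(reply["msg"])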
You can use the /sam/dilate-mask API to expand a mask created by SAM.
Parameters:
- input_image: str
  [Required] base64 image in string format.
- mask: str
  [Required] base64 mask image in string format.
- dilate_amount: int = 10
  [Optional] Mask expansion amount from 0 to 100.
Returns:
- blended_image
  base64 blended image with the mask.
- mask
  base64 dilated mask.
- masked_image
  base64 masked image. The unmasked region becomes transparent.
Example:
url = "http://localhost:7861/sam/dilate-mask"
payload = {
"input_image": filename_to_base64(img_filename).decode(),
"mask": reply["mask"],
}
response = requests.post(url, json=payload)
reply = response.json()
grid = Image.new('RGBA', (3 * 512, 512))
paste([reply["blended_image"], reply["mask"], reply["masked_image"]], 0)
grid.show()
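The "mask" value above is simply a base64 mask returned by an earlier call. A minimal sketch, assuming you dilate the first mask returned by /sam/sam-predict:
# Get masks from /sam/sam-predict first (see the example above), then dilate one of them.
sam_reply = requests.post("http://localhost:7861/sam/sam-predict", json={
    "input_image": filename_to_base64(img_filename).decode(),
    "dino_enabled": True,
    "dino_text_prompt": "the girl with blue hair",
}).json()
dilate_payload = {
    "input_image": filename_to_base64(img_filename).decode(),
    "mask": sam_reply["masks"][0],  # first of the 3 masks returned by /sam/sam-predict
    "dilate_amount": 30,            # expand the mask more than the default of 10
}
dilate_reply = requests.post("http://localhost:7861/sam/dilate-mask", json=dilate_payload).json()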
You can use the /sam/controlnet-seg API to generate semantic segmentation enhanced by SAM.
Parameters:
- Payload
  - sam_model_name: str = "sam_vit_h_4b8939.pth"
    [Optional] SAM model name.
  - input_image: str
    [Required] base64 image in string format.
  - processor: str = "seg_ofade20k"
    [Optional] Preprocessor for semantic segmentation. Choose one of "seg_ufade20k" (uniformer trained on ade20k, performance really bad, can be greatly enhanced by SAM), "seg_ofade20k" (oneformer trained on ade20k, performance far better than uniformer, can be slightly improved by SAM), "seg_ofcoco" (oneformer trained on coco, similar to seg_ofade20k), "random" (for EditAnything).
  - processor_res: int = 512
    [Optional] Preprocessor resolution, range in (64, 2048].
  - pixel_perfect: bool = False
    [Optional] Whether to enable pixel perfect. If enabled, target_W and target_H are required, and the processor resolution will be overridden by the optimal value.
  - resize_mode: Optional[int] = 1
    [Optional] Resize mode from the original shape to the target shape, only effective when pixel_perfect is enabled. 0: just resize, 1: crop and resize, 2: resize and fill.
  - target_W: Optional[int] = None
    [Optional, required if pixel_perfect is True] Target width if the segmentation will be used to generate a new image.
  - target_H: Optional[int] = None
    [Optional, required if pixel_perfect is True] Target height if the segmentation will be used to generate a new image.
- autosam_conf
  The meaning of each tunable parameter is omitted. See here for an explanation. Each of them is optional, with default configurations from the official SAM repository.
  - points_per_side: Optional[int] = 32
  - points_per_batch: int = 64
  - pred_iou_thresh: float = 0.88
  - stability_score_thresh: float = 0.95
  - stability_score_offset: float = 1.0
  - box_nms_thresh: float = 0.7
  - crop_n_layers: int = 0
  - crop_nms_thresh: float = 0.7
  - crop_overlap_ratio: float = 512 / 1500
  - crop_n_points_downscale_factor: int = 1
  - min_mask_region_area: int = 0
Returns if the preprocessor is not random:
- msg
  Message of the execution information. May contain some common error messages.
- sem_presam
  Semantic segmentation before SAM is applied.
- sem_postsam
  Semantic segmentation after SAM is applied.
- blended_presam
  Input image covered by the semantic segmentation before SAM is applied.
- blended_postsam
  Input image covered by the semantic segmentation after SAM is applied.
Returns if the preprocessor is random:
- msg
  Message of the execution information. May contain some common error messages.
- blended_image
  Input image covered by the segmentation, with a random color for each region.
- random_seg
  Segmentation with a random color applied to each region.
- edit_anything_control
  Control image for the EditAnything ControlNet.
Example:
url = "http://localhost:7861/sam/controlnet-seg"
payload = {
"input_image": filename_to_base64(img_filename).decode(),
}
response = requests.post(url, json={"payload": payload, "autosam_conf": {}})
reply = response.json()
print(reply["msg"])
grid = Image.new('RGBA', (2 * 512, 2 * 512))
paste([reply["blended_presam"], reply["blended_postsam"]], 0)
paste([reply["sem_presam"], reply["sem_postsam"]], 1)
grid.show()
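If you set the processor to "random" (the EditAnything mode), the return keys differ, as described above. A minimal sketch:
url = "http://localhost:7861/sam/controlnet-seg"
payload = {
    "input_image": filename_to_base64(img_filename).decode(),
    "processor": "random",  # EditAnything mode
}
reply = requests.post(url, json={"payload": payload, "autosam_conf": {}}).json()
print(reply["msg"])
grid = Image.new('RGBA', (3 * 512, 512))
paste([reply["blended_image"], reply["random_seg"], reply["edit_anything_control"]], 0)
grid.show()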
You can use the /sam/category-mask API to get masks generated by SAM + semantic segmentation with category IDs.
Parameters:
- Payload
  Explanation omitted except for category. Others have the same meaning as /sam/controlnet-seg.
  - sam_model_name: str = "sam_vit_h_4b8939.pth"
  - processor: str = "seg_ofade20k"
  - processor_res: int = 512
  - pixel_perfect: bool = False
  - resize_mode: Optional[int] = 1
  - target_W: Optional[int] = None
  - target_H: Optional[int] = None
  - category: str
    [Required] Category IDs separated by +. See here for ade20k and here for coco. Note that coco skips some numbers, so the actual ID is line_number - 21.
  - input_image: str
- autosam_conf
  Omitted. See /sam/controlnet-seg.
Returns:
- msg
  Message of the execution information. May contain some common error messages.
- blended_image
  base64 image with mask.
- mask
  base64 mask generated from SAM + Uniformer/Oneformer.
- masked_image
  base64 masked image. The unmasked region becomes transparent.
- resized_input
  base64 resized input image. Since the input image will almost certainly be resized to be compatible with Oneformer/Uniformer, if you need the input image and output images for future use, you must use this resized input image instead of the original, pre-resize image.
Example:
url = "http://localhost:7861/sam/category-mask"
payload = {
"input_image": filename_to_base64(img_filename).decode(),
"category": "12",
"processor_res": 1024,
}
response = requests.post(url, json={"payload": payload, "autosam_conf": {}})
reply = response.json()
print(reply["msg"])
grid = Image.new('RGBA', (3 * 512, 512))
paste([reply["blended_image"], reply["mask"], reply["masked_image"]], 0)
grid.show()
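As noted above, if you plan to reuse the outputs later (for example for inpainting), keep resized_input rather than the original image. A minimal sketch that writes the mask and the resized input to disk; base64_to_file is a hypothetical helper and the file names are arbitrary:
def base64_to_file(b64_string, filename):
    # Decode a base64 image string returned by the API and write it to disk.
    with open(filename, "wb") as fh:
        fh.write(base64.b64decode(b64_string))

base64_to_file(reply["mask"], "category_mask.png")
base64_to_file(reply["resized_input"], "resized_input.png")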
Written by continue-revolution on 2023/04/29. Click here to go back to the main page of this extension.