Commit b80b964
feat(nodes): add expand_mask_with_fade to better handle canvas compositing needs
Previously we used erode/dilate and a Gaussian blur to expand and fade the edges of Canvas masks. The implementation had a number of problems:

- Erode/dilate kernel sizes were not calculated correctly, and extra iterations were run to compensate. As a result, the blur size, which should have been in pixels, was very inaccurate and unreliable.
- What we want is to add a "soft bleed" - like a drop shadow with no offset - starting from the edge of the mask and extending out by however many pixels. But Gaussian blur does not do this. The blurred area starts _inside_ the mask and extends outside it, so it blurs both inwards and outwards. We compensated for this by expanding the mask.
- Using a Gaussian blur can cause banding artifacts. Gaussian blur doesn't have a "size" or "radius" parameter in the sense you might expect. It's a convolution kernel, and there are _no zero values in the result_ anywhere the kernel reaches. This means that, far away from the mask, once compositing completes, we have some values that are very close to zero but not quite zero. These values are quantized by HTML Canvas, resulting in banding artifacts where you'd expect the blur to have faded to 0% alpha. At least, that is my understanding of why the banding artifacts occur.

The new node uses a better strategy to expand the mask and add the fade-out effect:

- Calculate the distance from each white pixel to the nearest black pixel.
- Normalize this distance by dividing by the fade size in px, then clip the values to 0-1. The result represents the distance of each white pixel to its nearest black pixel as a percentage of the fade size. At this point, it is a linear distribution.
- Create a polynomial to describe the fade's intensity so that we can have a smooth transition from the masked region (black) to unmasked (white). There are some magic numbers here, determined experimentally.
- Evaluate the polynomial over the normalized distances, so we now have a matrix representing the fade intensity for every pixel.
- Convert this matrix back to uint8 and apply it to the mask.

This works soooo much better than the previous method. Not only does it fix the banding issues, but when we enable "output only generated regions", we get a much smaller image. Will add images to the PR to clarify.
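The banding explanation above can be sanity-checked with a small numpy-only sketch (illustrative, not InvokeAI code; the sigma, kernel truncation, and array sizes are arbitrary): blur a hard-edged 1-D mask and inspect the values outside the mask.

```python
import numpy as np

sigma = 10.0
radius = 40  # kernel truncated at 4 * sigma, as typical implementations do

x = np.arange(-radius, radius + 1)
kernel = np.exp(-(x**2) / (2 * sigma**2))
kernel /= kernel.sum()

# A hard-edged 1-D "mask": 100 black (0.0) pixels, then 100 white (1.0) pixels
mask = np.concatenate([np.zeros(100), np.ones(100)])
blurred = np.convolve(mask, kernel, mode="same")

# 25 px outside the white region, the value is tiny but strictly non-zero
tail = blurred[75]

# After 8-bit quantization (as HTML Canvas applies), that long tail collapses
# into a handful of discrete alpha levels instead of fading smoothly to 0,
# which shows up as banding
quantized = np.round(blurred * 255).astype(np.uint8)
```

Everywhere the kernel reaches, the result is non-zero, so the apparent "end" of the blur is a quantization boundary rather than a true zero.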
1 parent 4ad2000 commit b80b964

File tree

1 file changed: +67 -0 lines changed


invokeai/app/invocations/image.py

Lines changed: 67 additions & 0 deletions
@@ -1089,6 +1089,73 @@ def invoke(self, context: InvocationContext) -> ImageOutput:
         return ImageOutput.build(image_dto)
 
 
+@invocation(
+    "expand_mask_with_fade", title="Expand Mask with Fade", tags=["image", "mask"], category="image", version="1.0.0"
+)
+class ExpandMaskWithFadeInvocation(BaseInvocation, WithMetadata, WithBoard):
+    """Expands a mask with a fade effect. The mask uses black to indicate areas to keep from the generated image and white for areas to discard.
+    The mask is thresholded to create a binary mask, and then a distance transform is applied to create a fade effect.
+    The fade size is specified in pixels, and the mask is expanded by that amount. The result is a mask with a smooth transition from black to white.
+    """
+
+    mask: ImageField = InputField(description="The mask to expand")
+    threshold: int = InputField(default=0, ge=0, le=255, description="The threshold for the binary mask (0-255)")
+    fade_size_px: int = InputField(default=32, ge=0, description="The size of the fade in pixels")
+
+    def invoke(self, context: InvocationContext) -> ImageOutput:
+        pil_mask = context.images.get_pil(self.mask.image_name, mode="L")
+
+        np_mask = numpy.array(pil_mask)
+
+        # Threshold the mask to create a binary mask - 0 for black, 255 for white
+        # If we don't threshold we can get some weird artifacts
+        np_mask = numpy.where(np_mask > self.threshold, 255, 0).astype(numpy.uint8)
+
+        # Create a mask for the black region (1 where black, 0 otherwise)
+        black_mask = (np_mask == 0).astype(numpy.uint8)
+
+        # Invert the black region
+        bg_mask = 1 - black_mask
+
+        # Create a distance transform of the inverted mask
+        dist = cv2.distanceTransform(bg_mask, cv2.DIST_L2, 5)
+
+        # Normalize distances so that pixels < fade_size_px become a linear gradient (0 to 1)
+        d_norm = numpy.clip(dist / self.fade_size_px, 0, 1)
+
+        # Control points: x values (normalized distance) and corresponding fade pct y values.
+
+        # There are some magic numbers here that are used to create a smooth transition:
+        # - The first point is at 0% of fade size from edge of mask (meaning the edge of the mask), and is 0% fade (black)
+        # - The second point is 1px from the edge of the mask and also has 0% fade, effectively expanding the mask
+        #   by 1px. This fixes an issue where artifacts can occur at the edge of the mask
+        # - The third point is at 20% of the fade size from the edge of the mask and has 20% fade
+        # - The fourth point is at 80% of the fade size from the edge of the mask and has 90% fade
+        # - The last point is at 100% of the fade size from the edge of the mask and has 100% fade (white)
+
+        # x values: 0 = mask edge, 1 = fade_size_px from edge
+        x_control = numpy.array([0.0, 1.0 / self.fade_size_px, 0.2, 0.8, 1.0])
+        # y values: 0 = black, 1 = white
+        y_control = numpy.array([0.0, 0.0, 0.2, 0.9, 1.0])
+
+        # Fit a cubic polynomial that smoothly approximates the control points (least-squares)
+        coeffs = numpy.polyfit(x_control, y_control, 3)
+        poly = numpy.poly1d(coeffs)
+
+        # Evaluate and clip the smooth mapping
+        feather = numpy.clip(poly(d_norm), 0, 1)
+
+        # Build final image.
+        np_result = numpy.where(black_mask == 1, 0, (feather * 255).astype(numpy.uint8))
+
+        # Convert back to PIL, grayscale
+        pil_result = Image.fromarray(np_result.astype(numpy.uint8), mode="L")
+
+        image_dto = context.images.save(image=pil_result, image_category=ImageCategory.MASK)
+
+        return ImageOutput.build(image_dto)
+
+
 @invocation(
     "apply_mask_to_image",
     title="Apply Mask to Image",
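The node's core math can be exercised outside InvokeAI with a 1-D numpy-only sketch (values are illustrative; the distance transform is computed by brute force here instead of with cv2.distanceTransform):

```python
import numpy as np

fade_size_px = 8

# 1-D mask: 0 = black (keep), 255 = white (discard)
np_mask = np.array([0] * 4 + [255] * 20, dtype=np.uint8)
black_mask = (np_mask == 0).astype(np.uint8)

# Distance from each white pixel to the nearest black pixel
# (the node uses cv2.distanceTransform for the 2-D case)
black_idx = np.flatnonzero(black_mask)
dist = np.array(
    [0.0 if black_mask[i] else np.abs(black_idx - i).min() for i in range(np_mask.size)]
)

# Normalize by the fade size and clip, giving a linear 0-1 ramp
d_norm = np.clip(dist / fade_size_px, 0, 1)

# Least-squares cubic fit to the commit's control points
x_control = np.array([0.0, 1.0 / fade_size_px, 0.2, 0.8, 1.0])
y_control = np.array([0.0, 0.0, 0.2, 0.9, 1.0])
poly = np.poly1d(np.polyfit(x_control, y_control, 3))

# Evaluate the fade curve and apply it: black stays black, white fades in
feather = np.clip(poly(d_norm), 0, 1)
np_result = np.where(black_mask == 1, 0, (feather * 255).astype(np.uint8))
```

Black pixels stay 0, the pixel just past the edge stays near 0 (the 1 px expansion from the second control point), and the fade ramps up toward 255 once the distance reaches fade_size_px.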

0 commit comments
