-
Here's a quick and dirty implementation for the extension; it still needs a model trained for it.
-
See here: https://huggingface.co/GeroldMeisinger/control-edgedrawing - trained for 40000 steps on images converted with https://github.com/shaojunluo/EDLinePython.
ControlNet somewhat picks up on it, but the results are not good so far:
-
Update: the following issues are obsolete with the OpenCV version, see next posts.

Lenna test (obsolete)

There seems to be something off with the gradientThreshold, see shaojunluo/EDLinePython#4.
Edge drawing default settings:
Edge drawing from paper:
Edge drawing no-denoising:
Canny edge from Automatic1111 (using default settings: low=100, high=200):

Edge drawing algorithm (obsolete)

The edge drawing algorithm works in 3 steps (Gaussian smoothing, anchor point extraction from the gradient map, anchor linking by smart routing) and then returns a list of edge objects and the resulting image.

Parameter permutations (obsolete)

Following is a short Python snippet which writes an edge map for every parameter permutation (EdgeDrawing and EDParam_default come from the EDLinePython repo; the long comments are quotes from the paper):

```python
import os
import cv2
# EdgeDrawing and EDParam_default are provided by
# https://github.com/shaojunluo/EDLinePython

filenames = ["000000001", "000000006", "000000014", "000000024", "000000029"]

# Smoothing: reduce the effect of noise in the image by blurring each pixel
# with its neighboring pixels; the paper uses a standard 5x5 Gaussian kernel
# with sigma = 1.
isSmoothed = False
ksizes = [3, 5, 7, 9, 11]           # GaussianBlur kernel size, only if smoothed=False
sigmas = [0.6, 0.8, 1.0, 1.2, 1.4]  # GaussianBlur sigma, only if smoothed=False

# Gradient threshold: white edge areas correspond to pixels for which
# Gx + Gy >= threshold; black pixels are suppressed as non-edges. Figure 1 of
# the paper [high-pass filtering residue of the Lena image] shows the edge
# areas for a threshold value of 36.
gradientThresholds = [24, 30, 36, 42, 48]

# "Detail ratio": determines how many edge anchor points are selected. The more
# anchor points you have to start the linking process, the more detail in the
# final edge map. If you are only interested in the major (long) edges, you can
# specify a big detail ratio, e.g. a value bigger than 10; if you want more
# details, specify a small detail ratio such as 1, 2, 3, 4, etc. For a detail
# ratio of k, every kth row/column is scanned and anchor points are only marked
# there, so two consecutive anchor points along the same edge are at least
# k pixels apart.
anchorThresholds = [4, 6, 8, 10, 12]
scanIntervals = [1, 2, 3, 4, 5]

def with_param(filename, param1, x, param2=None, y=None):
    params = EDParam_default.copy()
    params[param1] = x
    if param2 is not None:
        params[param2] = y
    ed = EdgeDrawing(params)
    image = cv2.imread(filename + ".webp", cv2.IMREAD_GRAYSCALE)
    edges, edge_map = ed.EdgeDrawing(image, smoothed=isSmoothed)
    name = filename + "/" + filename + "_" + param1 + "=" + str(x)
    if param2 is not None:
        name += "_" + param2 + "=" + str(y)
    cv2.imwrite(name + ".png", edge_map, [cv2.IMWRITE_PNG_BILEVEL, 1])

for filename in filenames:
    if not os.path.exists(filename):
        os.makedirs(filename)
    # baseline edge map with default parameters
    image = cv2.imread(filename + ".webp", cv2.IMREAD_GRAYSCALE)
    ed = EdgeDrawing()
    edges, edge_map = ed.EdgeDrawing(image, smoothed=False)
    cv2.imwrite(filename + "/" + filename + ".png", edge_map, [cv2.IMWRITE_PNG_BILEVEL, 1])
    # parameter sweeps
    if not isSmoothed:
        for x in ksizes:
            for y in sigmas:
                with_param(filename, "ksize", x, "sigma", y)
    for x in gradientThresholds:
        with_param(filename, "gradientThreshold", x)
    for x in anchorThresholds:
        for y in scanIntervals:
            with_param(filename, "anchorThreshold", x, "scanIntervals", y)
```
-
https://huggingface.co/GeroldMeisinger/control-edgedrawing - 40k steps, default settings, smoothed=True (=> noisy), no drops:
-
Here is the C++ implementation of edge drawing by the original author: https://github.com/CihanTopal/ED_Lib. Interestingly, there is a "parameter free" version (which would be nice) and a "color" version (which would also be nice): EdgeDrawing Parameter-Free.
That was easy... the color version is unfortunately not available, but the algorithm is fast. Training starts again. See you tomorrow...
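For reference, a minimal sketch of how the OpenCV variant can be invoked (assuming opencv-contrib-python >= 4.8 for cv2.ximgproc; the filenames are illustrative):

```python
import cv2  # requires opencv-contrib-python for cv2.ximgproc

image = cv2.imread("lenna.png", cv2.IMREAD_GRAYSCALE)  # illustrative input
ed = cv2.ximgproc.createEdgeDrawing()
params = cv2.ximgproc_EdgeDrawing_Params()
params.PFmode = True  # enable the "parameter free" (EDPF) variant
ed.setParams(params)
ed.detectEdges(image)
edge_map = ed.getEdgeImage()  # binary edge map (edges are non-zero)
cv2.imwrite("lenna_edpf.png", edge_map)
```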
-
Update: https://huggingface.co/GeroldMeisinger/control-edgedrawing -> you can find all images, generation details and a comparison with canny in control-edgedrawing-cv480edpf-drop0-fp16-checkpoint-45000.zip

Evaluation prompts:
eagle: "a detailed high-quality professional image of an eagle flying over the mountains"
lenna: "a detailed high-quality professional photo of swedish woman standing in front of a mirror, dark brown hair, white hat with purple feather"
bird: "bird"
lion: "lion"
dog2: "a cute dog"

45000 steps with fp16 so far. Resuming for another 45000 with flipped images. The results look promising. What do you think?
-
Questions about training

I work on the laion2b-en-aesthetics6.5 dataset with 180k images, which uses alt-tags as captions. 1 epoch takes 20h on my RTX 3060 12GB. To increase training quality we should apply certain transformations, but because of the sheer amount of images this has to be done automatically, of course (a sketch of what that could look like follows below).
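A minimal sketch of such an automatic pass (the helper, thresholds and filenames are hypothetical, not the pipeline actually used):

```python
import os
from PIL import Image

def prepare_image(path, out_dir, min_side=512, max_side=1024):
    """Filter and normalize one dataset image; returns False if it was skipped."""
    try:
        img = Image.open(path).convert("RGB")  # also weeds out files Pillow can't read
    except Exception:
        return False
    if min(img.size) < min_side:
        return False                      # too small to be useful for 512px training
    img.thumbnail((max_side, max_side))   # downscale huge images, keep aspect ratio
    name = os.path.splitext(os.path.basename(path))[0] + ".webp"
    img.save(os.path.join(out_dir, name))
    return True
```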
More experiments:
-
UPDATE 90000 steps fp16 (45000 on original, 45000 on left-right flipped, no drops). Resuming with epoch 2 and 50% prompt drops.

So I started using Python for SD now and wrote a small script which generates the evaluation images (sketched below). A few fixes compared to the previous (manual) generations:

If someone knows how to answer my training questions above, or has pointers to info, or knows someone who knows, that would help me a lot!!
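A minimal sketch of such an evaluation script with diffusers (model path, prompts and filenames are illustrative; the actual script differs):

```python
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline, UniPCMultistepScheduler
from diffusers.utils import load_image

controlnet = ControlNetModel.from_pretrained(
    "GeroldMeisinger/control-edgedrawing", torch_dtype=torch.float16)  # checkpoint path illustrative
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16)
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)
pipe.enable_model_cpu_offload()

prompts = {"bird": "bird", "dog2": "a cute dog"}  # subset of the evaluation prompts
for name, prompt in prompts.items():
    control = load_image(name + "_edge.png")  # precomputed edge drawing map
    image = pipe(prompt, image=control, num_inference_steps=20,
                 generator=torch.Generator("cpu").manual_seed(0)).images[0]
    image.save(name + "_eval.png")
```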
-
UPDATE 118000 steps fp16 (45000 on original, 45000 on left-right flipped, no drops; epoch 2: 28000 steps with 50% drops). Results became worse: CN didn't pick up on no-prompts and answered by sending demons. Restarting with 50% drop.
-
UPDATE 45000 steps 50% drop https://huggingface.co/GeroldMeisinger/control-edgedrawing -> control-edgedrawing-cv480edpf-drop50-fp16-checkpoint-45000.safetensors => Results are not good; 50% is probably too much for 45k steps. Guess mode still doesn't work and tends to produce humans. Resuming until 90k with right-left flipped images in the hope it will get better with more images.
-
UPDATE Experiment 5.0 - 45000 steps with fastdup-cleaned images and fp32
https://huggingface.co/GeroldMeisinger/control-edgedrawing -> control-edgedrawing-cv480edpf-fastdup-fp16-checkpoint-45000

Okay, so I looked into image dataset sanitizing, and apparently laion2b-en-aesthetics65 contains about 40% duplicates (with the fastdup default threshold=0.9). That's crazy! A small group of greyscale images is duplicated hundreds of times. Why has no one ever pointed this out before?! And someday I will even understand what this means:

In my next experiment I'm going to use the caption with the highest similarity value of all duplicates and train on non-squared images too. The image dataset also contained .svg files, which makes Pillow throw up. It's unbelievable how much crap one has to put up with in all these images. The SVG images all look similar to this: what the duck?!
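For reference, a minimal sketch of how such a duplicate scan can be run with fastdup (directory names are illustrative; calls follow the fastdup 1.x API as I understand it, so treat the exact signatures as an assumption):

```python
import fastdup

# build a similarity graph over the dataset and cluster near-duplicates
fd = fastdup.create(work_dir="fastdup_work", input_dir="laion2b-en-a65/")
fd.run(ccthreshold=0.9)        # 0.9 is the default similarity threshold
fd.vis.duplicates_gallery()    # writes an HTML report of duplicate pairs
```

Keeping one image per near-duplicate cluster (e.g. the one whose caption has the highest similarity value) then removes the duplicates.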
-
UPDATE Experiment 6.0 - 135000 steps (~2.5 epochs) with rectangular, fastdup-cleaned images
https://huggingface.co/GeroldMeisinger/control-edgedrawing -> control-edgedrawing-cv480edpf-rect-fp16-checkpoint-XXX (45000, 90000, 135000)

So I had to leave home for a few days, set the training to infinite epochs, and came home to 135000 steps :) The image dataset now includes rectangular images. Some very strange intermediate evaluation images:
checkpoint-77000 - all scribble art?
checkpoint-97000 - all desaturated?
checkpoint-131000 - all high contrast?

The same checkpoints for other image types (like "bird") look fine but show the same phenomenon at other checkpoints. I don't know what to make of it. Unlucky seeds => generate more images? More positive and negative prompts required to guide SD more? If we just compare the checkpoints I uploaded, we are lucky that all look relatively fine, but none is significantly better than the others. Some things I derive from this (Update: all of this may be due to the small effective batch size, see experiment 6.1):

Loss graph looks like this:

I read my own article again, especially the QA section about ControlNet quality. It appears past-me already predicted many of the errors I made and points to "Increase gradient accumulation steps!". I don't know what "gradient accumulation steps" are, but it abbreviates to "GAS", which brings me to the conclusion that "good control nets need more GAS" (a sketch of the idea follows below). I started a new experiment following lllyasviel's advice: "The batch size should not be reduced under any circumstances" and "But usually, if your logic batch size is already bigger than 256, then further extending the batch size is not very meaningful. In that case, perhaps a better idea is to train more steps. I tried some "common" logic batch size at 64 or 96 or 128 (by gradient accumulation), it seems that many complicated conditions can be solved very well already."
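Gradient accumulation in a minimal, self-contained PyTorch sketch (toy model and data, not the actual training loop): gradients from several micro-batches are summed before one optimizer step, so with a micro-batch of 4 and 8 accumulation steps the effective batch size is 32.

```python
import torch
from torch import nn

model = nn.Linear(16, 1)  # toy stand-in for the network being trained
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
loss_fn = nn.MSELoss()
micro_batches = [(torch.randn(4, 16), torch.randn(4, 1)) for _ in range(32)]

accumulation_steps = 8  # effective batch size = 4 * 8 = 32
optimizer.zero_grad()
for step, (x, y) in enumerate(micro_batches):
    loss = loss_fn(model(x), y) / accumulation_steps  # scale so summed grads average
    loss.backward()                                   # gradients accumulate in param.grad
    if (step + 1) % accumulation_steps == 0:
        optimizer.step()                              # one weight update per 8 micro-batches
        optimizer.zero_grad()
```

This trades wall-clock time for VRAM: the GPU only ever holds a micro-batch of 4, but the optimizer sees gradients averaged over 32 samples.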
-
UPDATE Experiment 6.1 - 6696 steps with effective batch size 32
https://huggingface.co/GeroldMeisinger/control-edgedrawing -> control-edgedrawing-rect-fp16-batch32

lllyasviel: "In my experiments, [higher batch size] is usually better than [training on more steps]"

What an understatement! A higher effective batch size pretty much solved all the problems, and the default settings from HF are crap. No-prompt also works much better now.

Following lllyasviel: "Because that "sudden converge" always happens, lets say "sudden converge" will happen at 3k step and our money can optimize 90k step, then we have two options: (1) train 3k steps, sudden converge, then train 87k steps. (2) 30x gradient accumulation, train 3k steps (90k real computation steps), then sudden converge."

Result: I got a stable "dog2" at 3800 steps, leaving 2900 more steps to fine-tune. lllyasviel: "..in real cases, perhaps you may need to balance the steps before and after the "sudden converge" on your own to find a balance. The training after "sudden converge" is also important." That's very vague, but it matches what I learned so far; see the evaluation images!

Loss graph looks like this: Maybe it's a recursive graph and shows "how much I'm lost" at understanding what the loss graph means...

About the Sudden Convergence Phenomenon

This is the original image: This is what I'm seeing:
bird 2400 (all previous steps look very similar)
bird 2500 (sudden change to something different)
dog2 2400 (all previous steps look very similar)
dog2 2500 (sudden change to something different)
dog2 2600 (following silhouette, different background at silhouette)

You can find all the original evaluation images for intermediate steps at HF. The convergence is even less sudden with smaller batch sizes. My guess is that lllyasviel used much greater effective batch sizes, which makes it look more sudden. And I hypothesize that the phenomenon would look even more gradual if we generated an image for every step at high batch sizes. But it's not that important anyway; we just have to know the approximate step number at which the model follows the conditioning to our satisfaction. However, I propose to call it "sufficiently phenomenal convergence", which puts more emphasis on its gradual nature :D

On evaluation images
-
UPDATE Experiment 6.2 - 50% prompt dropping
https://huggingface.co/GeroldMeisinger/control-edgedrawing -> control-edgedrawing-rect-fp16-batch32-drop50

In each comparison the first image uses 50% prompt dropping, the second no prompt dropping (see previous experiment):
- feathers and wood are more textured, the image looks more natural
- the dog is more dog-like, less abstract
- the lion still looks like a dog, but has more textured hair
- almost no difference, I guess a room is very clearly "a room"
- almost no difference, I guess because SD already tends to generate humans
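For context, "prompt dropping" just means replacing the caption with an empty string for a fraction of the training samples, so the model has to learn to reconstruct the image from the control map alone. A minimal sketch (hypothetical helper, not the actual training code):

```python
import random

def maybe_drop_prompt(caption: str, drop_rate: float = 0.5) -> str:
    # with probability drop_rate, train this sample with an empty prompt;
    # this strengthens conditioning on the edge map when no prompt is given
    return "" if random.random() < drop_rate else caption
```

If I read the diffusers train_controlnet.py script correctly, this corresponds to its proportion_empty_prompts option.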
-
Hi @geroldmeisinger, thanks for the best CN report and tutorials I've found! I am also extensively training some CNs now. Are you on any Discord channels so we can exchange some ideas?
-
Canny is a good edge detector, but it only produces good edges after fine-tuning its parameters, and images with different contrasts require different parameters.
A long time ago I found this paper, which proposes an alternative that outputs great edges on most images without requiring tuning: https://www.semanticscholar.org/paper/Edge-Drawing%3A-A-Heuristic-Approach-to-Robust-Edge-Topal-Akinlar/9c8f2dc3bbd0e7e28f4a72dcdb4d77b52f478740
I found this Python implementation: https://github.com/shaojunluo/EDLinePython
I think it would be great if you could integrate it. It could make the edge module a lot easier to use and potentially to automate.
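To illustrate the tuning problem (the filename and the first threshold pair are just examples; 100/200 are the Automatic1111 defaults mentioned above):

```python
import cv2

img = cv2.imread("example.png", cv2.IMREAD_GRAYSCALE)  # illustrative filename
edges_low  = cv2.Canny(img, 50, 150)   # picks up more edges, but also more noise
edges_high = cv2.Canny(img, 100, 200)  # A1111 defaults; may miss low-contrast edges
```

The same threshold pair that works for a high-contrast photo can miss most edges in a low-contrast one, which is exactly the tuning burden a parameter-free edge drawing variant avoids.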