
How does the loss function work and how is it actually implemented in Torch? #4

gauthsvenkat opened this issue Jan 21, 2019 · 15 comments


@gauthsvenkat

As per suggestion, I've opened a new issue so that others might also benefit from this.

@gauthsvenkat
Author

Okay, after reading up on https://arxiv.org/abs/1506.02106 and watershed transforms, I kind of understand the losses.

How exactly are the blobs computed? (How are they stored and used for calculating the split-level loss?)

@IssamLaradji
Collaborator

Blobs are computed in two steps (as shown in lines 33-46 of models.py):

1. First we compute `pred_mask = self(images).data.max(1)[1].squeeze().cpu().numpy()`, which takes the argmax over the K channels of the K x H x W probability map, where K is the number of categories (the background is also a category) and H and W are the height and width. This gives you a binary matrix for each category.

2. Then we apply the connected-components algorithm (skimage) to each binary matrix to get the blobs: `connected_components = morph.label(pred_mask == category_id)`.

`connected_components` has each blob labeled with a different unique id; the number of unique ids is the number of blobs.
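
To make this concrete, here is a minimal self-contained sketch of the two steps on a toy output (the random logits tensor and its shapes are made up for illustration; only the argmax and `morph.label` calls mirror the code above):

```python
import torch
from skimage import morphology as morph

# Toy "network output": batch of 1, K = 2 categories (background + one class), 4 x 5 image
logits = torch.randn(1, 2, 4, 5)

# Step 1: argmax over the K channels -> an H x W map of predicted category ids
pred_mask = logits.data.max(1)[1].squeeze().cpu().numpy()

# Step 2: connected components for one (non-background) category
category_id = 1
connected_components = morph.label(pred_mask == category_id)

# Each blob has its own unique id; 0 marks pixels outside this category's mask
n_blobs = len([u for u in set(connected_components.ravel()) if u != 0])
print(connected_components)
print("number of blobs:", n_blobs)
```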

@gauthsvenkat
Author

What does it mean when blobs[None] is returned? I've never seen this anywhere before. (Also counts[None].)

@IssamLaradji
Collaborator

[None] just adds a dimension: if blobs is K x H x W, then blobs[None] is 1 x K x H x W.
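
For example, with toy shapes (`None` here is the same as `np.newaxis`, and indexing a torch tensor with `None` behaves the same way):

```python
import numpy as np

blobs = np.zeros((3, 4, 5))    # K x H x W
counts = np.array([7, 2, 5])   # K

print(blobs[None].shape)    # (1, 3, 4, 5) -- same data with a new leading axis
print(counts[None].shape)   # (1, 3)
```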

@gauthsvenkat
Author

gauthsvenkat commented Jan 29, 2019

I went through all the code in model.py and I don't understand much of it. In particular, in class FCN8 I understand the computation up to fc7, but I get lost in the semantic-segmentation part.

```python
scores = self.scoring_layer(fc7)
upscore2 = self.upscore2(scores)

# second
score_pool4 = self.score_pool4(pool4)
score_pool4c = score_pool4[:, :, 5:5+upscore2.size(2),
                                 5:5+upscore2.size(3)]
upscore_pool4 = self.upscore_pool4(score_pool4c + upscore2)

# third
score_pool3 = self.score_pool3(pool3)
score_pool3c = score_pool3[:, :, 9:9+upscore_pool4.size(2),
                                 9:9+upscore_pool4.size(3)]

output = self.upscore8(score_pool3c + upscore_pool4)

return output[:, :, 31:(31 + h), 31:(31 + w)].contiguous()
```

What exactly is happening here? Also, how many outputs does the model have? It's supposed to output the blobs, and it's also supposed to output the locations of the detected objects, right?

I'm getting confused as to what the output of the model is and how exactly the blobs are handled. (Apologies if I'm coming back to the same thing again; I'm having a tough time wrapping my head around this.)

@IssamLaradji
Collaborator

The segmentation part you showed is the upsampling path, which combines different features from VGG16 to output a K x H x W matrix, where K is the number of classes and H and W are the image height and width. This procedure is described more fully as FCN8 in the first deep-learning-based segmentation paper: https://people.eecs.berkeley.edu/~jonlong/long_shelhamer_fcn.pdf

At this point, there are no blobs, just activations for each pixel for each class. Hope this helps.

@gauthsvenkat
Author

No, I mean more specifically: what does

`score_pool4c = score_pool4[:, :, 5:5+upscore2.size(2), 5:5+upscore2.size(3)]`

do? Why 5, 9, and 31?

And how are blobs computed from this again ?

@IssamLaradji
Collaborator

1. The 5, 9, and 31 are there to take care of the shift introduced by the max-pooling layers of VGG16.

2. Once you get the output above (which is K x H x W), you apply the following two steps to get the blobs:

```python
# Get the labels across the 'K' channels, where K is the number of classes
pred_mask = output.argmax(1).squeeze().cpu().numpy()

# Get the blobs for category 'k' as follows:
blobs_k = morph.label(pred_mask == k)
```

@gauthsvenkat
Author

Hey, I've been looking at your code for a while now and have zeroed in on the parts I don't understand (there are other parts I don't understand either, but they require me to understand these first).

1. In get_blob_dict in losses.py:

```python
blob_uniques, blob_counts = np.unique(class_blobs * (points_mask), return_counts=True)  # <--- THIS
uniques = np.delete(np.unique(class_blobs), blob_uniques)  # <--- THIS

for u in uniques:  # iterate over falsely predicted blobs
    blobList += [{"class": l, "label": u, "n_points": 0, "size": 0,
                  "pointsList": []}]
    n_fp += 1

for i, u in enumerate(blob_uniques):
    if u == 0:
        continue

    pointsList = []
    blob_ind = class_blobs == u

    locs = np.where(blob_ind * (points_mask))  # <--- THIS

    for j in range(locs[0].shape[0]):
        pointsList += [{"y": locs[0][j], "x": locs[1][j]}]
```

   i) I'm guessing that blob_uniques now contains the points that are inside the predicted blobs?

   ii) I don't understand why np.delete() is used. I'm guessing it's there to ignore the points that are inside correct blobs? I read the docs for np.delete() and I don't think it does what it's supposed to be doing there.

   iii) I also don't understand what np.where() is supposed to be doing. Again, the docs don't corroborate what's supposed to be happening there.

2. In compute_image_loss in losses.py:

```python
ones = torch.ones(Counts.size(0), 1).long().cuda()
BgFgCounts = torch.cat([ones, Counts], 1)
Target = (BgFgCounts.view(n*k).view(-1) > 0).view(-1).float()
Smax = S.view(n, k, h*w).max(2)[0].view(-1)
```

   I don't understand how you get the target values, because when I tried to simulate this piece of code (with the Trancos dataset, which has only two classes, background and foreground), ones and BgFgCounts didn't have the same number of dimensions. Also, is there any reason you flatten Target twice?

3. In compute_fp_loss() in losses.py:

```python
T = np.ones(blobs.shape[-2:])  # FLAG
T[blobs[b["class"]] == b["label"]] = 0
```

   I'm not entirely sure what's happening here either.

I understand a lot of these questions might be very trivial 😅 but you have no idea how much your help is appreciated. Thanks again 😃.

@IssamLaradji
Collaborator

1. i. blob_uniques corresponds to all the unique blobs that intersect with the point annotations. The false-positive blobs are those that do not intersect any point annotation, which is what uniques = np.delete(np.unique(class_blobs), blob_uniques) gives us (there is a small sketch of this after the list).

   ii. numpy.delete(arr, obj) removes the entries of arr at the indices given in obj. Since morph.label assigns blob ids 0, 1, 2, ..., np.unique(class_blobs) is exactly [0, 1, ..., n], so indexing it with the intersecting blob ids removes exactly those blobs. Like you said, it's there to drop the blobs that already contain a point.

   iii. np.where() returns the (row, column), i.e. (y, x), coordinates of the nonzero entries. In this case, locs = np.where(blob_ind * (points_mask)) returns the locations of the point annotations that fall inside the predicted blob.

2. i. Trancos only has two classes. BgFgCounts consists of the background entry (which is 1 for all images) and the object counts, and the target is whether each entry is greater than zero. Since Trancos images always have a car in them, the target ends up being [1, 1] for every image.

   ii. There is no reason for flattening Target twice, you are right; a single flatten would do.

3. For compute_fp_loss, b describes a blob with no points, i.e. b["n_points"] == 0, which is a false-positive blob. Therefore, T[blobs[b["class"]] == b["label"]] = 0 selects the pixels of that false-positive blob and sets their target to 0 (background). Then F.nll_loss(S_log, torch.LongTensor(T).cuda()[None], ...) pushes those pixels toward the background class using the cross-entropy loss.
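
To make points 1, 2, and 3 concrete, here is a small self-contained sketch on a made-up single-class example (the toy arrays and counts are mine; only the np.unique / np.delete / np.where pattern, the image-level target construction, and the T matrix mirror losses.py):

```python
import numpy as np
import torch

# Toy 6x6 prediction for one class with two blobs: blob 1 (top-left) and blob 2 (bottom-right)
class_blobs = np.zeros((6, 6), dtype=int)
class_blobs[0:2, 0:2] = 1
class_blobs[4:6, 4:6] = 2

# A single point annotation, which falls inside blob 1
points_mask = np.zeros((6, 6), dtype=int)
points_mask[1, 1] = 1

# 1.i  ids of the blobs that intersect the point annotations (0 just means "no blob")
blob_uniques, blob_counts = np.unique(class_blobs * points_mask, return_counts=True)
print(blob_uniques)   # [0 1] -> blob 1 contains a point

# 1.ii blob ids are consecutive integers starting at 0, so using the intersecting ids as
#      indices into the sorted unique ids removes exactly those blobs; what remains
#      are the false-positive blobs
uniques = np.delete(np.unique(class_blobs), blob_uniques)
print(uniques)        # [2] -> blob 2 has no point: a false positive

# 1.iii (y, x) locations of the point annotations that fall inside a given blob
blob_ind = class_blobs == 1
locs = np.where(blob_ind * points_mask)
print([(int(y), int(x)) for y, x in zip(locs[0], locs[1])])   # [(1, 1)]

# 2. image-level target for one image with one foreground class: the background column is
#    always 1, and each class is "present" if its count is > 0 (a single flatten is enough)
counts = torch.tensor([[3]])                     # n = 1 image, 1 class, 3 objects
ones = torch.ones(counts.size(0), 1).long()
bg_fg_counts = torch.cat([ones, counts], 1)      # [[1, 3]]
target = (bg_fg_counts.view(-1) > 0).float()     # tensor([1., 1.])
print(target)

# 3. false-positive loss target: label the pixels of the false-positive blob as background
#    (0); nll_loss then pushes the network's predictions for those pixels toward background
T = np.ones(class_blobs.shape)
T[class_blobs == 2] = 0
print(T)
```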

Hope this helps!

PS: In case it's helpful, you can put a breakpoint with import ipdb; ipdb.set_trace() at any line and observe how the arrays behave across the program.

@gauthsvenkat
Author

Thanks a lot! I didn't know about ipdb, I'll make use of it.

@gauthsvenkat
Author

Hey, just to clarify: in models.py, in ResFCN, you're first reducing the size using interpolate and then, at the end, increasing the size in the last interpolate? Because I'm guessing the spatial dimensions of logits_16s will be smaller than those of logits_32s, and the spatial dimensions of logits_8s will be smaller than those of logits_16s?

[screenshot of the ResFCN interpolate code in models.py]

@IssamLaradji
Collaborator

IssamLaradji commented Mar 25, 2019

  • logits_32s is the original image size divided by 32.
  • logits_16s is the original image size divided by 16.
  • logits_8s is the original image size divided by 8. So logits_8s is the largest one.

The interpolate code you showed above is the upsampling path of the network.

  • logits_32s gets resized to the size of logits_16s, and then added to logits_16s;
  • logits_16s gets resized to the size of logits_8s, and then added to logits_8s; and finally
  • logits_8s gets resized to the size of the original image.
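
As a quick shape check, here is a tiny sketch of that upsampling path using F.interpolate on random tensors (the sizes and interpolation arguments are assumed for illustration, not copied from models.py):

```python
import torch
import torch.nn.functional as F

n_classes, H, W = 2, 256, 256
logits_8s  = torch.randn(1, n_classes, H // 8,  W // 8)   # 32 x 32 (the largest)
logits_16s = torch.randn(1, n_classes, H // 16, W // 16)  # 16 x 16
logits_32s = torch.randn(1, n_classes, H // 32, W // 32)  # 8 x 8  (the smallest)

# upsample the coarser map to the finer map's size, then add
logits_16s = logits_16s + F.interpolate(logits_32s, size=logits_16s.shape[2:])
logits_8s  = logits_8s  + F.interpolate(logits_16s, size=logits_8s.shape[2:])

# finally upsample to the original image size
output = F.interpolate(logits_8s, size=(H, W))
print(output.shape)   # torch.Size([1, 2, 256, 256])
```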

@gauthsvenkat
Author

Ah, yes, I got it. Got a bit confused with the variable names themselves cause I initialized them wrong. Thanks a lot!

@gauthsvenkat
Author

gauthsvenkat commented Mar 31, 2019

I've run into a few more questions that unfortunately ipdb couldn't answer (thanks a lot for that, by the way; it was very helpful).

  1. Isn't the false-positive loss basically a part of the point-level loss when it is calculated the first time around? Meaning you are penalizing the network twice as much for false positives, right?

  2. Also, I don't quite get the places where you've used blobs[b['class']] == b['label']. Okay, I mostly understand this as getting the blobs of a particular class and then working with them (it took me a while to figure that out). So in Trancos, b['class'] will always be 0, right? Since there is only one class besides the background?

  3. I don't have a formal education in digital image processing, so this one might be a bit trivial. What does black_tophat in watersplit() do exactly (with respect to the probabilities)? I read up on it and it seems to be a way to bring out dark details smaller than the structuring element, but I see you're using it on the probabilities and I'm not sure what to make of it.

So what I understand from the split-level loss is that you're getting the boundaries between objects (inside a blob), setting those boundary pixels to background (0), and then computing nll_loss against the output of the network.

Please correct me if I am wrong.
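
For reference, here is a rough sketch of how I currently picture watersplit (the function body, arguments, and the structuring-element size are my guesses from reading around, not a copy of the repo's code):

```python
import numpy as np
from skimage import morphology as morph
from skimage.segmentation import watershed, find_boundaries

def watersplit_sketch(probs, points):
    # give every annotated point its own integer seed id
    seeds = morph.label(points)
    # black_tophat (closing minus image) emphasizes small dark "valleys" in the probability map
    surface = morph.black_tophat(probs, morph.disk(7))
    # flood the surface starting from the point seeds; each point grows its own region
    regions = watershed(surface, seeds)
    # the lines where two regions meet are the split boundaries inside a blob
    return find_boundaries(regions)

# toy example: a 20x20 "blob" with two annotated points
probs = np.random.rand(20, 20)
points = np.zeros((20, 20), dtype=int)
points[5, 5] = 1
points[14, 14] = 1
print(watersplit_sketch(probs, points).sum(), "boundary pixels")
```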
