updated CAM blog
chokevin8 committed Nov 7, 2023
1 parent 0f7d8a7 commit 332294b
Showing 1 changed file with 12 additions and 2 deletions.
14 changes: 12 additions & 2 deletions _posts/2023-10-20-class-activation-maps.md
@@ -96,9 +96,18 @@ most likely belong to features in the image that are correlated with classes tha
$$Y_c = ReLU (\sum_{k} {\alpha_k}^{c} \cdot F^{k}) \text{ where } {\alpha_k}^{c} = \frac{1}{Z} \sum_{i}\sum_{j} \frac{\delta Y^{c}}{\delta A_{(i,j)}^{k}} (6)$$
</p>
Equation #6 above is the final equation for Grad-CAM. Although CAMs are most widely used for classification, they can also be used for semantic segmentation tasks
where each pixel is labeled as a class. In this case,
where each pixel is labeled as a class. In this case, however, we have to modify equation #6 a bit. While image classification outputs a single class distribution per image (e.g. this image is
a dog), semantic segmentation instead outputs logits $$y_{(a,b)}^{c}$$ for every pixel $$(a,b)$$ and class $$c$$. Therefore, it makes sense to sum the logits of these pixels into a single scalar activation score, so that
the target behaves like the single class score in image classification. We therefore replace the $$Y^{c}$$ in the gradient with $$\sum_{(a,b) \in M}{y_{(a,b)}}^{c}$$, where $$M$$ is the set of all pixel indices that are assigned
to class $$c$$ in the segmentation prediction. The final equation for Grad-CAM in image segmentation is shown below:
<p>
$$Y_c = ReLU (\sum_{k} {\alpha_k}^{c} \cdot F^{k}) \text{ where } {\alpha_k}^{c} = \frac{1}{Z} \sum_{i}\sum_{j} \frac{\delta \sum_{(a,b) \in M}{y_{(a,b)}}^{c}}{\delta A_{(i,j)}^{k}} (7)$$

</p>
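To make the segmentation version more concrete, here is a minimal NumPy sketch. It is not the code used in this project. It assumes that $$F^{k}$$ and $$A^{k}$$ both refer to the same feature maps (`activations`), and it uses a hypothetical per-pixel linear head (`head_weights`) so that the gradient of the summed logits can be written analytically instead of relying on an autograd framework. All function and variable names are made up for illustration.

```python
import numpy as np

def grad_cam(activations, gradients):
    """Grad-CAM map from feature maps and gradients, both shaped (K, H, W)."""
    # alpha_k^c = (1/Z) * sum_i sum_j dY/dA^k_(i,j): average the gradients spatially.
    alpha = gradients.mean(axis=(1, 2))                       # shape (K,)
    # Weighted sum of the feature maps over k, followed by ReLU.
    cam = np.tensordot(alpha, activations, axes=(0, 0))       # shape (H, W)
    return np.maximum(cam, 0.0)

def segmentation_target_gradients(activations, head_weights, class_idx):
    """Gradient of sum_{(a,b) in M} y^c_(a,b) with respect to the feature maps.

    Assumes a toy linear head applied at every pixel:
        logits[c, a, b] = sum_k head_weights[c, k] * activations[k, a, b]
    """
    logits = np.einsum("ck,kab->cab", head_weights, activations)
    # M: the pixels whose predicted class is class_idx.
    mask = logits.argmax(axis=0) == class_idx                 # shape (H, W), bool
    # For this linear head the gradient is head_weights[c, k] inside M and 0 outside.
    gradients = head_weights[class_idx][:, None, None] * mask[None, :, :]
    return gradients, mask

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 8, 8))   # K = 4 feature maps of size 8 x 8
W = rng.normal(size=(3, 4))      # 3 classes, 4 channels
grads, M = segmentation_target_gradients(A, W, class_idx=1)
cam = grad_cam(A, grads)
print(cam.shape, int(M.sum()))
```

In a real segmentation network the gradients would come from backpropagating the summed logits through the model, but the last two steps (averaging the gradients into $${\alpha_k}^{c}$$ and taking the ReLU of the weighted sum) would stay the same.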

However, if we look at the equation for Grad-CAM carefully, there is a critical issue: the gradients are averaged over every spatial location by global average pooling (GAP). Why can this be a problem?
Look at the diagram below:

HiResCAM:
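As a rough illustration of why averaging the gradients can hurt, the toy example below compares Grad-CAM with a HiResCAM-style map. It assumes the HiResCAM formulation in which the gradients are multiplied element-wise with the feature maps and then summed over the channels, without any spatial averaging. The arrays are made-up values chosen so that the positive and negative gradients cancel out.

```python
import numpy as np

def grad_cam(activations, gradients):
    # Average the gradients spatially, then take the ReLU of the weighted sum.
    alpha = gradients.mean(axis=(1, 2))
    return np.maximum(np.tensordot(alpha, activations, axes=(0, 0)), 0.0)

def hires_cam(activations, gradients):
    # Element-wise product of gradients and feature maps, summed over channels.
    return (gradients * activations).sum(axis=0)

# One feature map (K = 1) that is active everywhere.
A = np.ones((1, 2, 2))
# Gradients that are positive in the left column and negative in the right one.
G = np.array([[[1.0, -1.0],
               [1.0, -1.0]]])

gc = grad_cam(A, G)    # the averaged gradient is 0, so the whole map vanishes
hc = hires_cam(A, G)   # the per-pixel sign and location of the gradient is kept
print(gc)
print(hc)
```

Here the averaged weight for the only channel is zero, so Grad-CAM cannot tell which pixels pushed the score up and which pushed it down, while the element-wise product keeps that spatial information.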

<a id="cam-in-my-proj"></a>
## **Utilizing CAM in My Project:**
@@ -252,3 +261,4 @@ we can see that there isn't a big difference between the two, except that HiResC

*Image credits to:*
- [Image Classification CAM Diagram](http://cnnlocalization.csail.mit.edu/)
- [HiResCAM Diagram](https://arxiv.org/pdf/2011.08891.pdf)
