This repository provides a simple way to visualize where a YOLOv7 model focuses when making predictions, using Class Activation Mapping (CAM) techniques such as Grad-CAM, Grad-CAM++, and XGrad-CAM. These visualizations help explain model decisions by highlighting important regions in an image.
- Class Activation Maps (CAMs): Visual explanations that show which image regions contribute most to a model’s decision for a target class. CAMs are useful for trust, transparency, and debugging, especially in high-stakes domains (e.g., medical).
- How CNNs enable CAMs: Early CNN layers learn simple features (edges, lines), while deeper layers learn higher-level patterns. CAM-based methods leverage the last convolutional feature maps to produce localization heatmaps that explain predictions.
- Grad-CAM (Gradient-weighted CAM): Computes gradients of a target class score with respect to activations in a chosen convolutional layer, aggregates them into neuron-importance weights, and forms a heatmap that highlights influential regions (the exact computation is sketched after this list). Grad-CAM variants (e.g., Grad-CAM++, XGrad-CAM, Layer-CAM) can improve localization sharpness or flexibility across layers.
- Grad-CAM example heatmap overlay.
- Occlusion technique illustration: masking different regions of the input to see how the prediction changes (red = important, blue = less important); a generic sketch of this idea appears after the Grad-CAM formulation below.
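For reference, the standard Grad-CAM formulation (Selvaraju et al., 2017) can be written compactly. Here $A^k$ is the $k$-th feature map of the chosen convolutional layer, $y^c$ is the score for target class $c$, and $Z$ is the number of spatial locations in the feature map:

```math
\alpha_k^c = \frac{1}{Z}\sum_{i}\sum_{j}\frac{\partial y^c}{\partial A^{k}_{ij}},
\qquad
L^{c}_{\text{Grad-CAM}} = \mathrm{ReLU}\!\left(\sum_{k}\alpha_k^c\,A^{k}\right)
```

The pooled gradients $\alpha_k^c$ act as per-channel importance weights; the ReLU keeps only regions that positively influence the target score, and the resulting map is upsampled to the input resolution for overlay.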
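As a generic sketch of the occlusion idea (not part of this repository's pipeline): assume a hypothetical `score_fn` callable that maps an image to a scalar score for the class of interest; we hide one patch at a time and record how much the score drops.

```python
import numpy as np

def occlusion_map(score_fn, image, patch=32, stride=16, fill=0.5):
    """Occlusion sensitivity sketch: gray out one region at a time and
    record how much the score drops (larger drop = more important region)."""
    h, w = image.shape[:2]
    base = score_fn(image)                      # score on the unmodified image
    ys = range(0, h - patch + 1, stride)
    xs = range(0, w - patch + 1, stride)
    heat = np.zeros((len(ys), len(xs)))
    for yi, y in enumerate(ys):
        for xi, x in enumerate(xs):
            occluded = image.copy()
            occluded[y:y + patch, x:x + patch] = fill   # mask one patch
            heat[yi, xi] = base - score_fn(occluded)    # drop in score
    return heat
```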
- Python 3.x
- PyTorch
- OpenCV
- NumPy
- Matplotlib
- tqdm
- PIL (Pillow)
- pytorch_grad_cam
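A typical installation with pip (note that `pytorch_grad_cam` is distributed on PyPI under the name `grad-cam`; adjust the PyTorch install to your CUDA setup as needed):

```bash
pip install torch opencv-python numpy matplotlib tqdm pillow grad-cam
```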
- Prepare YOLOv7 assets
  - Download the YOLOv7 weights file (e.g., `yolov7.pt`) and ensure the configuration file exists (e.g., `cfg/training/yolov7.yaml`).
- Adjust parameters if needed
  - Edit `get_params` inside `gradcam.py` to change defaults such as weight, cfg, device, CAM method, target layer, backward type, confidence threshold, and ratio.
- Run the script
  - The minimal command uses the defaults and processes the example image:

    ```bash
    python gradcam.py
    ```

  - By default, it reads `inference/images/image3.jpg` and writes outputs into the `result` directory.
- `weight`: Path to the YOLOv7 weights file.
- `cfg`: Path to the YOLOv7 configuration file.
- `device`: Compute device (`'cpu'` or `'cuda'`).
- `method`: CAM method (`'GradCAM'`, `'GradCAMPlusPlus'`, or `'XGradCAM'`).
- `layer`: Target layer to visualize (e.g., `'model.model[-2]'`).
- `backward_type`: Backward objective (`'class'` for the class score or `'conf'` for the detection confidence).
- `conf_threshold`: Confidence threshold for detections.
- `ratio`: Fraction of top predictions to visualize.
```python
def get_params():
    params = {
        'weight': 'yolov7.pt',              # path to YOLOv7 weights
        'cfg': 'cfg/training/yolov7.yaml',  # path to model configuration
        'device': 'cpu',                    # 'cpu' or 'cuda'
        'method': 'GradCAM',                # GradCAM, GradCAMPlusPlus, XGradCAM
        'layer': 'model.model[-2]',         # target layer to visualize
        'backward_type': 'class',           # 'class' or 'conf'
        'conf_threshold': 0.6,              # detection confidence threshold
        'ratio': 0.02                       # fraction of top predictions (0.02-0.1)
    }
    return params


if __name__ == '__main__':
    model = yolov7_heatmap(**get_params())
    model('inference/images/image3.jpg', 'result')
```

- You can choose different layers for visualization; deeper layers often give more semantic heatmaps, while earlier layers may show lower-level features (a short sketch of switching layers follows these notes).
- Grad-CAM and its variants are widely used to understand, debug, and increase trust in CNN-based models by revealing which regions drive predictions.
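As a minimal sketch of trying a different layer and CAM method, assuming `yolov7_heatmap` and `get_params` are importable from `gradcam.py` as shown above (the layer index and image path here are hypothetical; valid indices depend on your model cfg):

```python
from gradcam import yolov7_heatmap, get_params  # assumes gradcam.py is importable

params = get_params()
params['method'] = 'GradCAMPlusPlus'   # switch the CAM variant
params['layer'] = 'model.model[-4]'    # hypothetical earlier layer; adjust to your cfg
params['device'] = 'cuda'              # if a GPU is available

model = yolov7_heatmap(**params)
model('inference/images/image1.jpg', 'result_layer4')  # hypothetical input and output dir
```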

