x1.5 model consistently underperforming #524

Closed · Fixed by #526
raember opened this issue Jul 24, 2023 · 2 comments

raember (Contributor) commented Jul 24, 2023
I'm using NanoDet-Plus for research purposes but ran into a weird issue where the x1.5 model variants consistently underperform compared to the x1.0 models. This happens on VISEM, Argoverse-HD, and even COCO2017.
For COCO2017 I used the stock configs provided in the git repo (x1.0 and x1.5), but the mAP of the two variants already separates within the first 10 epochs, with the bigger variant consistently scoring lower. From my observations, this holds for all three datasets:
[figure: mAP curves over training, x1.0 vs x1.5, for the three datasets]

I get the following mAP metrics:

| Dataset | x1.0 mAP | x1.5 mAP | Epochs |
| --- | --- | --- | --- |
| VISEM | 6.9% | 2.0% | 100 |
| Argoverse-HD | 24.1% | 21.2% | 70 |
| COCO2017 | 20.9% | 14.5% | 40* |

*The models are still training as of now, but the separation in mAP mentioned above is already distinctly noticeable in the logs. I will let the runs continue until 300 epochs have been reached, as the stock config dictates.

Here are the configs I used for VISEM and Argoverse-HD:

VISEM x1.0

```yaml
save_dir: workspace/baseline/visem/nanodet-plus-m-1.0x-dgxa100
model:
  arch:
    backbone:
      name: ShuffleNetV2
      model_size: 1.0x
      out_stages: [2, 3, 4]
      activation: LeakyReLU
      channels: 3
    fpn:
      name: GhostPAN
      in_channels: [116, 232, 464]
      out_channels: 96
      kernel_size: 5
      num_extra_level: 1
      use_depthwise: true
      activation: LeakyReLU
    head:
      name: NanoDetPlusHead
      num_classes: 3
      input_channel: 96
      feat_channels: 96
      stacked_convs: 2
      kernel_size: 5
      strides: [8, 16, 32, 64]
      activation: LeakyReLU
      reg_max: 7
      norm_cfg: {type: BN}
      loss:
        loss_qfl: {name: QualityFocalLoss, use_sigmoid: true, beta: 2.0, loss_weight: 1.0}
        loss_dfl: {name: DistributionFocalLoss, loss_weight: 0.25}
        loss_bbox: {name: GIoULoss, loss_weight: 2.0}
    name: NanoDetPlus
    detach_epoch: 10
    aux_head:
      name: SimpleConvHead
      num_classes: 3
      input_channel: 192
      feat_channels: 192
      stacked_convs: 4
      strides: [8, 16, 32, 64]
      activation: LeakyReLU
      reg_max: 7
  weight_averager: {name: ExpMovingAverager, decay: 0.9998}
device:
  precision: 16
  gpu_ids: [0]
  workers_per_gpu: 28
  batchsize_per_gpu: 92
schedule:
  optimizer: {name: AdamW, lr: 0.001, weight_decay: 1.0e-05}
  warmup: {name: linear, steps: 500, ratio: 0.0001}
  total_epochs: 10
  lr_schedule: {name: CosineAnnealingLR, T_max: 300, eta_min: 5.0e-05}
  val_intervals: 10
log: {interval: 50}
test: {}
grad_clip: 35
evaluator: {name: CocoDetectionEvaluator, save_key: mAP}
class_names: &id001 [sperm, cluster, small/pinhead]
data:
  train:
    name: VISEMDataset
    pipeline:
      perspective: 0
      scale: [0.8, 1.2]
      stretch:
      - [0.95, 1.05]
      - [0.95, 1.05]
      rotation: 180
      shear: 10
      translate: 0.2
      flip: 0.5
      brightness: 0.2
      contrast: [0.6, 1.4]
      saturation: [0.6, 1.2]
      normalize: &id002
      - [123.675, 116.28, 103.53]
      - [58.395, 57.12, 57.375]
    img_path: ../data/VISEM/VISEM_Tracking_Train_v4/Train
    ann_path: ../data/VISEM/VISEM_Tracking_Train_v4/Train
    class_names: *id001
    input_size: &id003 [640, 480]
    keep_ratio: false
  val:
    name: VISEMDataset
    pipeline:
      normalize: *id002
    img_path: ../data/VISEM/VISEM_Tracking_Train_v4/Train
    ann_path: ../data/VISEM/VISEM_Tracking_Train_v4/Train
    class_names: *id001
    input_size: *id003
    keep_ratio: false
```

VISEM x1.5

```yaml
save_dir: workspace/baseline/visem/nanodet-plus-m-1.5x-dgxa100
model:
  arch:
    backbone:
      name: ShuffleNetV2
      model_size: 1.5x
      out_stages: [2, 3, 4]
      activation: LeakyReLU
      channels: 3
    fpn:
      name: GhostPAN
      in_channels: [176, 352, 704]
      out_channels: 128
      kernel_size: 5
      num_extra_level: 1
      use_depthwise: true
      activation: LeakyReLU
    head:
      name: NanoDetPlusHead
      num_classes: 3
      input_channel: 128
      feat_channels: 128
      stacked_convs: 2
      kernel_size: 5
      strides: [8, 16, 32, 64]
      activation: LeakyReLU
      reg_max: 7
      norm_cfg: {type: BN}
      loss:
        loss_qfl: {name: QualityFocalLoss, use_sigmoid: true, beta: 2.0, loss_weight: 1.0}
        loss_dfl: {name: DistributionFocalLoss, loss_weight: 0.25}
        loss_bbox: {name: GIoULoss, loss_weight: 2.0}
    name: NanoDetPlus
    detach_epoch: 10
    aux_head:
      name: SimpleConvHead
      num_classes: 3
      input_channel: 256
      feat_channels: 256
      stacked_convs: 4
      strides: [8, 16, 32, 64]
      activation: LeakyReLU
      reg_max: 7
  weight_averager: {name: ExpMovingAverager, decay: 0.9998}
device:
  precision: 16
  gpu_ids: [0]
  workers_per_gpu: 28
  batchsize_per_gpu: 88
schedule:
  optimizer: {name: AdamW, lr: 0.001, weight_decay: 1.0e-05}
  warmup: {name: linear, steps: 500, ratio: 0.0001}
  total_epochs: 10
  lr_schedule: {name: CosineAnnealingLR, T_max: 300, eta_min: 5.0e-05}
  val_intervals: 10
log: {interval: 50}
test: {}
grad_clip: 35
evaluator: {name: CocoDetectionEvaluator, save_key: mAP}
class_names: &id001 [sperm, cluster, small/pinhead]
data:
  train:
    name: VISEMDataset
    pipeline:
      perspective: 0
      scale: [0.8, 1.2]
      stretch:
      - [0.95, 1.05]
      - [0.95, 1.05]
      rotation: 180
      shear: 10
      translate: 0.2
      flip: 0.5
      brightness: 0.2
      contrast: [0.6, 1.4]
      saturation: [0.6, 1.2]
      normalize: &id002
      - [123.675, 116.28, 103.53]
      - [58.395, 57.12, 57.375]
    img_path: ../data/VISEM/VISEM_Tracking_Train_v4/Train
    ann_path: ../data/VISEM/VISEM_Tracking_Train_v4/Train
    class_names: *id001
    input_size: &id003 [640, 480]
    keep_ratio: false
  val:
    name: VISEMDataset
    pipeline:
      normalize: *id002
    img_path: ../data/VISEM/VISEM_Tracking_Train_v4/Train
    ann_path: ../data/VISEM/VISEM_Tracking_Train_v4/Train
    class_names: *id001
    input_size: *id003
    keep_ratio: false
```

Argoverse-HD x1.0

```yaml
save_dir: workspace/baseline/argoverse/nanodet-plus-m-1.0x-dgxa100
model:
  arch:
    backbone:
      name: ShuffleNetV2
      model_size: 1.0x
      out_stages: [2, 3, 4]
      activation: LeakyReLU
      channels: 3
    fpn:
      name: GhostPAN
      in_channels: [116, 232, 464]
      out_channels: 96
      kernel_size: 5
      num_extra_level: 1
      use_depthwise: true
      activation: LeakyReLU
    head:
      name: NanoDetPlusHead
      num_classes: 8
      input_channel: 96
      feat_channels: 96
      stacked_convs: 2
      kernel_size: 5
      strides: [8, 16, 32, 64]
      activation: LeakyReLU
      reg_max: 7
      norm_cfg: {type: BN}
      loss:
        loss_qfl: {name: QualityFocalLoss, use_sigmoid: true, beta: 2.0, loss_weight: 1.0}
        loss_dfl: {name: DistributionFocalLoss, loss_weight: 0.25}
        loss_bbox: {name: GIoULoss, loss_weight: 2.0}
    name: NanoDetPlus
    detach_epoch: 10
    aux_head:
      name: SimpleConvHead
      num_classes: 8
      input_channel: 192
      feat_channels: 192
      stacked_convs: 4
      strides: [8, 16, 32, 64]
      activation: LeakyReLU
      reg_max: 7
  weight_averager: {name: ExpMovingAverager, decay: 0.9998}
device:
  precision: 16
  gpu_ids: [0]
  workers_per_gpu: 28
  batchsize_per_gpu: 6
schedule:
  optimizer: {name: AdamW, lr: 0.0003, weight_decay: 0.01}
  warmup: {name: linear, steps: 500, ratio: 0.0001}
  total_epochs: 70
  lr_schedule: {name: CosineAnnealingLR, T_max: 300, eta_min: 5.0e-05}
  val_intervals: 10
log: {interval: 50}
test: {}
grad_clip: 35
evaluator: {name: CocoDetectionEvaluator, save_key: mAP}
class_names: &id001 [person, bicycle, car, motorcycle, bus, truck, traffic_light,
  stop_sign]
data:
  train:
    name: ArgoverseDataset
    pipeline:
      perspective: 0
      scale: [0.8, 1.2]
      stretch:
      - [0.95, 1.05]
      - [0.95, 1.05]
      rotation: 0
      shear: 0
      translate: 0.1
      flip: 0
      brightness: 0.2
      contrast: [0.6, 1.4]
      saturation: [0.6, 1.2]
      normalize: &id002
      - [123.675, 116.28, 103.53]
      - [58.395, 57.12, 57.375]
    class_names: *id001
    input_size: &id003 [1680, 1050]
    keep_ratio: true
    img_path: ../data/Argoverse-1.1/tracking/train
    ann_path: ../data/Argoverse-HD/annotations/train.json
  val:
    name: ArgoverseDataset
    pipeline:
      normalize: *id002
    class_names: *id001
    input_size: *id003
    keep_ratio: true
    img_path: ../data/Argoverse-1.1/tracking/val
    ann_path: ../data/Argoverse-HD/annotations/val.json
```

Argoverse-HD x1.5

```yaml
save_dir: workspace/baseline/argoverse/nanodet-plus-m-1.5x-dgxa100
model:
  arch:
    backbone:
      name: ShuffleNetV2
      model_size: 1.5x
      out_stages: [2, 3, 4]
      activation: LeakyReLU
      channels: 3
    fpn:
      name: GhostPAN
      in_channels: [176, 352, 704]
      out_channels: 128
      kernel_size: 5
      num_extra_level: 1
      use_depthwise: true
      activation: LeakyReLU
    head:
      name: NanoDetPlusHead
      num_classes: 8
      input_channel: 128
      feat_channels: 128
      stacked_convs: 2
      kernel_size: 5
      strides: [8, 16, 32, 64]
      activation: LeakyReLU
      reg_max: 7
      norm_cfg: {type: BN}
      loss:
        loss_qfl: {name: QualityFocalLoss, use_sigmoid: true, beta: 2.0, loss_weight: 1.0}
        loss_dfl: {name: DistributionFocalLoss, loss_weight: 0.25}
        loss_bbox: {name: GIoULoss, loss_weight: 2.0}
    name: NanoDetPlus
    detach_epoch: 10
    aux_head:
      name: SimpleConvHead
      num_classes: 8
      input_channel: 256
      feat_channels: 256
      stacked_convs: 4
      strides: [8, 16, 32, 64]
      activation: LeakyReLU
      reg_max: 7
  weight_averager: {name: ExpMovingAverager, decay: 0.9998}
device:
  precision: 16
  gpu_ids: [0]
  workers_per_gpu: 28
  batchsize_per_gpu: 6
schedule:
  optimizer: {name: AdamW, lr: 0.0003, weight_decay: 0.01}
  warmup: {name: linear, steps: 500, ratio: 0.0001}
  total_epochs: 70
  lr_schedule: {name: CosineAnnealingLR, T_max: 300, eta_min: 5.0e-05}
  val_intervals: 10
log: {interval: 50}
test: {}
grad_clip: 35
evaluator: {name: CocoDetectionEvaluator, save_key: mAP}
class_names: &id001 [person, bicycle, car, motorcycle, bus, truck, traffic_light,
  stop_sign]
data:
  train:
    name: ArgoverseDataset
    pipeline:
      perspective: 0
      scale: [0.8, 1.2]
      stretch:
      - [0.95, 1.05]
      - [0.95, 1.05]
      rotation: 0
      shear: 0
      translate: 0.1
      flip: 0
      brightness: 0.2
      contrast: [0.6, 1.4]
      saturation: [0.6, 1.2]
      normalize: &id002
      - [123.675, 116.28, 103.53]
      - [58.395, 57.12, 57.375]
    class_names: *id001
    input_size: &id003 [1680, 1050]
    keep_ratio: true
    img_path: ../data/Argoverse-1.1/tracking/train
    ann_path: ../data/Argoverse-HD/annotations/train.json
  val:
    name: ArgoverseDataset
    pipeline:
      normalize: *id002
    class_names: *id001
    input_size: *id003
    keep_ratio: true
    img_path: ../data/Argoverse-1.1/tracking/val
    ann_path: ../data/Argoverse-HD/annotations/val.json
```

Is this a bug, or is there something wrong with my setup?

RangiLyu (Owner) commented Aug 8, 2023

This may be because the 1.5x backbone is not initialized with ImageNet pre-training:

```python
"shufflenetv2_1.0x": "https://download.pytorch.org/models/shufflenetv2_x1-5666bf0f80.pth",  # noqa: E501
"shufflenetv2_1.5x": None,
```

raember (Contributor, Author) commented Aug 16, 2023

Hi, and thanks for your response. I don't know how I could have overlooked this, but evidently I did.
In the meantime, I tried replicating your metrics on COCO, with this result:

| Dataset | x1.0 mAP | x1.5 mAP | Epochs |
| --- | --- | --- | --- |
| COCO2017 | 26.5% | 28.9% | 300 |

[figure: COCO2017 mAP over 300 epochs, x1.0 vs x1.5]

So I also trained the same models on VISEM for 300 epochs. Although the x1.5 model ends up with a higher mAP, that is only because the x1.0 model drops in performance:

| Dataset | x1.0 mAP | x1.5 mAP | Epochs |
| --- | --- | --- | --- |
| VISEM | 5.7% | 5.9% | 300 |

[figure: VISEM mAP over 300 epochs, x1.0 vs x1.5]

My interpretation is that the ShuffleNet backbone struggles with the nature of the images: they are microscopy recordings, to which ImageNet pre-training does not generalize well.
I have not been able to train on Argoverse-HD for 300 epochs yet, as that takes a substantial amount of time, but as seen in my initial post, that dataset benefits far more from the pre-trained weights.
I do wonder whether the x1.0 model will also see a drop in mAP there, just as on VISEM.

Now, after looking into the missing backbone weights, I found the following:
the issue "Pre-trained shufflenetv2 checkpoints (x1.5 and x2.0) not being supported" links to an issue about adding more pre-trained weights, which in turn links to a merged PR that adds them to the repo. I'll make a PR about this in a bit. At least this is good news, since it should help mitigate the performance disparity.
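For illustration, a minimal sketch of a manual workaround, assuming torchvision >= 0.13 and a backbone whose state-dict keys follow torchvision's ShuffleNetV2 layout; `load_imagenet_backbone` and `backbone` are placeholder names, not part of either repo:

```python
# Sketch only: load torchvision's ImageNet-pretrained ShuffleNetV2 1.5x weights
# into a detection backbone, dropping the classifier head. Assumes
# torchvision >= 0.13 and parameter names matching torchvision's ShuffleNetV2.
import torch
from torchvision.models import ShuffleNet_V2_X1_5_Weights


def load_imagenet_backbone(backbone: torch.nn.Module) -> None:
    url = ShuffleNet_V2_X1_5_Weights.IMAGENET1K_V1.url
    state_dict = torch.hub.load_state_dict_from_url(url, progress=True)
    # The ImageNet classifier (fc.*) has no counterpart in a detection backbone.
    state_dict = {k: v for k, v in state_dict.items() if not k.startswith("fc.")}
    missing, unexpected = backbone.load_state_dict(state_dict, strict=False)
    print("missing keys:", missing)
    print("unexpected keys:", unexpected)
```

In NanoDet itself, the cleaner fix is presumably to register the corresponding URL next to the existing 1.0x entry in the backbone's model-zoo table.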
