x1.5 model consistently underperforming #524

Closed · Fixed by #526
raember opened this issue Jul 24, 2023 · 2 comments

raember (Contributor) commented Jul 24, 2023
I'm using NanoDet-Plus for research purposes but ran into a weird issue where the x1.5 model variants consistently underperform compared to the x1.0 models. This happens on VISEM, Argoverse-HD, and even COCO2017.
For COCO2017 I used the stock configs provided in the git repo (x1.0 and x1.5), but the mAP of the two variants already separates within the first 10 epochs, with the bigger variant consistently scoring lower. From my observations, this holds for all three datasets:
[figure: mAP curves over training, x1.0 vs x1.5, for the three datasets]

I get the following mAP metrics:

| Dataset | x1.0 mAP | x1.5 mAP | Epochs |
| --- | --- | --- | --- |
| VISEM | 6.9% | 2.0% | 100 |
| Argoverse-HD | 24.1% | 21.2% | 70 |
| COCO2017 | 20.9% | 14.5% | 40* |

*The models are still training as of now, but the separation in mAP mentioned above is already distinctly noticeable in the logs. I will let the runs continue until 300 epochs have been reached, as the stock config dictates.

Here are the configs I used for VISEM and Argoverse-HD:

VISEM x1.0

```yaml
save_dir: workspace/baseline/visem/nanodet-plus-m-1.0x-dgxa100
model:
  arch:
    backbone:
      name: ShuffleNetV2
      model_size: 1.0x
      out_stages: [2, 3, 4]
      activation: LeakyReLU
      channels: 3
    fpn:
      name: GhostPAN
      in_channels: [116, 232, 464]
      out_channels: 96
      kernel_size: 5
      num_extra_level: 1
      use_depthwise: true
      activation: LeakyReLU
    head:
      name: NanoDetPlusHead
      num_classes: 3
      input_channel: 96
      feat_channels: 96
      stacked_convs: 2
      kernel_size: 5
      strides: [8, 16, 32, 64]
      activation: LeakyReLU
      reg_max: 7
      norm_cfg: {type: BN}
      loss:
        loss_qfl: {name: QualityFocalLoss, use_sigmoid: true, beta: 2.0, loss_weight: 1.0}
        loss_dfl: {name: DistributionFocalLoss, loss_weight: 0.25}
        loss_bbox: {name: GIoULoss, loss_weight: 2.0}
    name: NanoDetPlus
    detach_epoch: 10
    aux_head:
      name: SimpleConvHead
      num_classes: 3
      input_channel: 192
      feat_channels: 192
      stacked_convs: 4
      strides: [8, 16, 32, 64]
      activation: LeakyReLU
      reg_max: 7
  weight_averager: {name: ExpMovingAverager, decay: 0.9998}
device:
  precision: 16
  gpu_ids: [0]
  workers_per_gpu: 28
  batchsize_per_gpu: 92
schedule:
  optimizer: {name: AdamW, lr: 0.001, weight_decay: 1.0e-05}
  warmup: {name: linear, steps: 500, ratio: 0.0001}
  total_epochs: 10
  lr_schedule: {name: CosineAnnealingLR, T_max: 300, eta_min: 5.0e-05}
  val_intervals: 10
log: {interval: 50}
test: {}
grad_clip: 35
evaluator: {name: CocoDetectionEvaluator, save_key: mAP}
class_names: &id001 [sperm, cluster, small/pinhead]
data:
  train:
    name: VISEMDataset
    pipeline:
      perspective: 0
      scale: [0.8, 1.2]
      stretch:
      - [0.95, 1.05]
      - [0.95, 1.05]
      rotation: 180
      shear: 10
      translate: 0.2
      flip: 0.5
      brightness: 0.2
      contrast: [0.6, 1.4]
      saturation: [0.6, 1.2]
      normalize: &id002
      - [123.675, 116.28, 103.53]
      - [58.395, 57.12, 57.375]
    img_path: ../data/VISEM/VISEM_Tracking_Train_v4/Train
    ann_path: ../data/VISEM/VISEM_Tracking_Train_v4/Train
    class_names: *id001
    input_size: &id003 [640, 480]
    keep_ratio: false
  val:
    name: VISEMDataset
    pipeline:
      normalize: *id002
    img_path: ../data/VISEM/VISEM_Tracking_Train_v4/Train
    ann_path: ../data/VISEM/VISEM_Tracking_Train_v4/Train
    class_names: *id001
    input_size: *id003
    keep_ratio: false
```

VISEM x1.5

```yaml
save_dir: workspace/baseline/visem/nanodet-plus-m-1.5x-dgxa100
model:
  arch:
    backbone:
      name: ShuffleNetV2
      model_size: 1.5x
      out_stages: [2, 3, 4]
      activation: LeakyReLU
      channels: 3
    fpn:
      name: GhostPAN
      in_channels: [176, 352, 704]
      out_channels: 128
      kernel_size: 5
      num_extra_level: 1
      use_depthwise: true
      activation: LeakyReLU
    head:
      name: NanoDetPlusHead
      num_classes: 3
      input_channel: 128
      feat_channels: 128
      stacked_convs: 2
      kernel_size: 5
      strides: [8, 16, 32, 64]
      activation: LeakyReLU
      reg_max: 7
      norm_cfg: {type: BN}
      loss:
        loss_qfl: {name: QualityFocalLoss, use_sigmoid: true, beta: 2.0, loss_weight: 1.0}
        loss_dfl: {name: DistributionFocalLoss, loss_weight: 0.25}
        loss_bbox: {name: GIoULoss, loss_weight: 2.0}
    name: NanoDetPlus
    detach_epoch: 10
    aux_head:
      name: SimpleConvHead
      num_classes: 3
      input_channel: 256
      feat_channels: 256
      stacked_convs: 4
      strides: [8, 16, 32, 64]
      activation: LeakyReLU
      reg_max: 7
  weight_averager: {name: ExpMovingAverager, decay: 0.9998}
device:
  precision: 16
  gpu_ids: [0]
  workers_per_gpu: 28
  batchsize_per_gpu: 88
schedule:
  optimizer: {name: AdamW, lr: 0.001, weight_decay: 1.0e-05}
  warmup: {name: linear, steps: 500, ratio: 0.0001}
  total_epochs: 10
  lr_schedule: {name: CosineAnnealingLR, T_max: 300, eta_min: 5.0e-05}
  val_intervals: 10
log: {interval: 50}
test: {}
grad_clip: 35
evaluator: {name: CocoDetectionEvaluator, save_key: mAP}
class_names: &id001 [sperm, cluster, small/pinhead]
data:
  train:
    name: VISEMDataset
    pipeline:
      perspective: 0
      scale: [0.8, 1.2]
      stretch:
      - [0.95, 1.05]
      - [0.95, 1.05]
      rotation: 180
      shear: 10
      translate: 0.2
      flip: 0.5
      brightness: 0.2
      contrast: [0.6, 1.4]
      saturation: [0.6, 1.2]
      normalize: &id002
      - [123.675, 116.28, 103.53]
      - [58.395, 57.12, 57.375]
    img_path: ../data/VISEM/VISEM_Tracking_Train_v4/Train
    ann_path: ../data/VISEM/VISEM_Tracking_Train_v4/Train
    class_names: *id001
    input_size: &id003 [640, 480]
    keep_ratio: false
  val:
    name: VISEMDataset
    pipeline:
      normalize: *id002
    img_path: ../data/VISEM/VISEM_Tracking_Train_v4/Train
    ann_path: ../data/VISEM/VISEM_Tracking_Train_v4/Train
    class_names: *id001
    input_size: *id003
    keep_ratio: false
```

Argoverse-HD x1.0

```yaml
save_dir: workspace/baseline/argoverse/nanodet-plus-m-1.0x-dgxa100
model:
  arch:
    backbone:
      name: ShuffleNetV2
      model_size: 1.0x
      out_stages: [2, 3, 4]
      activation: LeakyReLU
      channels: 3
    fpn:
      name: GhostPAN
      in_channels: [116, 232, 464]
      out_channels: 96
      kernel_size: 5
      num_extra_level: 1
      use_depthwise: true
      activation: LeakyReLU
    head:
      name: NanoDetPlusHead
      num_classes: 8
      input_channel: 96
      feat_channels: 96
      stacked_convs: 2
      kernel_size: 5
      strides: [8, 16, 32, 64]
      activation: LeakyReLU
      reg_max: 7
      norm_cfg: {type: BN}
      loss:
        loss_qfl: {name: QualityFocalLoss, use_sigmoid: true, beta: 2.0, loss_weight: 1.0}
        loss_dfl: {name: DistributionFocalLoss, loss_weight: 0.25}
        loss_bbox: {name: GIoULoss, loss_weight: 2.0}
    name: NanoDetPlus
    detach_epoch: 10
    aux_head:
      name: SimpleConvHead
      num_classes: 8
      input_channel: 192
      feat_channels: 192
      stacked_convs: 4
      strides: [8, 16, 32, 64]
      activation: LeakyReLU
      reg_max: 7
  weight_averager: {name: ExpMovingAverager, decay: 0.9998}
device:
  precision: 16
  gpu_ids: [0]
  workers_per_gpu: 28
  batchsize_per_gpu: 6
schedule:
  optimizer: {name: AdamW, lr: 0.0003, weight_decay: 0.01}
  warmup: {name: linear, steps: 500, ratio: 0.0001}
  total_epochs: 70
  lr_schedule: {name: CosineAnnealingLR, T_max: 300, eta_min: 5.0e-05}
  val_intervals: 10
log: {interval: 50}
test: {}
grad_clip: 35
evaluator: {name: CocoDetectionEvaluator, save_key: mAP}
class_names: &id001 [person, bicycle, car, motorcycle, bus, truck, traffic_light,
  stop_sign]
data:
  train:
    name: ArgoverseDataset
    pipeline:
      perspective: 0
      scale: [0.8, 1.2]
      stretch:
      - [0.95, 1.05]
      - [0.95, 1.05]
      rotation: 0
      shear: 0
      translate: 0.1
      flip: 0
      brightness: 0.2
      contrast: [0.6, 1.4]
      saturation: [0.6, 1.2]
      normalize: &id002
      - [123.675, 116.28, 103.53]
      - [58.395, 57.12, 57.375]
    class_names: *id001
    input_size: &id003 [1680, 1050]
    keep_ratio: true
    img_path: ../data/Argoverse-1.1/tracking/train
    ann_path: ../data/Argoverse-HD/annotations/train.json
  val:
    name: ArgoverseDataset
    pipeline:
      normalize: *id002
    class_names: *id001
    input_size: *id003
    keep_ratio: true
    img_path: ../data/Argoverse-1.1/tracking/val
    ann_path: ../data/Argoverse-HD/annotations/val.json
```

Argoverse-HD x1.5

```yaml
save_dir: workspace/baseline/argoverse/nanodet-plus-m-1.5x-dgxa100
model:
  arch:
    backbone:
      name: ShuffleNetV2
      model_size: 1.5x
      out_stages: [2, 3, 4]
      activation: LeakyReLU
      channels: 3
    fpn:
      name: GhostPAN
      in_channels: [176, 352, 704]
      out_channels: 128
      kernel_size: 5
      num_extra_level: 1
      use_depthwise: true
      activation: LeakyReLU
    head:
      name: NanoDetPlusHead
      num_classes: 8
      input_channel: 128
      feat_channels: 128
      stacked_convs: 2
      kernel_size: 5
      strides: [8, 16, 32, 64]
      activation: LeakyReLU
      reg_max: 7
      norm_cfg: {type: BN}
      loss:
        loss_qfl: {name: QualityFocalLoss, use_sigmoid: true, beta: 2.0, loss_weight: 1.0}
        loss_dfl: {name: DistributionFocalLoss, loss_weight: 0.25}
        loss_bbox: {name: GIoULoss, loss_weight: 2.0}
    name: NanoDetPlus
    detach_epoch: 10
    aux_head:
      name: SimpleConvHead
      num_classes: 8
      input_channel: 256
      feat_channels: 256
      stacked_convs: 4
      strides: [8, 16, 32, 64]
      activation: LeakyReLU
      reg_max: 7
  weight_averager: {name: ExpMovingAverager, decay: 0.9998}
device:
  precision: 16
  gpu_ids: [0]
  workers_per_gpu: 28
  batchsize_per_gpu: 6
schedule:
  optimizer: {name: AdamW, lr: 0.0003, weight_decay: 0.01}
  warmup: {name: linear, steps: 500, ratio: 0.0001}
  total_epochs: 70
  lr_schedule: {name: CosineAnnealingLR, T_max: 300, eta_min: 5.0e-05}
  val_intervals: 10
log: {interval: 50}
test: {}
grad_clip: 35
evaluator: {name: CocoDetectionEvaluator, save_key: mAP}
class_names: &id001 [person, bicycle, car, motorcycle, bus, truck, traffic_light,
  stop_sign]
data:
  train:
    name: ArgoverseDataset
    pipeline:
      perspective: 0
      scale: [0.8, 1.2]
      stretch:
      - [0.95, 1.05]
      - [0.95, 1.05]
      rotation: 0
      shear: 0
      translate: 0.1
      flip: 0
      brightness: 0.2
      contrast: [0.6, 1.4]
      saturation: [0.6, 1.2]
      normalize: &id002
      - [123.675, 116.28, 103.53]
      - [58.395, 57.12, 57.375]
    class_names: *id001
    input_size: &id003 [1680, 1050]
    keep_ratio: true
    img_path: ../data/Argoverse-1.1/tracking/train
    ann_path: ../data/Argoverse-HD/annotations/train.json
  val:
    name: ArgoverseDataset
    pipeline:
      normalize: *id002
    class_names: *id001
    input_size: *id003
    keep_ratio: true
    img_path: ../data/Argoverse-1.1/tracking/val
    ann_path: ../data/Argoverse-HD/annotations/val.json
```

Is this a bug, or is there something wrong with my setup?

RangiLyu (Owner) commented Aug 8, 2023

This may be because the 1.5x backbone is not initialized with ImageNet pre-training:

```python
"shufflenetv2_1.0x": "https://download.pytorch.org/models/shufflenetv2_x1-5666bf0f80.pth",  # noqa: E501
"shufflenetv2_1.5x": None,
```

raember (Contributor, Author) commented Aug 16, 2023

Hi, and thanks for your response. I don't know how I could have overlooked this, but evidently I did.
In the meantime, I tried replicating your metrics on COCO, with this result:

| Dataset | x1.0 mAP | x1.5 mAP | Epochs |
| --- | --- | --- | --- |
| COCO2017 | 26.5% | 28.9% | 300 |

[figure: COCO2017 mAP over 300 epochs, x1.0 vs x1.5]

So I also trained the same models on VISEM for 300 epochs. Although the x1.5 model ends up with a higher mAP, that is only because the x1.0 model drops in performance:

| Dataset | x1.0 mAP | x1.5 mAP | Epochs |
| --- | --- | --- | --- |
| VISEM | 5.7% | 5.9% | 300 |

[figure: VISEM mAP over 300 epochs, x1.0 vs x1.5]

My interpretation is that the ShuffleNet backbone struggles with the nature of the images: they are microscopy recordings, to which ImageNet pre-training does not generalize well.
I have not been able to train on Argoverse-HD for 300 epochs yet, as that takes a substantial amount of time, but as seen in my initial post, that dataset benefits far more from the pre-trained weights.
I do wonder whether the x1.0 model will also see a drop in mAP there, just as on VISEM.

Now, after looking into the missing backbone weights, I found the following:
the issue "Pre-trained shufflenetv2 checkpoints (x1.5 and x2.0) not being supported" links to an issue about adding more pre-trained weights, which in turn links to a merged PR that adds them to the repo. I'll make a PR about this in a bit. At least this is good news, since it should help mitigate the performance disparity.
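For illustration, a minimal sketch of a manual workaround, assuming torchvision >= 0.13 and a backbone whose state-dict keys follow torchvision's ShuffleNetV2 layout; `load_imagenet_backbone` and `backbone` are placeholder names, not part of either repo:

```python
# Sketch only: load torchvision's ImageNet-pretrained ShuffleNetV2 1.5x weights
# into a detection backbone, dropping the classifier head. Assumes
# torchvision >= 0.13 and parameter names matching torchvision's ShuffleNetV2.
import torch
from torchvision.models import ShuffleNet_V2_X1_5_Weights


def load_imagenet_backbone(backbone: torch.nn.Module) -> None:
    url = ShuffleNet_V2_X1_5_Weights.IMAGENET1K_V1.url
    state_dict = torch.hub.load_state_dict_from_url(url, progress=True)
    # The ImageNet classifier (fc.*) has no counterpart in a detection backbone.
    state_dict = {k: v for k, v in state_dict.items() if not k.startswith("fc.")}
    missing, unexpected = backbone.load_state_dict(state_dict, strict=False)
    print("missing keys:", missing)
    print("unexpected keys:", unexpected)
```

In NanoDet itself, the cleaner fix is presumably to register the corresponding URL next to the existing 1.0x entry in the backbone's model-zoo table.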
