talmolab · talmo · Jun 26, 2024 · May 17, 2024 · May 19, 2024 · May 19, 2024
diff --git a/docs/config.md b/docs/config.md
@@ -15,7 +15,7 @@ The config file has four main sections:
 
 - `data_config`: 
     - `provider`: (str) Provider class to read the input sleap files. Only "LabelsReader" supported for the training pipeline.
-    - `pipeline`: (str) Pipeline for training data. One of "TopdownConfmaps", "SingleInstanceConfmaps" or "CentroidConfmapsPipeline".
+    - `pipeline`: (str) Pipeline for training data. One of "TopdownConfmaps", "SingleInstanceConfmaps", "CentroidConfmapsPipeline" or "BottomUp".
     - `train`:
         - `labels_path`: (str) Path to `.slp` files
         - `is_rgb`: (bool) True if the image has 3 channels (RGB image). If input has only one
@@ -32,6 +32,7 @@ The config file has four main sections:
             - `anchor_ind`: (int) Index of the anchor node to use as the anchor point. If None, the midpoint of the bounding box of all visible instance points will be used as the anchor. The bounding box midpoint will also be used if the anchor part is specified but not visible in the instance. Setting a reliable anchor point can significantly improve topdown model accuracy as they benefit from a consistent geometry of the body parts relative to the center of the image.
             - `crop_hw`: (List[int]) Crop height and width of each instance (h, w) for centered-instance model. 
             - `conf_map_gen`: (Dict[float]) Dictionary in the format {"sigma": 1.5, "output_stride": 2}. *sigma* defines the spread of the Gaussian distribution of the confidence maps as a scalar float. Smaller values are more precise but may be difficult to learn as they have a lower density within the image space. Larger values are easier to learn but are less precise with respect to the peak coordinate. This spread is in units of pixels of the model input image, i.e., the image resolution after any input scaling is applied.  *output_stride* defines the stride of the output confidence maps relative to the input image. This is the reciprocal of the resolution, e.g., an output stride of 2 results in confidence maps that are 0.5x the size of the input. Increasing this value can considerably speed up model performance and decrease memory requirements, at the cost of decreased spatial resolution.
+            - `pafs_gen`: (Dict[float]) **Note**: Only for the BottomUp model. Has the same structure as `preprocessing.conf_map_gen`.
             - `augmentation_config`:
                 - `random crop`: (Dict[float]) {"random_crop_p": None, "random_crop_hw": None}, where *random_crop_p* is the probability of applying random crop and *random_crop_hw* is the desired output size (out_h, out_w) of the crop. Must be Tuple[int, int], then out_h = size[0], out_w = size[1].
                 - `use_augmentations`: (bool) True if the data augmentation should be applied to the data, else False.
@@ -119,14 +120,15 @@ The config file has four main sections:
             convolutions for upsampling. Interpolation is faster but transposed
             convolutions may be able to learn richer or more complex upsampling to
             recover details from higher scales. Default: True.
-    - `head_configs`
-        - `head_type`: (str) Name of the head. Supported values are 'SingleInstanceConfmapsHead', 'CentroidConfmapsHead', 'CenteredInstanceConfmapsHead', 'MultiInstanceConfmapsHead', 'PartAffinityFieldsHead', 'ClassMapsHead', 'ClassVectorsHead', 'OffsetRefinementHead'
-        - `head_config`:
-            - `part_names`: (List[str]) Text name of the body parts (nodes) that the head will be configured to produce. The number of parts determines the number of channels in the output. If not specified, all body parts in the skeleton will be used.
-            - `anchor_part`: (int) Index of the anchor node to use as the anchor point. If None, the midpoint of the bounding box of all visible instance points will be used as the anchor. The bounding box midpoint will also be used if the anchor part is specified but not visible in the instance. Setting a reliable anchor point can significantly improve topdown model accuracy as they benefit from a consistent geometry of the body parts relative to the center of the image.
-            - `sigma`: (float) Spread of the Gaussian distribution of the confidence maps as a scalar float. Smaller values are more precise but may be difficult to learn as they have a lower density within the image space. Larger values are easier to learn but are less precise with respect to the peak coordinate. This spread is in units of pixels of the model input image, i.e., the image resolution after any input scaling is applied.
-            - `output_stride`: (float) The stride of the output confidence maps relative to the input image. This is the reciprocal of the resolution, e.g., an output stride of 2 results in confidence maps that are 0.5x the size of the input. Increasing this value can considerably speed up model performance and decrease memory requirements, at the cost of decreased spatial resolution.
-            - `loss_weight`: (float) Scalar float used to weigh the loss term for this head during training. Increase this to encourage the optimization to focus on improving this specific output in multi-head models.
+    - `head_configs`: (List[dict]) List of heads in the model. For example, the BottomUp model has both 'MultiInstanceConfmapsHead' and 'PartAffinityFieldsHead' heads.
+            - `head_type`: (str) Name of the head. Supported values are 'SingleInstanceConfmapsHead', 'CentroidConfmapsHead', 'CenteredInstanceConfmapsHead', 'MultiInstanceConfmapsHead', 'PartAffinityFieldsHead', 'ClassMapsHead', 'ClassVectorsHead', 'OffsetRefinementHead'
+            - `head_config`:
+                - `part_names`: (List[str]) Text name of the body parts (nodes) that the head will be configured to produce. The number of parts determines the number of channels in the output. If not specified, all body parts in the skeleton will be used. This option does not apply to 'PartAffinityFieldsHead'.
+                - `edges`: (List[str]) **Note**: Only for 'PartAffinityFieldsHead'. List of `(src, dest)` node pairs, each of which forms an edge.
+                - `anchor_part`: (int) **Note**: Only for 'CenteredInstanceConfmapsHead'. Index of the anchor node to use as the anchor point. If None, the midpoint of the bounding box of all visible instance points will be used as the anchor. The bounding box midpoint will also be used if the anchor part is specified but not visible in the instance. Setting a reliable anchor point can significantly improve topdown model accuracy as they benefit from a consistent geometry of the body parts relative to the center of the image.
+                - `sigma`: (float) Spread of the Gaussian distribution of the confidence maps as a scalar float. Smaller values are more precise but may be difficult to learn as they have a lower density within the image space. Larger values are easier to learn but are less precise with respect to the peak coordinate. This spread is in units of pixels of the model input image, i.e., the image resolution after any input scaling is applied.
+                - `output_stride`: (float) The stride of the output confidence maps relative to the input image. This is the reciprocal of the resolution, e.g., an output stride of 2 results in confidence maps that are 0.5x the size of the input. Increasing this value can considerably speed up model performance and decrease memory requirements, at the cost of decreased spatial resolution.
+                - `loss_weight`: (float) Scalar float used to weigh the loss term for this head during training. Increase this to encourage the optimization to focus on improving this specific output in multi-head models.
 
 - `trainer_config`: 
     - `train_data_loader`:
@@ -202,7 +204,10 @@ The config file has four main sections:
             - `anchor_ind`: (int) Index of the anchor node to use as the anchor point. If None, the midpoint of the bounding box of all visible instance points will be used as the anchor. The bounding box midpoint will also be used if the anchor part is specified but not visible in the instance. Setting a reliable anchor point can significantly improve topdown model accuracy as they benefit from a consistent geometry of the body parts relative to the center of the image.
             - `crop_hw`: (List[int]) Crop height and width of each instance (h, w) for centered-instance model.
             - `output_stride`: (int) Stride of the output confidence maps relative to the input image. This is the reciprocal of the resolution, e.g., an output stride of 2 results in confidence maps that are 0.5x the size of the input. Increasing this value can considerably speed up model performance and decrease memory requirements, at the cost of decreased spatial resolution.
+            - `pafs_output_stride`: (int) Stride of the output part affinity fields relative to the input image. 
     - `peak_threshold`: `float` between 0 and 1. Minimum confidence threshold. Peaks with values below this will be ignored.
     - `integral_refinement`: If `None`, returns the grid-aligned peaks with no refinement. If `"integral"`, peaks will be refined with integral regression.
     - `integral_patch_size`: Size of patches to crop around each rough peak as an integer scalar.
-    - `return_confmaps`: If `True`, predicted confidence maps will be returned along with the predicted peak values and points. 
+    - `return_confmaps`: If `True`, predicted confidence maps will be returned along with the predicted peak values and points. 
+    - `return_pafs`: If `True`, predicted part affinity fields will be returned along with the predicted peak values and points. 
+    - `return_paf_graph`: If `True`, the part affinity field graph will be returned together with the predicted instances.
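As a rough illustration of how the two strides documented above (`output_stride` and `pafs_output_stride`) affect output sizes for a BottomUp model, the sketch below computes the expected tensor shapes. The helper `output_shapes` and the 384x384 input size are hypothetical. The sketch also assumes the common part affinity field layout of two channels per edge (an x and a y component) and an input size divisible by both strides.

```python
def output_shapes(image_hw, confmap_stride, paf_stride, n_nodes, n_edges):
    """Return (channels, height, width) for the confidence maps and the PAFs.

    Hypothetical helper: assumes one confidence-map channel per node and
    two PAF channels per edge, with sizes reduced by the respective stride.
    """
    h, w = image_hw
    confmaps = (n_nodes, h // confmap_stride, w // confmap_stride)
    pafs = (2 * n_edges, h // paf_stride, w // paf_stride)
    return confmaps, pafs


# Two nodes and one edge, strides 2 and 4 as in the example BottomUp config.
confmaps_shape, pafs_shape = output_shapes((384, 384), 2, 4, n_nodes=2, n_edges=1)
print(confmaps_shape)  # (2, 192, 192)
print(pafs_shape)      # (2, 96, 96)
```

An output stride of 2 halves each spatial dimension and a stride of 4 quarters it, which matches the "reciprocal of the resolution" wording in the option descriptions.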
diff --git a/docs/config_bottomup.yaml b/docs/config_bottomup.yaml
@@ -0,0 +1,270 @@
+data_config:
+  provider: LabelsReader
+  pipeline: BottomUp
+  train:
+    labels_path: minimal_instance.pkg.slp
+    max_width: null
+    max_height: null
+    scale: 1.0
+    is_rgb: false
+    preprocessing:
+      anchor_ind: 0
+      crop_hw:
+      - 160
+      - 160
+      conf_map_gen:
+        sigma: 1.5
+        output_stride: 2
+      pafs_gen:
+        sigma: 50
+        output_stride: 4
+    augmentation_config:
+      random_crop:
+        random_crop_p: 0
+        random_crop_hw:
+        - 160
+        - 160
+      use_augmentations: true
+      augmentations:
+        intensity:
+          uniform_noise:
+          - 0.0
+          - 0.04
+          uniform_noise_p: 0
+          gaussian_noise_mean: 0.02
+          gaussian_noise_std: 0.004
+          gaussian_noise_p: 0
+          contrast:
+          - 0.5
+          - 2.0
+          contrast_p: 0
+          brightness: 0.0
+          brightness_p: 0
+        geometric:
+          rotation: 180.0
+          scale: 0
+          translate:
+          - 0
+          - 0
+          affine_p: 0.5
+          erase_scale:
+          - 0.0001
+          - 0.01
+          erase_ratio:
+          - 1
+          - 1
+          erase_p: 0
+          mixup_lambda: null
+          mixup_p: 0
+  val:
+    labels_path: minimal_instance.pkg.slp
+    max_width: null
+    max_height: null
+    is_rgb: false
+    scale: 1.0
+    preprocessing:
+      anchor_ind: 0
+      crop_hw:
+      - 160
+      - 160
+      conf_map_gen:
+        sigma: 1.5
+        output_stride: 2
+      pafs_gen:
+        sigma: 50
+        output_stride: 4
+    augmentation_config:
+      random_crop:
+        random_crop_p: 0
+        random_crop_hw:
+        - 160
+        - 160
+      use_augmentations: false
+      augmentations:
+        intensity:
+          uniform_noise:
+          - 0.0
+          - 0.04
+          uniform_noise_p: 0
+          gaussian_noise_mean: 0.02
+          gaussian_noise_std: 0.004
+          gaussian_noise_p: 0
+          contrast:
+          - 0.5
+          - 2.0
+          contrast_p: 0
+          brightness: 0.0
+          brightness_p: 0
+        geometric:
+          rotation: 180.0
+          scale: 0
+          translate:
+          - 0
+          - 0
+          affine_p: 0.5
+          erase_scale:
+          - 0.0001
+          - 0.01
+          erase_ratio:
+          - 1
+          - 1
+          erase_p: 0
+          mixup_lambda: null
+          mixup_p: 0
+model_config:
+  init_weights: xavier
+  pre_trained_weights: null
+  backbone_config:
+    backbone_type: unet
+    backbone_config:
+      in_channels: 1
+      kernel_size: 3
+      filters: 16
+      filters_rate: 2
+      max_stride: 16
+      convs_per_block: 2
+      stacks: 1
+      stem_stride: null
+      middle_block: true
+      up_interpolate: true
+      output_strides:
+        [2, 4]
+      block_contraction: false
+
+#   pre_trained_weights: ConvNeXt_Tiny_Weights
+#   backbone_config:
+#     backbone_type: convnext
+#     backbone_config:
+#       in_channels: 1
+#       model_type: tiny
+#       arch: 
+#       kernel_size: 3
+#       filters_rate: 2
+#       convs_per_block: 2
+#       up_interpolate: True
+#       output_strides: [2, 4]
+#       stem_patch_kernel: 4
+#       stem_patch_stride: 2
+
+#   pre_trained_weights: Swin_T_Weights
+#   backbone_config:
+#     backbone_type: swint
+#     backbone_config:
+#       in_channels: 1
+#       model_type: tiny
+#       arch: 
+#       patch_size: [4,4]
+#       window_size: [7,7]
+#       kernel_size: 3
+#       filters_rate: 2
+#       convs_per_block: 2
+#       up_interpolate: True
+#       output_strides: [2, 4]
+#       stem_patch_stride: 2
+
+  head_configs:
+  - head_type: MultiInstanceConfmapsHead
+    head_config:
+      part_names:
+      - '0'
+      - '1'
+      sigma: 1.5
+      output_stride: 2
+      loss_weight: 1.0
+  - head_type: PartAffinityFieldsHead
+    head_config:
+      edges:
+      - - '0'
+        - '1'
+      sigma: 50
+      output_stride: 4
+      loss_weight: 1.0
+trainer_config:
+  train_data_loader:
+    batch_size: 4
+    shuffle: true
+    num_workers: 2
+    pin_memory: true
+    drop_last: false
+  val_data_loader:
+    batch_size: 4
+    shuffle: false
+    num_workers: 2
+    pin_memory: true
+    drop_last: false
+  model_ckpt:
+    save_top_k: 1
+    save_last: true
+    monitor: val_loss
+    mode: min
+    auto_insert_metric_name: false
+  early_stopping:
+    stop_training_on_plateau: true
+    min_delta: 1.0e-08
+    patience: 20
+  device: cpu
+  trainer_devices: 1
+  trainer_accelerator: cpu
+  enable_progress_bar: false
+  steps_per_epoch: null
+  max_epochs: 50
+  seed: 1000
+  use_wandb: false
+  save_ckpt: true
+  save_ckpt_path: min_inst_bottomup1
+  wandb:
+    entity: team-ucsd
+    project: test_centroid_centered
+    name: fly_unet_centered
+    wandb_mode: ''
+    api_key: ''
+    log_params:
+    - trainer_config.optimizer_name
+    - trainer_config.optimizer.amsgrad
+    - trainer_config.optimizer.lr
+    - model_config.backbone_config.backbone_type
+    - model_config.init_weights
+  optimizer_name: Adam
+  optimizer:
+    lr: 0.0001
+    amsgrad: false
+  lr_scheduler:
+    threshold: 1.0e-07
+    cooldown: 3
+    patience: 5
+    factor: 0.5
+    min_lr: 1.0e-08
+inference_config:
+  device: cpu
+  data:
+    path: ./tests/assets/minimal_instance.pkg.slp
+    max_instances: 6
+    max_width: null
+    max_height: null
+    is_rgb: false
+    scale: 1.0
+    provider: LabelsReader
+    data_loader:
+      batch_size: 4
+      shuffle: false
+      num_workers: 2
+      pin_memory: true
+      drop_last: false
+    video_loader:
+      batch_size: 4
+      queue_maxsize: 8
+      start_idx: 0
+      end_idx: 100
+    preprocessing:
+      anchor_ind: 0
+      crop_hw:
+      - 160
+      - 160
+      output_stride: 2
+      pafs_output_stride: 4
+  peak_threshold: 0.3
+  integral_refinement: integral
+  integral_patch_size: 5
+  return_confmaps: false
+  return_pafs: false
+  return_paf_graph: false
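In the example config above, the `sigma` and `output_stride` values appear twice: once under `preprocessing` (`conf_map_gen`, `pafs_gen`) and once in the matching head config. The sketch below is one way to check that the two copies agree and that each head stride is one of the backbone `output_strides`. Whether the pipeline actually requires this agreement is an assumption, `check_bottomup_strides` is a hypothetical helper, and the config is written as a trimmed Python dict with `output_strides` as a flat list rather than loaded from the YAML file.

```python
def check_bottomup_strides(cfg):
    """Return a list of inconsistency messages (empty if none are found).

    Hypothetical helper: compares each head's sigma/output_stride with the
    matching preprocessing entry and with the backbone output strides.
    """
    problems = []
    model_cfg = cfg["model_config"]
    backbone_strides = model_cfg["backbone_config"]["backbone_config"]["output_strides"]
    heads = {h["head_type"]: h["head_config"] for h in model_cfg["head_configs"]}
    prep = cfg["data_config"]["train"]["preprocessing"]

    pairs = [
        ("MultiInstanceConfmapsHead", "conf_map_gen"),
        ("PartAffinityFieldsHead", "pafs_gen"),
    ]
    for head_type, gen_key in pairs:
        head = heads[head_type]
        if head["output_stride"] not in backbone_strides:
            problems.append(f"{head_type}: output_stride not in backbone output_strides")
        for field in ("sigma", "output_stride"):
            if head[field] != prep[gen_key][field]:
                problems.append(f"{head_type}.{field} differs from preprocessing.{gen_key}.{field}")
    return problems


# Trimmed copy of the values used in the example BottomUp config.
cfg = {
    "data_config": {"train": {"preprocessing": {
        "conf_map_gen": {"sigma": 1.5, "output_stride": 2},
        "pafs_gen": {"sigma": 50, "output_stride": 4},
    }}},
    "model_config": {
        "backbone_config": {"backbone_config": {"output_strides": [2, 4]}},
        "head_configs": [
            {"head_type": "MultiInstanceConfmapsHead",
             "head_config": {"sigma": 1.5, "output_stride": 2}},
            {"head_type": "PartAffinityFieldsHead",
             "head_config": {"sigma": 50, "output_stride": 4}},
        ],
    },
}
print(check_bottomup_strides(cfg))  # []
```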
diff --git a/docs/config_centroid.yaml b/docs/config_centroid.yaml
@@ -5,7 +5,7 @@ data_config:
     labels_path: "minimal_instance.pkg.slp"
     max_width: 
     max_height: 
-    scale: 1.0
+    scale: 0.5
     is_rgb: False
     preprocessing:
       anchor_ind: 0
@@ -58,7 +58,7 @@ data_config:
     max_width: 
     max_height: 
     is_rgb: False
-    scale: 1.0
+    scale: 0.5
     preprocessing:
       anchor_ind: 0
       crop_hw:
@@ -226,7 +226,7 @@ inference_config:
     max_width: 
     max_height: 
     is_rgb: False
-    scale: 1.0
+    scale: 0.5
     provider: LabelsReader
     data_loader:
       batch_size: 4
@@ -245,7 +245,7 @@ inference_config:
       - 160
       - 160
       output_stride: 2
-  peak_threshold: 0.0
+  peak_threshold: 0.5
   integral_refinement: integral
   integral_patch_size: 5
   return_confmaps: false
diff --git a/sleap_nn/architectures/model.py b/sleap_nn/architectures/model.py
@@ -132,6 +132,8 @@ def __init__(
             head = get_head(head_config.head_type, head_config.head_config)
             self.heads.append(head)
 
+        min_output_stride = min(backbone_config.backbone_config.output_strides)
+        strides = self.backbone.dec.current_strides
         self.head_layers = nn.ModuleList([])
         for head in self.heads:
             in_channels = int(
@@ -141,6 +143,13 @@ def __init__(
                     ** len(self.backbone.dec.decoder_stack)
                 )
             )
+            if head.output_stride != min_output_stride:
+                factor = strides.index(min_output_stride) - strides.index(
+                    head.output_stride
+                )
+                in_channels = in_channels * (
+                    self.backbone_config.backbone_config.filters_rate**factor
+                )
             self.head_layers.append(head.make_head(x_in=int(in_channels)))
 
     @classmethod
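The channel adjustment added in this hunk can be sketched in isolation as follows. This is a minimal illustration, not the project's code: `head_in_channels` is a hypothetical helper, `base_channels` stands in for the `in_channels` value computed in the unchanged lines of the hunk, and `decoder_strides` is assumed to list the decoder strides from coarsest to finest (the actual ordering of `current_strides` is not shown in the diff).

```python
def head_in_channels(base_channels, head_stride, min_stride, decoder_strides, filters_rate):
    """Scale the head's input channels when it taps a coarser decoder level.

    Hypothetical helper mirroring the added lines: a head whose stride is
    larger than the minimum output stride reads from an earlier decoder
    block, which has `filters_rate` times more channels per level.
    """
    if head_stride == min_stride:
        return int(base_channels)
    # Number of decoder levels between the finest output and this head's level.
    factor = decoder_strides.index(min_stride) - decoder_strides.index(head_stride)
    return int(base_channels * filters_rate**factor)


strides = [16, 8, 4, 2]  # assumed ordering: coarsest to finest
print(head_in_channels(16, 2, 2, strides, 2))  # 16
print(head_in_channels(16, 4, 2, strides, 2))  # 32
```

With this ordering, a head at stride 4 sits one level above the stride-2 output, so its input has `filters_rate` times as many channels. If the strides were listed in the opposite order, `factor` would be negative and the result would shrink instead.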