RTDT-3331_improve_regression_model_training #1
base: main
Conversation
# def load_data(train_dir, val_dir, args):
#     # Define transforms
#     train_transforms = torchvision.transforms.Compose([
#         torchvision.transforms.RandomResizedCrop(224),
antialias=False
#     ])
#
#     val_transforms = torchvision.transforms.Compose([
#         torchvision.transforms.Resize(256),
antialias=False
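A minimal sketch of both pipelines with the suggested flag, assuming a torchvision version (>= 0.15) where Resize and RandomResizedCrop accept antialias. Note the flag only affects tensor inputs; PIL images are always antialiased.

import torchvision

# Illustrative only; the sizes are taken from the diff above.
train_transforms = torchvision.transforms.Compose([
    # antialias=False skips the anti-aliasing filter when downscaling,
    # which is faster but slightly noisier.
    torchvision.transforms.RandomResizedCrop(224, antialias=False),
])

val_transforms = torchvision.transforms.Compose([
    torchvision.transforms.Resize(256, antialias=False),
])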
scaled_loaded_loss = loaded_loss * target[:, 1]

penalties = torch.zeros_like(loaded_loss)
why do we need a penalty?
self.annotations = json.load(f)

self.image_files = [f for f in os.listdir(self.root_dir)
                    if f.lower().endswith(('.png', '.jpg', '.jpeg'))
you can use IMG_EXTENSIONS
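For reference, torchvision ships the IMG_EXTENSIONS tuple, and str.endswith accepts a tuple directly, so the filter collapses to:

from torchvision.datasets.folder import IMG_EXTENSIONS
# IMG_EXTENSIONS is ('.jpg', '.jpeg', '.png', '.ppm', '.bmp', '.pgm',
# '.tif', '.tiff', '.webp')

self.image_files = [f for f in os.listdir(self.root_dir)
                    if f.lower().endswith(IMG_EXTENSIONS)]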
def forward(self, pred, target):

    loaded_loss = (pred[:, 0] - target[:, 0]) ** 2
If you are inheriting from nn.Module instead of Function, you can compute internal losses like this:
loaded_loss = nn.MSELoss()(output[:, 0], target[:, 0])
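A fuller sketch of that suggestion, with a hypothetical module name; inheriting from nn.Module lets forward compose built-in losses while autograd derives the backward pass automatically:

import torch.nn as nn

class LoadedLoss(nn.Module):  # hypothetical name, for illustration only
    def __init__(self):
        super().__init__()
        self.mse = nn.MSELoss()

    def forward(self, pred, target):
        # Column 0 is the regression value, column 1 the confidence
        # (assumed from the two-column layout in the diff).
        loaded_loss = self.mse(pred[:, 0], target[:, 0])
        confidence_loss = self.mse(pred[:, 1], target[:, 1])
        return loaded_loss + confidence_loss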
with torch.cuda.amp.autocast(enabled=scaler is not None):
    output = model(image)  # output shape: [batch_size, 2]
    loss = criterion(output, target)
    loaded_loss = nn.MSELoss()(output[:, 0], target[:, 0])
why do we need to calculate and save this loss during training?
for pred, true in zip(output.cpu().numpy(), target.cpu().numpy()):
    if threshold_accuracy(true, pred, threshold):
        correct += 1
In general, the metrics can be calculated for both outputs simultaneously, since we do not use these parameters separately anywhere. The idea is good, but I would still add total loss, MAE, and R²; see the sketch below.
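A sketch of those metrics computed over both output columns at once; it assumes predictions and targets have been accumulated into two tensors during validation (names here are illustrative):

import torch

def regression_metrics(preds, targets):
    # Total (MSE) loss, MAE, and R^2 over all outputs jointly.
    mse = torch.mean((preds - targets) ** 2)
    mae = torch.mean(torch.abs(preds - targets))
    ss_res = torch.sum((targets - preds) ** 2)
    ss_tot = torch.sum((targets - targets.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot  # R^2 = 1 - SS_res / SS_tot
    return {"mse": mse.item(), "mae": mae.item(), "r2": r2.item()}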
total_loss = scaled_loaded_loss.mean() + confidence_loss.mean() + penalty_term
return total_loss
Why don't you use the already written backward method? During our research, we found that the custom backward reduces the loss by about 20% less than standard autograd does.
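If the custom backward is kept, it can at least be checked numerically against autograd; torch.autograd.gradcheck does this on double-precision inputs (custom_loss_fn below is a placeholder for the custom Function's apply):

import torch

pred = torch.randn(8, 2, dtype=torch.double, requires_grad=True)
target = torch.randn(8, 2, dtype=torch.double)

# Raises if the hand-written backward disagrees with numerical gradients.
torch.autograd.gradcheck(lambda p: custom_loss_fn(p, target), (pred,))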
The regression training is based on the classification training, with an added last layer. There is also a new dataset class that expects three folders (train, val, and test) plus an annotations.json file; a minimal sketch of such a class is given at the end. The command to run is:
python -m torch.distributed.run /data/vision/references/classification/train.py --model resnet18 --batch-size 32 --lr 0.01 --lr-scheduler cosineannealinglr --lr-warmup-epochs 5 --lr-warmup-method linear --auto-augment ta_wide --epochs 500 --random-erase 0.1 --weight-decay 0.00002 --norm-weight-decay 0.0 --train-crop-size 220 --val-resize-size 232 --ra-sampler --ra-reps 4 --data /data/bruggen_regression --annotations_file /data/bruggen_regression/annotations.json --output-dir /data/bruggen_regression/regression
resnet50 or resnet101 can also be used here; the paths (--data and --annotations_file) should be adjusted.
TensorBoard logging is also added; run tensorboard --logdir=/data/bruggen_regression/ (adjust the path).
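For reference, a minimal sketch of a dataset class matching the described layout; the annotations.json schema (file name -> [value, confidence]) and the class name are assumptions:

import json
import os

import torch
from PIL import Image
from torch.utils.data import Dataset
from torchvision.datasets.folder import IMG_EXTENSIONS

class RegressionImageDataset(Dataset):  # hypothetical name
    def __init__(self, root_dir, annotations_file, transform=None):
        self.root_dir = root_dir
        self.transform = transform
        with open(annotations_file) as f:
            # Assumed schema: file name -> [value, confidence]
            self.annotations = json.load(f)
        self.image_files = [f for f in os.listdir(root_dir)
                            if f.lower().endswith(IMG_EXTENSIONS)
                            and f in self.annotations]

    def __len__(self):
        return len(self.image_files)

    def __getitem__(self, idx):
        name = self.image_files[idx]
        image = Image.open(os.path.join(self.root_dir, name)).convert("RGB")
        if self.transform:
            image = self.transform(image)
        target = torch.tensor(self.annotations[name], dtype=torch.float32)
        return image, target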