No validation metrics computed (roc_auc, average_precision are None) during training #30

Open
@esignor

Description of bug
When training a model with sei-framework (DeepSEA) using train.yml and the provided data (hg38_UCSC.fa, sei_chromatin_profiles.txt, sorted_sei_data.bed.gz), the log file shows:
2025-04-29 15:49:07,977 - INFO - validation roc_auc: None
2025-04-29 15:49:07,995 - INFO - validation average_precision: None
However, validation_loss and training_loss are both computed (values >= 0).
As a consequence of Selene's behaviour, no final best model is saved, only some partial training results (data.pkl, version, and a "data" folder in binary format).

I also ran some experiments on sub-sampled versions of the chromatin profile file and the bed file you provide (keeping 5 bins per chromosome in the bed file, and making sure that the chromatin profiles listed in the .txt file are represented by bins in the bed file). However, I always observed the same behaviour during training: validation roc_auc None, validation average_precision None, and no final best model detected and saved.
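
One possible check for the sub-sampled runs is to count how many positive labels each chromatin profile has in the validation targets. The sketch below rests on an assumption that is not confirmed in this report: that the metrics are only computed for features with strictly more than `report_gt_feature_n_positives` positive validation labels, and that the reported value is None when no feature qualifies. The helper names `count_positives_per_feature` and `features_passing_threshold` and the toy target matrix are hypothetical and are not part of Selene.

```python
def count_positives_per_feature(targets):
    """Count positive (non-zero) labels in each feature column.

    `targets` is a list of rows, one row per validation example and
    one entry per chromatin profile (hypothetical layout).
    """
    n_features = len(targets[0]) if targets else 0
    counts = [0] * n_features
    for row in targets:
        for j, value in enumerate(row):
            if value > 0:
                counts[j] += 1
    return counts


def features_passing_threshold(targets, min_positives):
    """Return indices of features with strictly more than `min_positives` positives.

    The strict comparison is an assumption about how the threshold is applied.
    """
    counts = count_positives_per_feature(targets)
    return [j for j, c in enumerate(counts) if c > min_positives]


# Toy validation targets: 8 examples x 3 features (made-up data).
# Feature 0 has no positives, feature 1 has six, feature 2 has exactly five.
toy_targets = [[0, 1 if i < 6 else 0, 1 if i < 5 else 0] for i in range(8)]

print(count_positives_per_feature(toy_targets))
print(features_passing_threshold(toy_targets, min_positives=5))
```

If such a check returned an empty list for the real validation targets, that would be consistent with every metric being reported as None while the loss is still computed, but it would not by itself confirm the cause.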

  • Below is the configuration file I used for training on your data hg38_UCSC.fa, sei_chromatin_profiles.txt, sorted_sei_data.bed.gz (compared to your version of train.yml, I only changed the number of workers):

...
ops: [train]
model: {
path: /nfsd/bcb/bcbg/EleonoraSignor/sei-framework/model/sei.py,
class: Sei,
class_args: {
sequence_length: 4096,
n_genomic_features: 21907,
},
non_strand_specific: mean
}
sampler: !obj:selene_sdk.samplers.MultiSampler {
train_sampler: !obj:selene_sdk.samplers.dataloader.SamplerDataLoader {
sampler: !obj:selene_sdk.samplers.RandomPositionsSampler {
target_path: /nfsd/bcb/bcbg/EleonoraSignor/sei-framework/train/data/sorted_sei_data.bed.gz,
reference_sequence: !obj:selene_sdk.sequences.Genome {
input_path: /nfsd/bcb/bcbg/EleonoraSignor/sei-framework/resources/hg38_UCSC.fa,
blacklist_regions: hg38
},
features: !obj:selene_sdk.utils.load_features_list {
input_path: /nfsd/bcb/bcbg/EleonoraSignor/sei-framework/train/data/sei_chromatin_profiles.txt
},
test_holdout: [chr8, chr9],
validation_holdout: [chr10],
sequence_length: 4096,
center_bin_to_predict: [2048, 2049],
feature_thresholds: null,
save_datasets: []
},
num_workers: 1,
batch_size: 64,
},
validate_sampler: !obj:selene_sdk.samplers.RandomPositionsSampler {
target_path: /nfsd/bcb/bcbg/EleonoraSignor/sei-framework/train/data/sorted_sei_data.bed.gz,
reference_sequence: !obj:selene_sdk.sequences.Genome {
input_path: /nfsd/bcb/bcbg/EleonoraSignor/sei-framework/resources/hg38_UCSC.fa,
blacklist_regions: hg38
},
features: !obj:selene_sdk.utils.load_features_list {
input_path: /nfsd/bcb/bcbg/EleonoraSignor/sei-framework/train/data/sei_chromatin_profiles.txt
},
test_holdout: [chr8, chr9],
validation_holdout: [chr10],
sequence_length: 4096,
center_bin_to_predict: [2048, 2049],
mode: validate,
save_datasets: []
},
features: !obj:selene_sdk.utils.load_features_list {
input_path: /nfsd/bcb/bcbg/EleonoraSignor/sei-framework/train/data/sei_chromatin_profiles.txt
}
}
train_model: !obj:selene_sdk.TrainModel {
batch_size: 64,
report_stats_every_n_steps: 5000,
n_validation_samples: 12800,
n_test_samples: 1600000,
use_cuda: True,
data_parallel: True, #we recommend multi-GPU training only on NVLink-enabled GPUs
cpu_n_threads: 19,
report_gt_feature_n_positives: 5,
use_scheduler: False,
max_steps: 1000000000,
metrics: {
roc_auc: !import sklearn.metrics.roc_auc_score,
average_precision: !import sklearn.metrics.average_precision_score
},
}
output_dir: /nfsd/bcb/bcbg/EleonoraSignor/sei-framework/train/models
random_seed: 1447
...

  • I also provide a log file from an experiment that used 10 chromatin profiles (SUBSET-sei_chromatin_profiles.txt), 5 bins per chromosome in the bed file (SUBSET-sorted_sei_data_filtered_bins_exact_profiles.bed.gz), and the reference genome hg38_UCSC.fa. Each of the 10 chromatin profiles has at least one corresponding bin in the bed file:

2025-04-29 13:43:46,142 - INFO - Setting deterministic = True for reproducibility.
2025-04-29 13:43:46,142 - INFO - Training parameters set: batch size 8, number of steps per 'epoch': 10, maximum number of steps: 100
2025-04-29 13:43:49,454 - DEBUG - Wrapped model in DataParallel
2025-04-29 13:43:49,455 - DEBUG - Set modules to use CUDA
2025-04-29 13:43:49,457 - INFO - Creating validation dataset.
2025-04-29 13:43:52,973 - INFO - 3.5153815746307373 s to load 1000 validation examples (125 validation batches) to evaluate after each training step.
2025-04-29 13:43:53,042 - DEBUG - [BATCH] Time to sample 8 examples: 0.06795096397399902 s.
2025-04-29 13:43:59,475 - DEBUG - [TRAIN] 0: Saving model state to file.
2025-04-29 13:44:02,013 - DEBUG - [BATCH] Time to sample 8 examples: 0.02274632453918457 s.
2025-04-29 13:44:03,295 - DEBUG - [BATCH] Time to sample 8 examples: 0.040410518646240234 s.
2025-04-29 13:44:04,532 - DEBUG - [BATCH] Time to sample 8 examples: 0.02966022491455078 s.
2025-04-29 13:44:05,770 - DEBUG - [BATCH] Time to sample 8 examples: 0.03045344352722168 s.
2025-04-29 13:44:06,997 - DEBUG - [BATCH] Time to sample 8 examples: 0.023934364318847656 s.
2025-04-29 13:44:08,224 - DEBUG - [BATCH] Time to sample 8 examples: 0.02252674102783203 s.
2025-04-29 13:44:09,461 - DEBUG - [BATCH] Time to sample 8 examples: 0.03370785713195801 s.
2025-04-29 13:44:10,690 - DEBUG - [BATCH] Time to sample 8 examples: 0.023950815200805664 s.
2025-04-29 13:44:11,922 - DEBUG - [BATCH] Time to sample 8 examples: 0.024319171905517578 s.
2025-04-29 13:44:13,155 - DEBUG - [BATCH] Time to sample 8 examples: 0.026258230209350586 s.
2025-04-29 13:44:14,361 - INFO - [STEP 10] average number of steps per second: 0.6
2025-04-29 13:44:14,362 - INFO - training loss: 0.5449446114626798
2025-04-29 13:46:30,425 - INFO - validation roc_auc: None
2025-04-29 13:46:30,426 - INFO - validation average_precision: None
2025-04-29 13:46:30,426 - DEBUG - [TRAIN] 10: Saving model state to file.
2025-04-29 13:46:36,665 - DEBUG - Updating best_model.pth.tar
2025-04-29 13:46:36,665 - INFO - validation loss: 0.4264565827846527
2025-04-29 13:46:36,719 - DEBUG - [BATCH] Time to sample 8 examples: 0.05398201942443848 s.
2025-04-29 13:46:37,989 - DEBUG - [BATCH] Time to sample 8 examples: 0.02141737937927246 s.
2025-04-29 13:46:39,248 - DEBUG - [BATCH] Time to sample 8 examples: 0.03598308563232422 s.
2025-04-29 13:46:40,495 - DEBUG - [BATCH] Time to sample 8 examples: 0.03802609443664551 s.
2025-04-29 13:46:41,758 - DEBUG - [BATCH] Time to sample 8 examples: 0.03577780723571777 s.
2025-04-29 13:46:43,007 - DEBUG - [BATCH] Time to sample 8 examples: 0.028432846069335938 s.
2025-04-29 13:46:44,240 - DEBUG - [BATCH] Time to sample 8 examples: 0.025448083877563477 s.
2025-04-29 13:46:45,500 - DEBUG - [BATCH] Time to sample 8 examples: 0.038803815841674805 s.
2025-04-29 13:46:46,748 - DEBUG - [BATCH] Time to sample 8 examples: 0.04081869125366211 s.
2025-04-29 13:46:48,026 - DEBUG - [BATCH] Time to sample 8 examples: 0.04925346374511719 s.
2025-04-29 13:46:49,241 - INFO - [STEP 20] average number of steps per second: 0.8
2025-04-29 13:46:49,241 - INFO - training loss: 0.238083927705884
2025-04-29 13:49:06,118 - INFO - validation roc_auc: None
2025-04-29 13:49:06,119 - INFO - validation average_precision: None
2025-04-29 13:49:06,120 - DEBUG - [TRAIN] 20: Saving model state to file.
2025-04-29 13:49:11,677 - DEBUG - Updating best_model.pth.tar
2025-04-29 13:49:11,677 - INFO - validation loss: 0.04189149260520935
2025-04-29 13:49:11,711 - DEBUG - [BATCH] Time to sample 8 examples: 0.033258676528930664 s.
2025-04-29 13:49:12,984 - DEBUG - [BATCH] Time to sample 8 examples: 0.030499935150146484 s.
2025-04-29 13:49:14,246 - DEBUG - [BATCH] Time to sample 8 examples: 0.03310084342956543 s.
2025-04-29 13:49:15,509 - DEBUG - [BATCH] Time to sample 8 examples: 0.039209842681884766 s.
2025-04-29 13:49:16,784 - DEBUG - [BATCH] Time to sample 8 examples: 0.053952693939208984 s.
2025-04-29 13:49:18,034 - DEBUG - [BATCH] Time to sample 8 examples: 0.03648209571838379 s.
2025-04-29 13:49:19,293 - DEBUG - [BATCH] Time to sample 8 examples: 0.0339815616607666 s.
2025-04-29 13:49:20,549 - DEBUG - [BATCH] Time to sample 8 examples: 0.03137612342834473 s.
2025-04-29 13:49:21,822 - DEBUG - [BATCH] Time to sample 8 examples: 0.0253298282623291 s.
2025-04-29 13:49:23,097 - DEBUG - [BATCH] Time to sample 8 examples: 0.03963041305541992 s.
2025-04-29 13:49:24,349 - INFO - [STEP 30] average number of steps per second: 0.8
2025-04-29 13:49:24,350 - INFO - training loss: 0.005656527276369161
2025-04-29 13:51:40,170 - INFO - validation roc_auc: None
2025-04-29 13:51:40,171 - INFO - validation average_precision: None
2025-04-29 13:51:40,172 - DEBUG - [TRAIN] 30: Saving model state to file.
2025-04-29 13:51:46,322 - DEBUG - Updating best_model.pth.tar
2025-04-29 13:51:46,323 - INFO - validation loss: 7.712587952846661e-05
2025-04-29 13:51:46,365 - DEBUG - [BATCH] Time to sample 8 examples: 0.042008399963378906 s.
2025-04-29 13:51:47,666 - DEBUG - [BATCH] Time to sample 8 examples: 0.025139808654785156 s.
2025-04-29 13:51:48,942 - DEBUG - [BATCH] Time to sample 8 examples: 0.04321646690368652 s.
2025-04-29 13:51:50,227 - DEBUG - [BATCH] Time to sample 8 examples: 0.03827023506164551 s.
2025-04-29 13:51:51,509 - DEBUG - [BATCH] Time to sample 8 examples: 0.03984642028808594 s.
2025-04-29 13:51:52,800 - DEBUG - [BATCH] Time to sample 8 examples: 0.04982876777648926 s.
2025-04-29 13:51:54,080 - DEBUG - [BATCH] Time to sample 8 examples: 0.04528355598449707 s.
2025-04-29 13:51:55,355 - DEBUG - [BATCH] Time to sample 8 examples: 0.029361724853515625 s.
2025-04-29 13:51:56,617 - DEBUG - [BATCH] Time to sample 8 examples: 0.03145432472229004 s.
2025-04-29 13:51:57,896 - DEBUG - [BATCH] Time to sample 8 examples: 0.03281450271606445 s.
2025-04-29 13:51:59,126 - INFO - [STEP 40] average number of steps per second: 0.8
2025-04-29 13:51:59,127 - INFO - training loss: 5.87201489565814e-06
2025-04-29 13:54:15,643 - INFO - validation roc_auc: None
2025-04-29 13:54:15,660 - INFO - validation average_precision: None
2025-04-29 13:54:15,661 - DEBUG - [TRAIN] 40: Saving model state to file.
2025-04-29 13:54:21,838 - DEBUG - Updating best_model.pth.tar
2025-04-29 13:54:21,838 - INFO - validation loss: 2.4526438555767528e-06
2025-04-29 13:54:21,876 - DEBUG - [BATCH] Time to sample 8 examples: 0.037668466567993164 s.
2025-04-29 13:54:23,178 - DEBUG - [BATCH] Time to sample 8 examples: 0.0375819206237793 s.
2025-04-29 13:54:24,446 - DEBUG - [BATCH] Time to sample 8 examples: 0.04212832450866699 s.
2025-04-29 13:54:25,709 - DEBUG - [BATCH] Time to sample 8 examples: 0.043633460998535156 s.
2025-04-29 13:54:26,974 - DEBUG - [BATCH] Time to sample 8 examples: 0.0326848030090332 s.
2025-04-29 13:54:28,239 - DEBUG - [BATCH] Time to sample 8 examples: 0.03873395919799805 s.
2025-04-29 13:54:29,526 - DEBUG - [BATCH] Time to sample 8 examples: 0.043430328369140625 s.
2025-04-29 13:54:30,794 - DEBUG - [BATCH] Time to sample 8 examples: 0.04188942909240723 s.
2025-04-29 13:54:32,062 - DEBUG - [BATCH] Time to sample 8 examples: 0.034316062927246094 s.
2025-04-29 13:54:33,331 - DEBUG - [BATCH] Time to sample 8 examples: 0.035858154296875 s.
2025-04-29 13:54:34,573 - INFO - [STEP 50] average number of steps per second: 0.8
2025-04-29 13:54:34,574 - INFO - training loss: 2.4952012864787323e-07
2025-04-29 13:56:50,736 - INFO - validation roc_auc: None
2025-04-29 13:56:50,738 - INFO - validation average_precision: None
2025-04-29 13:56:50,738 - DEBUG - [TRAIN] 50: Saving model state to file.
2025-04-29 13:56:55,443 - DEBUG - Updating best_model.pth.tar
2025-04-29 13:56:55,443 - INFO - validation loss: 6.21278103608347e-07
2025-04-29 13:56:55,480 - DEBUG - [BATCH] Time to sample 8 examples: 0.036336660385131836 s.
2025-04-29 13:56:56,783 - DEBUG - [BATCH] Time to sample 8 examples: 0.026140689849853516 s.
2025-04-29 13:56:58,054 - DEBUG - [BATCH] Time to sample 8 examples: 0.02702927589416504 s.
2025-04-29 13:56:59,327 - DEBUG - [BATCH] Time to sample 8 examples: 0.026160240173339844 s.
2025-04-29 13:57:00,596 - DEBUG - [BATCH] Time to sample 8 examples: 0.02651524543762207 s.
2025-04-29 13:57:01,868 - DEBUG - [BATCH] Time to sample 8 examples: 0.028634309768676758 s.
2025-04-29 13:57:03,139 - DEBUG - [BATCH] Time to sample 8 examples: 0.02731490135192871 s.
2025-04-29 13:57:04,414 - DEBUG - [BATCH] Time to sample 8 examples: 0.029621124267578125 s.
2025-04-29 13:57:05,689 - DEBUG - [BATCH] Time to sample 8 examples: 0.0329890251159668 s.
2025-04-29 13:57:06,972 - DEBUG - [BATCH] Time to sample 8 examples: 0.04100394248962402 s.
2025-04-29 13:57:08,216 - INFO - [STEP 60] average number of steps per second: 0.8
2025-04-29 13:57:08,217 - INFO - training loss: 8.381905161058967e-08
2025-04-29 13:59:25,848 - INFO - validation roc_auc: None
2025-04-29 13:59:25,849 - INFO - validation average_precision: None
2025-04-29 13:59:25,849 - DEBUG - [TRAIN] 60: Saving model state to file.
2025-04-29 13:59:30,052 - DEBUG - Updating best_model.pth.tar
2025-04-29 13:59:30,052 - INFO - validation loss: 3.77053402189631e-07
2025-04-29 13:59:30,102 - DEBUG - [BATCH] Time to sample 8 examples: 0.049765586853027344 s.
2025-04-29 13:59:31,400 - DEBUG - [BATCH] Time to sample 8 examples: 0.03403210639953613 s.
2025-04-29 13:59:32,690 - DEBUG - [BATCH] Time to sample 8 examples: 0.045079946517944336 s.
2025-04-29 13:59:33,970 - DEBUG - [BATCH] Time to sample 8 examples: 0.036760807037353516 s.
2025-04-29 13:59:35,245 - DEBUG - [BATCH] Time to sample 8 examples: 0.04293107986450195 s.
2025-04-29 13:59:36,517 - DEBUG - [BATCH] Time to sample 8 examples: 0.030341625213623047 s.
2025-04-29 13:59:37,781 - DEBUG - [BATCH] Time to sample 8 examples: 0.03646421432495117 s.
2025-04-29 13:59:39,034 - DEBUG - [BATCH] Time to sample 8 examples: 0.027781963348388672 s.
2025-04-29 13:59:40,310 - DEBUG - [BATCH] Time to sample 8 examples: 0.028576374053955078 s.
2025-04-29 13:59:41,577 - DEBUG - [BATCH] Time to sample 8 examples: 0.036710262298583984 s.
2025-04-29 13:59:42,824 - INFO - [STEP 70] average number of steps per second: 0.8
2025-04-29 13:59:42,825 - INFO - training loss: 5.364419237707807e-08
2025-04-29 14:01:58,912 - INFO - validation roc_auc: None
2025-04-29 14:01:58,914 - INFO - validation average_precision: None
2025-04-29 14:01:58,915 - DEBUG - [TRAIN] 70: Saving model state to file.
2025-04-29 14:02:04,477 - DEBUG - Updating best_model.pth.tar
2025-04-29 14:02:04,477 - INFO - validation loss: 3.156426239456778e-07
2025-04-29 14:02:04,513 - DEBUG - [BATCH] Time to sample 8 examples: 0.03553056716918945 s.
2025-04-29 14:02:05,803 - DEBUG - [BATCH] Time to sample 8 examples: 0.029157638549804688 s.
2025-04-29 14:02:07,060 - DEBUG - [BATCH] Time to sample 8 examples: 0.03611922264099121 s.
2025-04-29 14:02:08,308 - DEBUG - [BATCH] Time to sample 8 examples: 0.02735614776611328 s.
2025-04-29 14:02:09,577 - DEBUG - [BATCH] Time to sample 8 examples: 0.031862497329711914 s.
2025-04-29 14:02:10,832 - DEBUG - [BATCH] Time to sample 8 examples: 0.033113718032836914 s.
2025-04-29 14:02:12,099 - DEBUG - [BATCH] Time to sample 8 examples: 0.03315114974975586 s.
2025-04-29 14:02:13,357 - DEBUG - [BATCH] Time to sample 8 examples: 0.030545711517333984 s.
2025-04-29 14:02:14,614 - DEBUG - [BATCH] Time to sample 8 examples: 0.03906106948852539 s.
2025-04-29 14:02:15,887 - DEBUG - [BATCH] Time to sample 8 examples: 0.03446364402770996 s.
2025-04-29 14:02:17,112 - INFO - [STEP 80] average number of steps per second: 0.8
2025-04-29 14:02:17,113 - INFO - training loss: 4.5895586708866174e-08
2025-04-29 14:04:32,665 - INFO - validation roc_auc: None
2025-04-29 14:04:32,683 - INFO - validation average_precision: None
2025-04-29 14:04:32,684 - DEBUG - [TRAIN] 80: Saving model state to file.
2025-04-29 14:04:38,517 - DEBUG - Updating best_model.pth.tar
2025-04-29 14:04:38,517 - INFO - validation loss: 2.9552006458288817e-07
2025-04-29 14:04:38,544 - DEBUG - [BATCH] Time to sample 8 examples: 0.02707815170288086 s.
2025-04-29 14:04:39,843 - DEBUG - [BATCH] Time to sample 8 examples: 0.0443272590637207 s.
2025-04-29 14:04:41,098 - DEBUG - [BATCH] Time to sample 8 examples: 0.031562089920043945 s.
2025-04-29 14:04:42,367 - DEBUG - [BATCH] Time to sample 8 examples: 0.03975653648376465 s.
2025-04-29 14:04:43,618 - DEBUG - [BATCH] Time to sample 8 examples: 0.028406381607055664 s.
2025-04-29 14:04:44,870 - DEBUG - [BATCH] Time to sample 8 examples: 0.03281235694885254 s.
2025-04-29 14:04:46,154 - DEBUG - [BATCH] Time to sample 8 examples: 0.03940463066101074 s.
2025-04-29 14:04:47,435 - DEBUG - [BATCH] Time to sample 8 examples: 0.041507720947265625 s.
2025-04-29 14:04:48,701 - DEBUG - [BATCH] Time to sample 8 examples: 0.03107142448425293 s.
2025-04-29 14:04:49,970 - DEBUG - [BATCH] Time to sample 8 examples: 0.03574061393737793 s.
2025-04-29 14:04:51,213 - INFO - [STEP 90] average number of steps per second: 0.8
2025-04-29 14:04:51,214 - INFO - training loss: 4.418195231892241e-08
2025-04-29 14:07:06,887 - INFO - validation roc_auc: None
2025-04-29 14:07:06,888 - INFO - validation average_precision: None
2025-04-29 14:07:06,889 - DEBUG - [TRAIN] 90: Saving model state to file.
2025-04-29 14:07:10,999 - DEBUG - Updating best_model.pth.tar
2025-04-29 14:07:10,999 - INFO - validation loss: 2.8867147921118884e-07
2025-04-29 14:07:11,028 - DEBUG - [BATCH] Time to sample 8 examples: 0.02870011329650879 s.
2025-04-29 14:07:12,342 - DEBUG - [BATCH] Time to sample 8 examples: 0.03154802322387695 s.
2025-04-29 14:07:13,627 - DEBUG - [BATCH] Time to sample 8 examples: 0.03596901893615723 s.
2025-04-29 14:07:14,911 - DEBUG - [BATCH] Time to sample 8 examples: 0.0320584774017334 s.
2025-04-29 14:07:16,221 - DEBUG - [BATCH] Time to sample 8 examples: 0.03128671646118164 s.
2025-04-29 14:07:17,499 - DEBUG - [BATCH] Time to sample 8 examples: 0.03123188018798828 s.
2025-04-29 14:07:18,774 - DEBUG - [BATCH] Time to sample 8 examples: 0.03404068946838379 s.
2025-04-29 14:07:20,052 - DEBUG - [BATCH] Time to sample 8 examples: 0.02485179901123047 s.
2025-04-29 14:07:21,326 - DEBUG - [BATCH] Time to sample 8 examples: 0.028725624084472656 s.
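
To compare losses and metrics across steps without reading the whole log, the INFO lines can be collected per step. This is a minimal sketch that assumes only the line format visible in the log above (`[STEP n]`, `training loss: x`, `validation <name>: x`). The function name `parse_training_log` and the sample lines are illustrative, and the parser is not part of Selene.

```python
import re

# "[STEP 10] average number of steps per second: 0.6"
STEP_RE = re.compile(r"\[STEP (\d+)\]")
# "... - INFO - training loss: 0.54" or "... - INFO - validation roc_auc: None"
VALUE_RE = re.compile(r" - INFO - (training loss|validation \w+): (\S+)$")


def parse_training_log(lines):
    """Group reported values by the most recent [STEP n] line."""
    records = {}
    step = None
    for line in lines:
        line = line.strip()
        step_match = STEP_RE.search(line)
        if step_match:
            step = int(step_match.group(1))
            records.setdefault(step, {})
            continue
        value_match = VALUE_RE.search(line)
        if value_match and step is not None:
            name, raw = value_match.groups()
            # The log prints the literal string "None" for missing metrics.
            records[step][name] = None if raw == "None" else float(raw)
    return records


sample = [
    "2025-04-29 13:44:14,361 - INFO - [STEP 10] average number of steps per second: 0.6",
    "2025-04-29 13:44:14,362 - INFO - training loss: 0.5449446114626798",
    "2025-04-29 13:46:30,425 - INFO - validation roc_auc: None",
    "2025-04-29 13:46:30,426 - INFO - validation average_precision: None",
    "2025-04-29 13:46:30,426 - DEBUG - [TRAIN] 10: Saving model state to file.",
    "2025-04-29 13:46:36,665 - INFO - validation loss: 0.4264565827846527",
]

for step, values in parse_training_log(sample).items():
    print(step, values)
```

Applied to the full log, this gives one record per reporting step, which makes it easy to see that the loss values are always present while both metrics stay None.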

Environment
I observed the same behaviour when installing sei-framework locally and remotely on a cluster.
Locally: Python 3.6.13, PyTorch 1.9.0, Selene 0.5.1
Cluster: Python 3.9.21, PyTorch 1.13.1, Selene 0.6.0 (for compatibility with GPUs)
