@avantikalal
Collaborator

  1. Enabled finetuning via the CLI.
  2. Added new arguments to the CLI training and finetuning scripts.
  3. Modified the CLI inference script so it can be run separately on train/val/test genes.
  4. Enabled inference by giving predict_on_dataset separate logic for each dataset class (HDF5Dataset or VariantDataset).
  5. Added a new preprocessing function that uses mygene to search for gene metadata on Ensembl.
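A minimal sketch of the per-class dispatch described in item 4, assuming hypothetical stub classes: the `HDF5Dataset` and `VariantDataset` definitions below, their `genes`/`split`/`variants` fields, the `predict_fn` callback, and the returned dictionaries are illustrative placeholders, not decima's actual classes or the real `predict_on_dataset` signature.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


# HYPOTHETICAL stand-ins for decima's dataset classes; the real ones wrap
# HDF5 files and variant tables and expose much richer interfaces.
@dataclass
class HDF5Dataset:
    genes: List[str]
    split: str = "test"  # e.g. "train", "val" or "test"


@dataclass
class VariantDataset:
    variants: List[str] = field(default_factory=list)


def predict_on_dataset(dataset: Any, predict_fn: Callable[[str], Any]) -> Dict[str, Any]:
    """Run predict_fn over a dataset, choosing the loop by dataset class."""
    if isinstance(dataset, HDF5Dataset):
        # Gene-level inference: one prediction per gene in the chosen split.
        preds = {gene: predict_fn(gene) for gene in dataset.genes}
        return {"split": dataset.split, "predictions": preds}
    if isinstance(dataset, VariantDataset):
        # Variant-level inference: one prediction per variant.
        preds = {var: predict_fn(var) for var in dataset.variants}
        return {"predictions": preds}
    raise TypeError(f"Unsupported dataset class: {type(dataset).__name__}")


# Usage with a toy predictor (string length) on the validation genes.
result = predict_on_dataset(HDF5Dataset(genes=["GATA1", "SPI1"], split="val"), len)
print(result)  # {'split': 'val', 'predictions': {'GATA1': 5, 'SPI1': 4}}
```

An explicit `isinstance` check keeps both paths in one function and fails loudly on unsupported inputs; `functools.singledispatch` would be an alternative if more dataset classes are added later.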

avantikalal requested a review from MuhammedHasan on July 9, 2025 17:28
@MuhammedHasan
Collaborator

MuhammedHasan commented Jul 9, 2025

@avantikalal Could you add finetuning data similar to https://github.com/Genentech/decima/blob/main/tests/test_cli.py, perhaps with a dummy test that just runs the new steps? It should also be straightforward to add a unit test for GenePearsonCorrCoef similar to https://github.com/Genentech/decima/blob/main/tests/test_metrics.py.

I am preparing unit tests for the cli_predict_gene class and related functions.
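A rough sketch of the kind of metric test suggested above, written as pytest-style functions. It assumes a HYPOTHETICAL `gene_pearson` helper that averages, over genes (rows), the Pearson correlation between predicted and observed values across tracks (columns); the real GenePearsonCorrCoef may aggregate differently, and its actual API is not shown here.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("Pearson r is undefined for a constant input")
    return cov / (sx * sy)


def gene_pearson(preds: List[List[float]], targets: List[List[float]]) -> float:
    """HYPOTHETICAL metric: mean over genes (rows) of the Pearson r between
    predicted and observed values across tracks (columns)."""
    if len(preds) != len(targets) or not preds:
        raise ValueError("preds and targets must be non-empty with matching rows")
    per_gene = [pearson(p, t) for p, t in zip(preds, targets)]
    return sum(per_gene) / len(per_gene)


# pytest-style tests: plain functions named test_* with bare asserts.
def test_perfect_correlation():
    targets = [[1.0, 2.0, 3.0], [0.5, 0.1, 0.9]]
    # A positive affine transform of each row leaves r at 1.
    preds = [[2 * v + 1 for v in row] for row in targets]
    assert math.isclose(gene_pearson(preds, targets), 1.0, abs_tol=1e-9)


def test_anticorrelation():
    assert math.isclose(
        gene_pearson([[3.0, 2.0, 1.0]], [[1.0, 2.0, 3.0]]), -1.0, abs_tol=1e-9
    )


def test_mean_over_genes():
    # r = 1 for the first gene and r = -1 for the second, so the mean is 0.
    targets = [[1.0, 2.0, 3.0], [1.0, 2.0, 3.0]]
    preds = [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]]
    assert math.isclose(gene_pearson(preds, targets), 0.0, abs_tol=1e-9)
```

Covering perfect correlation, perfect anticorrelation, and a mixed case checks both the per-gene correlation and the averaging step separately.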

@avantikalal
Collaborator Author

Shall I do these in a separate PR?

MuhammedHasan changed the base branch from main to 0_2_2 on July 13, 2025 03:49
MuhammedHasan self-assigned this on Jul 13, 2025
MuhammedHasan merged commit b0f1c18 into 0_2_2 on Jul 21, 2025
MuhammedHasan added a commit that referenced this pull request Jul 21, 2025
…nd fine-tuning (#19)

* ensemble vep init

* backward compatibility of grelu, ensembling, testcases, custom fasta

* gene dataset

* gene expression prediction and sequence shifting

* fix testcase

* conflig

* Changes related to data processing and fine-tuning new models (#16)

* enable finetune via cli

* split input and output directories

* add mygene

* added ensembl

* added N padding

* add more params

* added args to cli finetune

* add csv logging

* add csv logging

* add run name to checkpoints

* gene pearson metric

* training 202506

* added new params

* added topk

* reset unnecessary changes

* reset unnecessary changes

* reset unnecessary changes

* reset unnecessary changes

* reset unnecessary changes

* fixed savek typo

* more useful print

* finetuning updates

---------

Co-authored-by: Muhammed Hasan Celik <celik.muhammed_hasan@gene.com>

* fix testcases

* branch review updates

---------

Co-authored-by: Muhammed Hasan Celik <celik.muhammed_hasan@gene.com>
Co-authored-by: Avantika Lal <avantikalal1990@gmail.com>