make LR parameters customizable in training workflow #24

Merged: mayasheth merged 3 commits into main from LR_parameters on Feb 9, 2024

Conversation

mayasheth (Contributor)

tested with no regularization, L1, L2, and elastic net settings

@atancoder

@mayasheth mayasheth requested a review from atancoder January 30, 2024 03:55
model                    dataset        ABC_directory  feature_table                                              polynomial  override_params
DNase_intact_8features   DNase_intact                  resources/feature_tables/final_feature_set_DNase_hic.tsv  False
DNase_megamap_noReg      DNase_megamap                 resources/feature_tables/all_features_DNase_hic.tsv       False
DNase_megamap_L1         DNase_megamap                 resources/feature_tables/all_features_DNase_hic.tsv       False       {'solver': 'saga', 'penalty': 'l1'}
Collaborator

I realize this is important for figuring out the best hyperparameters for our logistic regression model. Do we expect users to want to choose their own hyperparameters, or are we going to find the best ones and make them the default?

mayasheth (Contributor, Author)

No idea who is going to use this besides us... the default parameters are already specified in config_training, which is more transparent and explicit than having them hardcoded. override_params can just be used if the user wants to change one of those for any particular model. The whole point of this workflow is to compare different model architectures, so I feel like it's a valid thing to include!
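(For illustration, a minimal sketch of how per-model override_params could be merged over the config_training defaults before fitting; default_params and build_model are hypothetical names, not the repo's actual code:)

# Hypothetical sketch: merge a model's override_params over the defaults
# from config_training before fitting; names here are illustrative only.
from sklearn.linear_model import LogisticRegression

default_params = {"penalty": "l2", "solver": "lbfgs", "max_iter": 1000}

def build_model(override_params=None):
    # override_params wins on key collisions; defaults fill everything else
    params = {**default_params, **(override_params or {})}
    return LogisticRegression(**params)

# e.g. the DNase_megamap_L1 row of the model config table above:
model = build_model({'solver': 'saga', 'penalty': 'l1'})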

"""

# compare CV performance on training data across all models (note: this is not the true benchmarking performance, since CRISPR elements not overlapping prediction elements aren't considered)
rule gather_model_performances:
Collaborator

For the future, you can consider splitting the actual model training components from the best model selection components into separate snakemake modules to make things cleaner.
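(For reference, a rough sketch of what that split could look like with Snakemake's module system; the snakefile paths and rule prefixes are hypothetical, not part of this PR:)

# Hypothetical layout: training rules and model-selection rules live in
# separate snakefiles and are pulled in as modules.
module model_training:
    snakefile: "rules/model_training.smk"
    config: config

module model_selection:
    snakefile: "rules/model_selection.smk"
    config: config

# import all rules from each module under a distinguishing prefix
use rule * from model_training as train_*
use rule * from model_selection as select_*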


# sort table by AUPRC so the best-performing model comes first
df = df.sort_values(by='AUPRC', ascending=False)
df.to_csv(output_file, sep='\t', index=False)
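(For context, a hedged sketch of how the gather step might assemble the table that this snippet sorts; the per-model file paths and layout are assumptions, not the actual rule:)

# Hypothetical: concatenate one row of CV metrics per model before sorting
import pandas as pd

performance_files = [
    "results/DNase_megamap_noReg/model_performance.tsv",
    "results/DNase_megamap_L1/model_performance.tsv",
]
df = pd.concat([pd.read_csv(f, sep='\t') for f in performance_files])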
Collaborator

it would be good to share a snippet of the output from your changes in the PR summary/test plan

@mayasheth mayasheth merged commit 7fa1fd0 into main Feb 9, 2024
@mayasheth mayasheth deleted the LR_parameters branch February 9, 2024 00:23