make LR parameters customizable in training workflow #24
Conversation
model dataset ABC_directory feature_table polynomial override_params
DNase_intact_8features DNase_intact resources/feature_tables/final_feature_set_DNase_hic.tsv False
DNase_megamap_noReg DNase_megamap resources/feature_tables/all_features_DNase_hic.tsv False
DNase_megamap_L1 DNase_megamap resources/feature_tables/all_features_DNase_hic.tsv False {'solver': 'saga', 'penalty': 'l1'}
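For context, the override_params column stores a Python dict literal per model. A minimal sketch of how it could be read back from the table (the file path and helper name here are hypothetical, not the workflow's actual code):

import ast
import pandas as pd

# Each override_params cell is a dict literal stored as text in the TSV;
# ast.literal_eval turns it back into a dict. Empty cells mean "use the
# defaults from config_training".
models = pd.read_csv("config/model_config.tsv", sep="\t")  # hypothetical path

def parse_overrides(cell):
    """Return the override dict for one model row (empty dict if no overrides)."""
    if pd.isna(cell) or cell == "":
        return {}
    return ast.literal_eval(cell)  # e.g. "{'solver': 'saga', 'penalty': 'l1'}" -> dict

models["override_params"] = models["override_params"].apply(parse_overrides)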
I realize this is important for figuring out the best hyperparameters for our logistic regression model. Do we expect users to want to choose their own hyperparameters, or are we going to find the best ones and make them the default?
No idea who is going to use this besides us... the default parameters are already specified in config_training, which is more transparent and explicit than having them hardcoded. The override_params column can then be used if the user wants to change one of those for any particular model. The whole point of this workflow is to compare different model architectures, so I feel like it's a valid thing to include!
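A minimal sketch of that merging logic, assuming the config_training defaults are available as a dict (parameter values here are illustrative, not the workflow's actual defaults):

from sklearn.linear_model import LogisticRegression

# Shared defaults come from config_training; per-model overrides come from the
# override_params column and win over the defaults.
default_params = {"solver": "lbfgs", "penalty": "l2", "max_iter": 1000}  # assumed defaults
override_params = {"solver": "saga", "penalty": "l1"}  # per-model override from the table

model_params = {**default_params, **override_params}
clf = LogisticRegression(**model_params)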
""" | ||
|
||
# compare cv-performance on training data across all models (note, this is not the true benchmarking performance CRISPR elements not overlapping prediction elements aren't considered) | ||
rule gather_model_performances: |
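The cv-performance referred to here is the cross-validated AUPRC on the training data. A rough sketch of how it can be computed for one model (illustrative only; the rule's actual implementation may differ):

from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score
from sklearn.model_selection import cross_val_predict

def cv_auprc(X, y, params, n_folds=5):
    """Cross-validated area under the precision-recall curve on the training data."""
    clf = LogisticRegression(**params)
    # Out-of-fold predicted probabilities for the positive class.
    probs = cross_val_predict(clf, X, y, cv=n_folds, method="predict_proba")[:, 1]
    return average_precision_score(y, probs)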
For the future, you could consider splitting the actual model-training components from the best-model-selection components into separate Snakemake modules to make things cleaner.
# sort table by AUPRC so the best-performing model comes first
df = df.sort_values(by='AUPRC', ascending=False)
df.to_csv(output_file, sep='\t', index=False)
It would be good to share a snippet of the output from your changes in the PR summary/test plan.
Tested with no regularization, L1, L2, and elastic net settings; example override_params for each are sketched below.
@atancoder
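For reference, plausible override_params values for those four settings (these are assumptions, not necessarily the exact values used in the test runs; penalty=None vs 'none' depends on the scikit-learn version):

# Illustrative override_params for the four tested regularization settings
no_regularization = {'penalty': None}                                       # 'none' on older sklearn
l1 = {'solver': 'saga', 'penalty': 'l1'}                                    # matches the table above
l2 = {'penalty': 'l2'}                                                      # sklearn's default penalty
elastic_net = {'solver': 'saga', 'penalty': 'elasticnet', 'l1_ratio': 0.5}  # l1_ratio is required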