GitHub - jfwu777/PredGeneExpr: Predicting gene expression using millions of random promoter sequences

Predicting gene expression using millions of random promoter sequences

The dataset is generated in EDA_prep.ipynb.
A naive LSTM model is tested on a random 4:1 train/validation split of the full dataset.
1. Model: a simple LSTM with an embedding layer of shape (6, 100), i.e. 6 tokens (A, G, C, T, N for unknown bases, and PAD), each mapped to a 100-dimensional vector. PAD is a placeholder token used to pad every sequence to length 150 (see EDA_prep for details).
2. batch_size = 512
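
The input preparation described above can be sketched in a few lines of plain Python. This is only an illustration, not the code in EDA_prep.ipynb or naive_lstm.py: the token order in `VOCAB`, right-side padding, truncation of longer sequences, and the helper names `encode` and `train_val_split` are all assumptions. The README only fixes the vocabulary size (6), the padded length (150), and the 4:1 split.

```python
import random

# Assumed token order; the README only fixes the vocabulary size (6).
VOCAB = {"PAD": 0, "A": 1, "G": 2, "C": 3, "T": 4, "N": 5}
MAX_LEN = 150  # padded sequence length stated in the README


def encode(seq, max_len=MAX_LEN):
    """Map a promoter sequence to a fixed-length list of token indices."""
    # Any character other than A/G/C/T is treated as N (unknown).
    # Sequences longer than max_len are truncated (an assumption).
    ids = [VOCAB.get(base, VOCAB["N"]) for base in seq.upper()[:max_len]]
    # Right-pad with PAD up to max_len (the padding side is an assumption).
    return ids + [VOCAB["PAD"]] * (max_len - len(ids))


def train_val_split(items, val_fraction=0.2, seed=0):
    """Shuffle and split into (train, val); val_fraction=0.2 gives 4:1."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_val = int(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]


if __name__ == "__main__":
    encoded = encode("ACGTN")
    print(len(encoded), encoded[:6])
    train, val = train_val_split(range(10))
    print(len(train), len(val))
```

Each index list would then be looked up in the (6, 100) embedding table before being passed to the LSTM, in batches of 512 sequences.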
Performance is documented in log_naive_lstm.txt:
- Pearson's R = 0.73
- Spearman's R = 0.75
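
For reference, the two correlation metrics reported above can be computed from predicted and measured expression values as follows. This is a dependency-free sketch, not the evaluation code that produced the log; the function names `pearson_r`, `average_ranks`, and `spearman_r` are hypothetical. In practice, `scipy.stats.pearsonr` and `scipy.stats.spearmanr` would likely be used instead.

```python
import math


def pearson_r(x, y):
    """Pearson correlation: covariance divided by the product of std devs."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_r(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson_r(average_ranks(x), average_ranks(y))
```

Pearson measures linear agreement between predictions and measurements, while Spearman only depends on their rank order, so a model that orders promoters correctly but is miscalibrated can score higher on the latter.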

Files:
- .gitignore
- EDA_prep.ipynb
- README.md
- log_naive_lstm.txt
- naive_lstm.py