competition website: https://www.synapse.org/#!Synapse:syn28469146/wiki/
- Generate the dataset using EDA_prep
- A naive LSTM model is tested on a random 4:1 train/val split of the full dataset
- model -> simple LSTM with an embedding layer of shape (6, 100): 6 tokens (A, G, C, T, N for unknown, plus PAD), each mapped to a 100-dimensional vector. PAD is a placeholder token used to pad all sequences to length 150 (see EDA_prep for details)
- batch_size = 512
- Performance is documented in log_naive_lstm.txt
- Pearson's r = 0.73
- Spearman's ρ = 0.75