Useful for two purposes:

- To train a `midi` network that only requires pitch classes
- To run an ablation study on separating the pitch class and note letters into their own convolutional blocks