Multitask (MT) classification models have two problems:
- Weights are not calculated or created in the MinimalDataset used to compute metrics on the original training data, so both the metrics and the predictions fail.
- Compare models doesn't let you pull metrics for multiple single-task models that were trained on one multitask dataset, each with a single response column passed. Training this way is useful for applying a MultitaskScaffoldSplit to the dataset and for general convenience when training models on dense multitask datasets.
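
To illustrate why missing weights break metric calculation, here is a minimal NumPy-only sketch. It assumes that weights act as a per-sample, per-task mask (1 where a label is present, 0 where it is missing), which is an assumption about the intended behavior and not taken from the codebase. The helper names `mask_weights_from_labels` and `per_task_accuracy` are hypothetical and exist only for this example.

```python
import numpy as np


def mask_weights_from_labels(labels):
    """Hypothetical helper: build a weight mask from a label matrix.

    Returns (filled_labels, weights), where weights are 1.0 for present
    labels and 0.0 for missing (NaN) labels, and NaNs are filled with 0.
    """
    labels = np.asarray(labels, dtype=float)
    if labels.ndim == 1:
        # Treat a single response column as a one-task matrix.
        labels = labels.reshape(-1, 1)
    weights = (~np.isnan(labels)).astype(float)
    filled = np.nan_to_num(labels, nan=0.0)
    return filled, weights


def per_task_accuracy(labels, preds, weights):
    """Hypothetical helper: weighted accuracy for each task column.

    Entries with weight 0 are ignored. A task with no labels yields NaN.
    """
    preds = np.asarray(preds, dtype=float)
    correct = (labels == preds) * weights
    n_labeled = weights.sum(axis=0)
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(n_labeled > 0, correct.sum(axis=0) / n_labeled, np.nan)


# Two-task toy dataset with one missing label per task.
y = np.array([[1, np.nan],
              [0, 1],
              [np.nan, 0],
              [1, 1]])
pred = np.array([[1, 0],
                 [1, 1],
                 [0, 0],
                 [1, 0]])

y_filled, w = mask_weights_from_labels(y)
acc = per_task_accuracy(y_filled, pred, w)

# Metrics for one response column of the same multitask dataset.
task0_acc = acc[0]
```

Without the mask, the filled-in placeholder values would be scored as if they were real labels. With it, each task's metric uses only the rows that actually have a label, and a single column can be pulled out for a single-task model.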