Hi, the paper reports a wall-clock time of 6 weeks for training the biggest model. Do you have estimates for the training time of the smaller models? Best, Leonhard