Add model loading time to each benchmark so that we can see how long it takes to load the weights onto the GPUs.
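As a rough illustration of the measurement, the sketch below wraps a loader in a wall-clock timer. The names `timed_load` and `fake_load` are hypothetical and not taken from this change; `fake_load` only stands in for whatever the real benchmark uses to place weights on the GPUs.

```python
import time
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def timed_load(load_fn: Callable[[], T]) -> Tuple[T, float]:
    """Call load_fn and return (its result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    model = load_fn()
    # Assumption: load_fn returns only after the weights are resident on the
    # device. If the framework copies asynchronously, synchronize before
    # reading the clock (for PyTorch, torch.cuda.synchronize()).
    elapsed = time.perf_counter() - start
    return model, elapsed


def fake_load() -> dict:
    """Hypothetical stand-in for a real model loader."""
    time.sleep(0.01)  # simulate reading and transferring weights
    return {"weights": [0.0] * 4}


model, load_seconds = timed_load(fake_load)
print(f"model_load_time_s={load_seconds:.3f}")
```

Reporting the load time as a separate field next to the existing benchmark metrics keeps it from being folded into the inference latency numbers.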