```python
import time

import torch
from tslearn.metrics import dtw

start_t = time.time()
device = torch.device("cuda")
s_t = torch.tensor(metric_today[:], dtype=torch.float32).reshape(-1, 1).to(device)      # length == 1440
s_y = torch.tensor(metric_yesterday[:], dtype=torch.float32).reshape(-1, 1).to(device)
s_l = torch.tensor(metric_lastweek[:], dtype=torch.float32).reshape(-1, 1).to(device)
t1 = time.time() - start_t                  # seconds spent building the GPU tensors
slice_yesterday = dtw(s_t, s_y, be="pytorch")
slice_week = dtw(s_t, s_l, be="pytorch")
slice = min(slice_yesterday, slice_week)
t2 = time.time() - start_t - t1             # seconds spent on the two DTW calls
```

`tslearn.metrics.dtw` takes about 20 s here, and `t2` is much larger than `t1`.

```python
from dtaidistance import dtw

slice_yesterday = dtw.distance(metric_today[start:], metric_yesterday[start:])
slice_week = dtw.distance(metric_today[start:], metric_lastweek[start:])
slice = min(slice_yesterday, slice_week)
```

`dtaidistance.dtw` takes only 0.68 s. Why? Shouldn't GPU computation be faster?
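For reference, a minimal sketch of how both paths could be timed with the same clock (assuming the same `metric_*` arrays as above and the same `be="pytorch"` call from the post; the helper names `time_gpu_dtw` / `time_cpu_dtw` are just for illustration). `torch.cuda.synchronize()` is added so the GPU measurement isn't distorted by asynchronous kernel launches:

```python
import time
import numpy as np
import torch
from tslearn.metrics import dtw as tslearn_dtw
from dtaidistance import dtw as dtai_dtw

def time_gpu_dtw(a, b, device):
    # illustrative helper: move data to the GPU, wait for the copy to finish,
    # then time only the DTW call itself
    s_a = torch.tensor(a, dtype=torch.float32).reshape(-1, 1).to(device)
    s_b = torch.tensor(b, dtype=torch.float32).reshape(-1, 1).to(device)
    torch.cuda.synchronize()
    t0 = time.perf_counter()
    d = tslearn_dtw(s_a, s_b, be="pytorch")
    torch.cuda.synchronize()   # make sure all queued kernels have finished
    return d, time.perf_counter() - t0

def time_cpu_dtw(a, b):
    # illustrative helper: same timing approach for the dtaidistance path
    t0 = time.perf_counter()
    d = dtai_dtw.distance(np.asarray(a, dtype=np.double),
                          np.asarray(b, dtype=np.double))
    return d, time.perf_counter() - t0

# usage:
# d_gpu, gpu_sec = time_gpu_dtw(metric_today, metric_yesterday, torch.device("cuda"))
# d_cpu, cpu_sec = time_cpu_dtw(metric_today, metric_yesterday)
```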