Currently, the only way to use DTW is to disable flash attention for both the encoder and the decoder. However, DTW only needs flash attention disabled in the decoder, so the encoder can keep using it.
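
The intended behavior can be sketched as a small decision function. This is only an illustration: `attn_request`, `attn_plan`, and `plan_attention` are hypothetical names, not identifiers from the codebase. It also assumes DTW is requested through a single boolean, and that the decoder restriction exists because DTW reads attention weights that a fused flash-attention kernel does not expose.

```c
#include <stdbool.h>

/* Hypothetical user-facing settings (illustrative names only). */
struct attn_request {
    bool flash_attn; /* user asked for flash attention */
    bool dtw;        /* user asked for DTW */
};

/* Hypothetical per-component outcome. */
struct attn_plan {
    bool encoder_flash_attn;
    bool decoder_flash_attn;
};

/* Decide flash attention separately for encoder and decoder. */
static struct attn_plan plan_attention(struct attn_request req) {
    struct attn_plan plan;
    /* The encoder is unaffected by DTW, so it follows the user's request. */
    plan.encoder_flash_attn = req.flash_attn;
    /* The decoder falls back to regular attention whenever DTW is on. */
    plan.decoder_flash_attn = req.flash_attn && !req.dtw;
    return plan;
}
```

With both `flash_attn` and `dtw` set, this yields flash attention in the encoder and regular attention in the decoder, instead of disabling it everywhere.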