I’m currently working with the Levenshtein Transformer in fairseq and would like to convert it from a non-autoregressive model to an autoregressive architecture.
Are there any recommended approaches, code references, or prior work related to this modification?
Any guidance or pointers would be greatly appreciated. Thanks!