-
TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters,
arXiv, 2410.23168
arxiv, pdf, citation: -1 · Haiyang Wang, Yue Fan, Muhammad Ferjad Naeem, ..., Federico Tombari, Bernt Schiele · (TokenFormer - Haiyang-W)
-
MrT5: Dynamic Token Merging for Efficient Byte-level Language Models,
arXiv, 2410.20771
arxiv, pdf, citation: -1 · Julie Kallini, Shikhar Murty, Christopher D. Manning, ..., Christopher Potts, Róbert Csordás
-
Scaling Diffusion Language Models via Adaptation from Autoregressive Models,
arXiv, 2410.17891
arxiv, pdf, citation: -1 · Shansan Gong, Shivam Agarwal, Yizhe Zhang, ..., Hao Peng, Lingpeng Kong
· (arxiv) · (DiffuLLaMA - HKUNLP) · (huggingface)