🔥🔥🔥 A curated list of papers on LLM-based multimodal generation (image, video, 3D, and audio).
Updated Nov 14, 2024 - HTML
Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model
Create a Movie animation plus Audio plus Subtitle from a text file
Exploring Bark, the Open-Source Text-to-Audio Generative Model
Create .wav audio samples with text-to-sound generative AI