@jiaotong University
- Japan
Stars
Janus-Series: Unified Multimodal Understanding and Generation Models
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
A Self-adaptation Framework🐙 that adapts LLMs for unseen tasks in real time!
Differentiable ODE solvers with full GPU support and O(1)-memory backpropagation.
Next-generation TTS model using flow-matching and DiT, inspired by Stable Diffusion 3
PyTorch Implementation of "Monotonic Chunkwise Attention" (ICLR 2018)
Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
Automatic Speech Recognition with Speaker Diarization based on OpenAI Whisper
PyTorch implementation of "Masked-Attention Diffusion Guidance for Spatially Controlling Text-to-Image Generation" (The Visual Computer)
This list of writing prompts covers a range of topics and tasks, including brainstorming research ideas, improving language and style, conducting literature reviews, and developing research plans.
Source code for "On the Relationship between Self-Attention and Convolutional Layers"
pix2tex: Using a ViT to convert images of equations into LaTeX code.
A family of diffusion models for text-to-audio generation.
Multimodal AI Story Teller, built with Stable Diffusion, GPT, and neural text-to-speech
Official implementation of INTERSPEECH 2021 paper 'Emotion Recognition from Speech Using Wav2vec 2.0 Embeddings'
Official implementation of EdiTTS: Score-based Editing for Controllable Text-to-Speech (INTERSPEECH 2022)
📖🎧 A tool for creating ebooks with synchronized text and audio (EPUB3 with Media Overlays)
CLIP (Contrastive Language-Image Pretraining): predict the most relevant text snippet given an image
Code for the paper "VisualBERT: A Simple and Performant Baseline for Vision and Language"
Displays text in sync with audio being played. Works with VTT files.
A PyTorch implementation of Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis
Implementation of the model used in the paper Protest Activity Detection and Perceived Violence Estimation from Social Media Images (ACM Multimedia 2017)