@jiaotong University
- Japan
Stars
A Self-adaptation Framework🐙 that adapts LLMs for unseen tasks in real-time!
Differentiable ODE solvers with full GPU support and O(1)-memory backpropagation.
Next-generation TTS model using flow-matching and DiT, inspired by Stable Diffusion 3
PyTorch Implementation of "Monotonic Chunkwise Attention" (ICLR 2018)
Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
Automatic Speech Recognition with Speaker Diarization based on OpenAI Whisper
PyTorch implementation of "Masked-Attention Diffusion Guidance for Spatially Controlling Text-to-Image Generation" [The Visual Computer]
This list of writing prompts covers a range of topics and tasks, including brainstorming research ideas, improving language and style, conducting literature reviews, and developing research plans.
Source code for "On the Relationship between Self-Attention and Convolutional Layers"
pix2tex: Using a ViT to convert images of equations into LaTeX code.
A family of diffusion models for text-to-audio generation.
Multimodal AI Story Teller, built with Stable Diffusion, GPT, and neural text-to-speech
Official implementation of INTERSPEECH 2021 paper 'Emotion Recognition from Speech Using Wav2vec 2.0 Embeddings'
Official implementation of EdiTTS: Score-based Editing for Controllable Text-to-Speech (INTERSPEECH 2022)
noetits / ICE-Talk
Forked from CSTR-Edinburgh/ophelia
Interface for Controllable Expressive Talking Machine
📖🎧 A tool for creating ebooks with synchronized text and audio (EPUB3 with Media Overlays)
CLIP (Contrastive Language-Image Pretraining), Predict the most relevant text snippet given an image
Code for the paper "VisualBERT: A Simple and Performant Baseline for Vision and Language"
Displays text in sync with audio being played. Works with VTT files.
A PyTorch implementation of Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis
Implementation of the model used in the paper Protest Activity Detection and Perceived Violence Estimation from Social Media Images (ACM Multimedia 2017)
Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provide data annotation and synthesis tools, support training and…