Official PyTorch implementation of "Large Language Models are Strong Audio-Visual Speech Recognition Learners" [ICASSP 2025] and "Mitigating Attention Sinks and Massive Activations in Audio-Visual …

Python 62 7 Updated Jan 18, 2026

Aria-K-Alethia / BigCodec

Official implementation of the paper "BigCodec: Pushing the Limits of Low-Bitrate Neural Speech Codec"

Python 217 18 Updated Sep 19, 2024

Stability-AI / stable-audio-tools

Generative models for conditional audio generation

Python 3,764 467 Updated May 26, 2026

facebookresearch / MovieGenBench

Movie Gen Bench - two media generation evaluation benchmarks released with Meta Movie Gen

440 24 Updated Mar 8, 2025

xi-j / Mamba-ASR

ConMamba for Automatic Speech Recognition

Python 105 9 Updated Aug 12, 2024

LTH14 / mar

PyTorch implementation of MAR+DiffLoss https://arxiv.org/abs/2406.11838

Python 1,930 123 Updated Feb 20, 2026

bytedance / SALMONN

SALMONN family: A suite of advanced multi-modal LLMs

1,443 115 Updated May 26, 2026

mhamilton723 / DenseAV

Official code for the CVPR 2024 paper: Separating the "Chirp" from the "Chat": Self-supervised Visual Grounding of Sound and Language

Jupyter Notebook 88 14 Updated Jun 12, 2024

kylebgorman / syllabify

Python module for syllabifying English ARPABET transcriptions

Python 73 17 Updated Feb 15, 2019

open-mmlab / Amphion

Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audi…

Python 9,838 815 Updated Mar 25, 2026

mct10 / RepCodec

Models and code for RepCodec: A Speech Representation Codec for Speech Tokenization

Python 194 13 Updated Jul 12, 2024

ZhangXInFD / SpeechTokenizer

This is the code for the SpeechTokenizer presented in the SpeechTokenizer: Unified Speech Tokenizer for Speech Language Models. Samples are presented on

Python 659 67 Updated Jun 9, 2024

NVlabs / GroupViT

Official PyTorch implementation of GroupViT: Semantic Segmentation Emerges from Text Supervision, CVPR 2022.

Python 788 56 Updated May 10, 2022

rosinality / vq-vae-2-pytorch

Implementation of Generating Diverse High-Fidelity Images with VQ-VAE-2 in PyTorch

Python 1,803 280 Updated Feb 15, 2023

OpenGVLab / LLaMA-Adapter

[ICLR 2024] Fine-tuning LLaMA to follow Instructions within 1 Hour and 1.2M Parameters

Python 5,921 382 Updated Mar 14, 2024

lstrgar / ss-phoneme-seg

Code for "Phoneme Segmentation Using Self-Supervised Speech Models", Strgar & Harwath, Proceedings of the IEEE Spoken Language Technology Workshop (SLT) 2023

Python 55 10 Updated Nov 4, 2022

xinjli / alqalign

multilingual speech aligner

Python 78 6 Updated Nov 19, 2023

YuanGongND / uavm

Code for the IEEE Signal Processing Letters 2022 paper "UAVM: Towards Unifying Audio and Visual Models".

Python 57 3 Updated Apr 20, 2023

kamperh / vqwordseg

Unsupervised phone and word segmentation using dynamic programming on self-supervised VQ features.

Jupyter Notebook 39 8 Updated May 5, 2026

Cheng-I Jeff Lai jefflai108

Stars

MeiGen-AI / MultiTalk

verl-project / verl

facebookresearch / xformers

apple / axlearn

facebookresearch / seamless_communication

meta-pytorch / torchtune

vllm-project / vllm

xdit-project / xDiT

openai / codex

ByteDance-Seed / Bagel

kyutai-labs / moshi

umbertocappellazzo / Llama-AVSR