jefflai108
🍄
venture to a bigger world

[NeurIPS 2025] Let Them Talk: Audio-Driven Multi-Person Conversational Video Generation

Python 2,941 488 Updated May 22, 2026

verl/HybridFlow: A Flexible and Efficient RL Post-Training Framework

Python 21,814 4,029 Updated Jun 6, 2026

Hackable and optimized Transformers building blocks, supporting a composable construction.

Python 10,485 772 Updated May 21, 2026

An Extensible Deep Learning Library

Python 2,363 406 Updated May 16, 2026

Foundational Models for State-of-the-Art Speech and Text Translation

Jupyter Notebook 11,789 1,174 Updated Apr 8, 2026

PyTorch native post-training library

Python 5,768 726 Updated Jun 6, 2026

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 82,094 17,731 Updated Jun 7, 2026

xDiT: A Scalable Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism

Python 2,629 321 Updated Jun 4, 2026

Lightweight coding agent that runs in your terminal

Rust 89,201 13,132 Updated Jun 7, 2026

Open-source unified multimodal model

Python 5,991 528 Updated May 4, 2026

Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audio codec.

Python 10,341 966 Updated May 16, 2026

Official PyTorch implementation of "Large Language Models are Strong Audio-Visual Speech Recognition Learners" [ICASSP 2025] and "Mitigating Attention Sinks and Massive Activations in Audio-Visual …

Python 62 7 Updated Jan 18, 2026

Official implementation of the paper "BigCodec: Pushing the Limits of Low-Bitrate Neural Speech Codec"

Python 217 18 Updated Sep 19, 2024

Generative models for conditional audio generation

Python 3,764 467 Updated May 26, 2026

Movie Gen Bench - two media generation evaluation benchmarks released with Meta Movie Gen

440 24 Updated Mar 8, 2025

ConMamba for Automatic Speech Recognition

Python 105 9 Updated Aug 12, 2024

PyTorch implementation of MAR+DiffLoss https://arxiv.org/abs/2406.11838

Python 1,930 123 Updated Feb 20, 2026

SALMONN family: A suite of advanced multi-modal LLMs

1,443 115 Updated May 26, 2026

Official code for the CVPR 2024 Paper: Separating the "Chirp" from the "Chat": Self-supervised Visual Grounding of Sound and Language

Jupyter Notebook 88 14 Updated Jun 12, 2024

Python module for syllabifying English ARPABET transcriptions

Python 73 17 Updated Feb 15, 2019

Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audi…

Python 9,838 815 Updated Mar 25, 2026

Models and code for RepCodec: A Speech Representation Codec for Speech Tokenization

Python 194 13 Updated Jul 12, 2024

This is the code for the SpeechTokenizer presented in the SpeechTokenizer: Unified Speech Tokenizer for Speech Language Models. Samples are presented on

Python 659 67 Updated Jun 9, 2024

Official PyTorch implementation of GroupViT: Semantic Segmentation Emerges from Text Supervision, CVPR 2022.

Python 788 56 Updated May 10, 2022

Implementation of Generating Diverse High-Fidelity Images with VQ-VAE-2 in PyTorch

Python 1,803 280 Updated Feb 15, 2023

[ICLR 2024] Fine-tuning LLaMA to follow Instructions within 1 Hour and 1.2M Parameters

Python 5,921 382 Updated Mar 14, 2024

Code for "Phoneme Segmentation Using Self-Supervised Speech Models", Strgar & Harwath, Proceedings of the IEEE Spoken Language Technology Workshop (SLT) 2023

Python 55 10 Updated Nov 4, 2022

multilingual speech aligner

Python 78 6 Updated Nov 19, 2023

Code for the IEEE Signal Processing Letters 2022 paper "UAVM: Towards Unifying Audio and Visual Models".

Python 57 3 Updated Apr 20, 2023

Unsupervised phone and word segmentation using dynamic programming on self-supervised VQ features.

Jupyter Notebook 39 8 Updated May 5, 2026