Lianghua Huang huanglianghua

Researcher at Tongyi Lab.

363 followers · 177 following

Lists (1)

sora

3 repositories

Starred repositories

bytedance / SALMONN

SALMONN: Speech Audio Language Music Open Neural Network

Python 1,174 93 Updated Mar 4, 2025

Wan-Video / Wan2.1

Wan: Open and Advanced Large-Scale Video Generative Models

Python 7,777 796 Updated Mar 7, 2025

LTH14 / fractalgen

PyTorch implementation of FractalGen https://arxiv.org/abs/2502.17437

Python 886 44 Updated Feb 25, 2025

etched-ai / open-oasis

Inference script for Oasis 500M

Python 1,755 145 Updated Nov 8, 2024

GreatX3 / Playable-Game-Generation

An open-source lightweight game generation paradigm. It includes everything from data processing to model architecture design and playability-based evaluation methods. The game runs at 20 FPS on a …

Jupyter Notebook 77 2 Updated Jan 7, 2025

buoyancy99 / diffusion-forcing

Code for "Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion"

Python 765 39 Updated Mar 6, 2025

facebookresearch / audiocraft

Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable…

Jupyter Notebook 21,620 2,265 Updated Jan 15, 2025

kwsong0113 / diffusion-forcing-transformer

Official PyTorch Implementation of "History-Guided Video Diffusion"

Python 214 8 Updated Mar 6, 2025

Farama-Foundation / Gymnasium

An API standard for single-agent reinforcement learning environments, with popular reference environments and related utilities (formerly Gym)

Python 8,510 950 Updated Mar 6, 2025

openai / gym

A toolkit for developing and comparing reinforcement learning algorithms.

Python 35,553 8,656 Updated Oct 11, 2024

openai / Video-Pre-Training

Video PreTraining (VPT): Learning to Act by Watching Unlabeled Online Videos

Python 1,422 145 Updated Jun 10, 2024

gpt-omni / mini-omni

Open-source multimodal large language model that can hear and talk while thinking, featuring real-time end-to-end speech input and streaming audio output for conversation.

Python 3,201 277 Updated Nov 5, 2024

openai / whisper

Robust Speech Recognition via Large-Scale Weak Supervision

Python 77,822 9,324 Updated Jan 4, 2025

kyutai-labs / moshi

Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audio codec.

Python 7,720 618 Updated Mar 6, 2025

OpenGVLab / Ask-Anything

[CVPR2024 Highlight][VideoChatGPT] ChatGPT with video understanding! And many more supported LMs such as miniGPT4, StableLM, and MOSS.

Python 3,179 260 Updated Jan 18, 2025

openai / swarm

Educational framework exploring ergonomic, lightweight multi-agent orchestration. Managed by OpenAI Solution team.

Python 19,077 2,033 Updated Oct 15, 2024

openai / openai-realtime-agents

This is a simple demonstration of more advanced, agentic patterns built on top of the Realtime API.

TypeScript 5,133 542 Updated Feb 26, 2025

facebook / lexical

Lexical is an extensible text editor framework that provides excellent reliability, accessibility and performance.

TypeScript 20,708 1,817 Updated Mar 10, 2025

zhaoyue-zephyrus / bsq-vit

[ICLR 2025][arXiv:2406.07548] Image and Video Tokenization with Binary Spherical Quantization

Python 138 Updated Jun 12, 2024

FoundationVision / Infinity

Infinity ∞ : Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis

Python 993 42 Updated Feb 23, 2025

bytedance / 1d-tokenizer

This repo contains the code for 1D tokenizer and generator

Jupyter Notebook 706 38 Updated Feb 24, 2025

danielmiessler / fabric

fabric is an open-source framework for augmenting humans using AI. It provides a modular framework for solving specific problems using a crowdsourced set of AI prompts that can be used anywhere.

Go 29,875 3,076 Updated Mar 9, 2025

Aslam97 / shadcn-minimal-tiptap

Minimal Tiptap Editor

TypeScript 1,173 69 Updated Feb 20, 2025

transitive-bullshit / agentic

AI agent stdlib that works with any LLM and TypeScript AI SDK.

TypeScript 17,085 2,198 Updated Mar 1, 2025

NVIDIA / Cosmos

Cosmos is a world model development platform that consists of world foundation models, tokenizers and video processing pipeline to accelerate the development of Physical AI at Robotics & AV labs. C…

Jupyter Notebook 7,644 491 Updated Mar 7, 2025