huanglianghua
  • Tongyi Lab


Starred repositories


SALMONN: Speech Audio Language Music Open Neural Network

Python 1,174 93 Updated Mar 4, 2025

Wan: Open and Advanced Large-Scale Video Generative Models

Python 7,777 796 Updated Mar 7, 2025

PyTorch implementation of FractalGen https://arxiv.org/abs/2502.17437

Python 886 44 Updated Feb 25, 2025

Inference script for Oasis 500M

Python 1,755 145 Updated Nov 8, 2024

An open-source lightweight game generation paradigm. It includes everything from data processing to model architecture design and playability-based evaluation methods. The game runs at 20 FPS on a …

Jupyter Notebook 77 2 Updated Jan 7, 2025

Code for "Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion"

Python 765 39 Updated Mar 6, 2025

Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable…

Jupyter Notebook 21,620 2,265 Updated Jan 15, 2025

Official PyTorch Implementation of "History-Guided Video Diffusion"

Python 214 8 Updated Mar 6, 2025

An API standard for single-agent reinforcement learning environments, with popular reference environments and related utilities (formerly Gym)

Python 8,510 950 Updated Mar 6, 2025

A toolkit for developing and comparing reinforcement learning algorithms.

Python 35,553 8,656 Updated Oct 11, 2024

Video PreTraining (VPT): Learning to Act by Watching Unlabeled Online Videos

Python 1,422 145 Updated Jun 10, 2024

Open-source multimodal large language model that can hear and talk while thinking. It features real-time, end-to-end speech input and streaming audio output for conversation.

Python 3,201 277 Updated Nov 5, 2024

Robust Speech Recognition via Large-Scale Weak Supervision

Python 77,822 9,324 Updated Jan 4, 2025

Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audio codec.

Python 7,720 618 Updated Mar 6, 2025

[CVPR2024 Highlight][VideoChatGPT] ChatGPT with video understanding! And many more supported LMs such as miniGPT4, StableLM, and MOSS.

Python 3,179 260 Updated Jan 18, 2025

Educational framework exploring ergonomic, lightweight multi-agent orchestration. Managed by the OpenAI Solution team.

Python 19,077 2,033 Updated Oct 15, 2024

This is a simple demonstration of more advanced, agentic patterns built on top of the Realtime API.

TypeScript 5,133 542 Updated Feb 26, 2025

Lexical is an extensible text editor framework that provides excellent reliability, accessibility and performance.

TypeScript 20,708 1,817 Updated Mar 10, 2025

[ICLR 2025][arXiv:2406.07548] Image and Video Tokenization with Binary Spherical Quantization

Python 138 Updated Jun 12, 2024

Infinity ∞: Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis

Python 993 42 Updated Feb 23, 2025

This repo contains the code for a 1D tokenizer and generator.

Jupyter Notebook 706 38 Updated Feb 24, 2025

fabric is an open-source framework for augmenting humans using AI. It provides a modular framework for solving specific problems using a crowdsourced set of AI prompts that can be used anywhere.

Go 29,875 3,076 Updated Mar 9, 2025

Minimal Tiptap Editor

TypeScript 1,173 69 Updated Feb 20, 2025

AI agent stdlib that works with any LLM and TypeScript AI SDK.

TypeScript 17,085 2,198 Updated Mar 1, 2025

Cosmos is a world model development platform that consists of world foundation models, tokenizers and video processing pipeline to accelerate the development of Physical AI at Robotics & AV labs. C…

Jupyter Notebook 7,644 491 Updated Mar 7, 2025

Privacy-first, self-hosted, fully open-source personal knowledge management software written in TypeScript and Go.

TypeScript 33,025 2,017 Updated Mar 10, 2025

Large Language Model based Multi-Agents: A Survey of Progress and Challenges

907 49 Updated Apr 24, 2024

Build a Perplexity-Inspired Answer Engine Using Next.js, Groq, Llama-3, Langchain, OpenAI, Upstash, Brave & Serper

TypeScript 4,864 768 Updated Sep 28, 2024

Automating the Search for Artificial Life with Foundation Models!

Jupyter Notebook 387 43 Updated Jan 12, 2025

QUDA is a library for performing calculations in lattice QCD on GPUs.

C++ 307 106 Updated Mar 6, 2025