Starred repositories
SALMONN: Speech Audio Language Music Open Neural Network
Wan: Open and Advanced Large-Scale Video Generative Models
PyTorch implementation of FractalGen https://arxiv.org/abs/2502.17437
An open-source lightweight game generation paradigm. It includes everything from data processing to model architecture design and playability-based evaluation methods. The game runs at 20 FPS on a …
Code for "Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion"
Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable…
Official PyTorch Implementation of "History-Guided Video Diffusion"
An API standard for single-agent reinforcement learning environments, with popular reference environments and related utilities (formerly Gym)
A toolkit for developing and comparing reinforcement learning algorithms.
Video PreTraining (VPT): Learning to Act by Watching Unlabeled Online Videos
Open-source multimodal large language model that can hear and talk while thinking. It features real-time, end-to-end speech input and streaming audio output for conversation.
Robust Speech Recognition via Large-Scale Weak Supervision
Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audio codec.
[CVPR2024 Highlight][VideoChatGPT] ChatGPT with video understanding! And many more supported LMs such as miniGPT4, StableLM, and MOSS.
Educational framework exploring ergonomic, lightweight multi-agent orchestration. Managed by the OpenAI Solution team.
This is a simple demonstration of more advanced, agentic patterns built on top of the Realtime API.
Lexical is an extensible text editor framework that provides excellent reliability, accessibility and performance.
[ICLR 2025][arXiv:2406.07548] Image and Video Tokenization with Binary Spherical Quantization
Infinity ∞ : Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis
This repo contains the code for a 1D tokenizer and generator.
fabric is an open-source framework for augmenting humans using AI. It provides a modular framework for solving specific problems using a crowdsourced set of AI prompts that can be used anywhere.
AI agent stdlib that works with any LLM and TypeScript AI SDK.
Cosmos is a world model development platform that consists of world foundation models, tokenizers and video processing pipeline to accelerate the development of Physical AI at Robotics & AV labs. C…
A privacy-first, self-hosted, fully open-source personal knowledge management tool, written in TypeScript and Go.
Large Language Model based Multi-Agents: A Survey of Progress and Challenges
Build a Perplexity-Inspired Answer Engine Using Next.js, Groq, Llama-3, Langchain, OpenAI, Upstash, Brave & Serper
Automating the Search for Artificial Life with Foundation Models!
QUDA is a library for performing calculations in lattice QCD on GPUs.