Stars
Multilingual large voice generation model with full-stack inference, training, and deployment capabilities.
Voice activity detector (VAD) for the browser with a simple API
A Chrome extension for focus and productivity with a Pomodoro timer and website blocking
Real-time voice assistant with multi-speaker recognition & tactical suggestions. Local AI processing for privacy-sensitive scenarios (debates/meetings/negotiations).
Official repo for paper "Structured 3D Latents for Scalable and Versatile 3D Generation" (CVPR'25 Spotlight).
An intelligent search assistant based on multimodal large models, enabling smart information retrieval and knowledge integration on the Xiaohongshu platform.
Multilingual Voice Understanding Model
An extensive node suite that enables ComfyUI to process 3D inputs (Mesh & UV Texture, etc.) using cutting-edge algorithms (3DGS, NeRF, etc.)
[CVPR 2024 Highlight] The official repo for "GaussianAvatars: Photorealistic Head Avatars with Rigged 3D Gaussians"
Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
Qwen3 is the large language model series developed by the Qwen team, Alibaba Cloud.
Just 1 minute of voice data can be enough to train a good TTS model! (few-shot voice cloning)
Multimodal Real-time Audio-Video Chatting Intelligent Assistant
A modular high-level library to train embodied AI agents across a variety of tasks and environments.
Python sample code and textbook for robotics algorithms.
Code and dataset for photorealistic Codec Avatars driven from audio
Mobile ALOHA: Learning Bimanual Mobile Manipulation with Low-Cost Whole-Body Teleoperation
Expression and animation testing based on Bert-VITS2.
Original reference implementation of "3D Gaussian Splatting for Real-Time Radiance Field Rendering"
VITS2 backbone with multilingual BERT
🌟 The Multi-Agent Framework: First AI Software Company, Towards Natural Language Programming
[NeurIPS 2023] MotionGPT: Human Motion as a Foreign Language, a unified motion-language generation model using LLMs
The official implementation of the paper "Human Motion Diffusion as a Generative Prior"
ChatGLM2-6B: An Open Bilingual Chat LLM
Plug-and-Play Implementation of Tree of Thoughts: Deliberate Problem Solving with Large Language Models that Elevates Model Reasoning by at least 70%
An Open-Ended Embodied Agent with Large Language Models
ImageBind: One Embedding Space to Bind Them All
Chinese and English multimodal conversational language model
Langflow is a powerful tool for building and deploying AI-powered agents and workflows.