Stars
[CVPR 2024] MagicAnimate: Temporally Consistent Human Image Animation using Diffusion Model
[CVPR 2024] DisCo: Referring Human Dance Generation in Real World
VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models
[SIGGRAPH Asia 2022] VideoReTalking: Audio-based Lip Synchronization for Talking Head Video Editing In the Wild
Agent that empowers software testing with LLMs; industrial-first in China
RUCAIBox / StructGPT
Forked from JBoRu/StructGPT
The code and data for "StructGPT: A general framework for Large Language Model to Reason on Structured Data"
Official implementation of "DCT-Net: Domain-Calibrated Translation for Portrait Stylization", SIGGRAPH 2022 (TOG); Multi-style cartoonization
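BaiduSpider, a crawler for scraping Baidu search results. It currently supports Baidu web search, Baidu image search, Baidu Zhidao search, Baidu video search, Baidu news search, Baidu Wenku search, Baidu Jingyan search, and Baidu Baike search.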
Agent framework and applications built upon Qwen>=2.0, featuring Function Calling, Code Interpreter, RAG, and Chrome extension.
FreeVC: Towards High-Quality Text-Free One-Shot Voice Conversion
[CVPR 2022] Show Me What and Tell Me How: Video Synthesis via Multimodal Conditioning
Official repository for Diffused Heads: Diffusion Models Beat GANs on Talking-Face Generation
Chat with any character you like: ChatGLM2 + SadTalker + Voice Cloning | Have immersive conversations with your favorite characters: ChatGLM2 + voice cloning + video dialogue
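This project implements Wav2lip-style video lip synthesis based on SadTalkers. It generates speech-driven lip shapes from a video file, applies a configurable enhancement to the facial region to sharpen the synthesized lip (face) area, and uses DAIN, a deep-learning frame-interpolation algorithm, to fill in frames of the generated video, smoothing the transitions between synthesized lip movements so that the result looks more fluid, realistic, and natural.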
The Hugging Face Course on Transformers for Audio
A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
A Chinese medical ChatGPT based on LLaMA, trained on a large-scale pretraining corpus and a multi-turn dialogue dataset.
Text To Video Synthesis Colab
ModelScope-Agent: An agent framework connecting models in ModelScope with the world
FaceChain is a deep-learning toolchain for generating your Digital-Twin.
Use PEFT or Full-parameter to finetune 450+ LLMs (Qwen2.5, InternLM3, GLM4, Llama3.3, Mistral, Yi1.5, Baichuan2, DeepSeek-R1, ...) and 150+ MLLMs (Qwen2.5-VL, Qwen2-Audio, Llama3.2-Vision, Llava, I…
[ICLR 2024] DNABERT-2: Efficient Foundation Model and Benchmark for Multi-Species Genome
SwissArmyTransformer is a flexible and powerful library to develop your own Transformer variants.