Fudan University, Shanghai, China

Stars
[ACL 2024] CPsyCoun: A Report-based Multi-turn Dialogue Reconstruction and Evaluation Framework for Chinese Psychological Counseling
LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.
An open-source multimodal large language model that can hear and talk while thinking, featuring real-time end-to-end speech input and streaming audio output for conversation.
[EMNLP 2024] mDPO: Conditional Preference Optimization for Multimodal Large Language Models.
Metrics for evaluating Automated Audio Captioning systems, designed for PyTorch.
Code for SpeechTokenizer, presented in the paper "SpeechTokenizer: Unified Speech Tokenizer for Speech Language Models"; audio samples are presented on the project page.
Code for "AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling"
Writing AI Conference Papers: A Handbook for Beginners
Mental-health large language model (LLM): fine-tuning of InternLM2, InternLM2.5, Qwen, ChatGLM, Baichuan, DeepSeek, Mixtral, LLama3, GLM4, Qwen2, and LLama3.1.
Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verified research papers.
SGLang is a fast serving framework for large language models and vision language models.
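For orientation, a minimal sketch of querying an SGLang server through its OpenAI-compatible chat endpoint; the model path and port below are assumptions for illustration, not part of the repository's description.

```python
# Assumes a server was launched in another shell, e.g.:
#   python -m sglang.launch_server --model-path meta-llama/Llama-3.1-8B-Instruct --port 30000
import requests

resp = requests.post(
    "http://localhost:30000/v1/chat/completions",  # OpenAI-compatible chat endpoint
    json={
        "model": "meta-llama/Llama-3.1-8B-Instruct",
        "messages": [{"role": "user", "content": "Summarize BM25 in one sentence."}],
        "max_tokens": 64,
    },
    timeout=60,
)
print(resp.json()["choices"][0]["message"]["content"])
```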
A generative speech model for daily dialogue.
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research.
[NeurIPS'24 Spotlight] Speeds up long-context LLM inference with approximate, dynamic sparse attention, reducing pre-filling latency by up to 10x on an A100 while maintaining accuracy.
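To make the idea concrete, here is a toy sketch of dynamic sparse attention for a single query, attending only to the top-scoring keys; it illustrates the general principle, not MInference's actual sparse patterns.

```python
import numpy as np

def topk_sparse_attention(q, k, v, keep=64):
    """Toy single-query sparse attention: softmax over only the `keep` best-scoring keys."""
    scores = q @ k.T / np.sqrt(q.shape[-1])          # (n_keys,) attention logits
    top = np.argpartition(scores, -keep)[-keep:]     # indices of the kept keys
    weights = np.exp(scores[top] - scores[top].max())
    weights /= weights.sum()
    return weights @ v[top]                          # weighted sum over kept values only

# toy shapes: one query vector, 4096 keys/values of dimension 64
rng = np.random.default_rng(0)
q = rng.standard_normal(64)
k = rng.standard_normal((4096, 64))
v = rng.standard_normal((4096, 64))
print(topk_sparse_attention(q, k, v).shape)  # (64,)
```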
RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.
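As a generic illustration of the retrieve-then-generate pattern behind such engines (not RAGFlow's API), a toy sketch that ranks chunks by term overlap and builds a grounded prompt:

```python
def retrieve(question, chunks, top_k=2):
    """Rank chunks by naive term overlap with the question and keep the best few."""
    q_terms = set(question.lower().split())
    ranked = sorted(chunks, key=lambda c: len(q_terms & set(c.lower().split())), reverse=True)
    return ranked[:top_k]

def build_prompt(question, contexts):
    """Build a prompt that asks the LLM to answer only from the retrieved context."""
    ctx = "\n".join(f"- {c}" for c in contexts)
    return f"Answer using only the context below.\n\nContext:\n{ctx}\n\nQuestion: {question}\nAnswer:"

chunks = [
    "Deep document understanding splits PDFs and tables into layout-aware chunks.",
    "An SFU forwards media streams between WebRTC participants.",
]
question = "How are PDFs turned into chunks?"
prompt = build_prompt(question, retrieve(question, chunks))
print(prompt)  # this prompt would then be sent to an LLM for generation
```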
Fast lexical search implementing BM25 in Python using Numpy, Numba and Scipy
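For reference, a minimal NumPy sketch of the BM25 ranking function that such libraries implement; the toy corpus and parameter values are illustrative, not the repository's API.

```python
import numpy as np

def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Score each tokenized document in `docs` against `query_terms` with BM25."""
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    scores = np.zeros(n_docs)
    for term in query_terms:
        df = sum(1 for d in docs if term in d)          # document frequency
        if df == 0:
            continue
        idf = np.log((n_docs - df + 0.5) / (df + 0.5) + 1.0)  # smoothed, non-negative IDF
        for i, d in enumerate(docs):
            tf = d.count(term)                          # term frequency in this document
            denom = tf + k1 * (1 - b + b * len(d) / avgdl)
            scores[i] += idf * tf * (k1 + 1) / denom
    return scores

# toy usage with a hypothetical corpus
docs = [doc.split() for doc in ["the cat sat on the mat", "dogs chase cats", "a quick brown fox"]]
print(bm25_scores("cat mat".split(), docs))
```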
[ACL 2024] Official PyTorch code for extracting features and training downstream models with emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation
[INTERSPEECH 2024] EmoBox: Multilingual Multi-corpus Speech Emotion Recognition Toolkit and Benchmark
Gorilla: Training and Evaluating LLMs for Function Calls (Tool Calls)
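As a minimal illustration of the function-calling pattern this line refers to (not Gorilla's actual interface), a sketch in which a model's JSON tool call is parsed and dispatched to a registered Python function; all names here are hypothetical.

```python
import json

# Hypothetical tool registry; in a real system the LLM is shown these signatures
# and asked to emit a JSON tool call instead of free-form text.
def get_weather(city: str) -> str:
    return f"Sunny in {city}"

TOOLS = {"get_weather": get_weather}

# Pretend this string came back from the model.
model_output = '{"name": "get_weather", "arguments": {"city": "Shanghai"}}'

call = json.loads(model_output)
result = TOOLS[call["name"]](**call["arguments"])
print(result)  # Sunny in Shanghai
```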
[NeurIPS 2024] Source code for xRAG: Extreme Context Compression for Retrieval-augmented Generation with One Token
AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation
End-to-end stack for WebRTC. SFU media server and SDKs.