whatissimondoing
  • Fudan University
  • Shanghai, China

Get your documents ready for gen AI

Python 18,860 999 Updated Jan 21, 2025

[ACL 2024] CPsyCoun: A Report-based Multi-turn Dialogue Reconstruction and Evaluation Framework for Chinese Psychological Counseling

Jupyter Notebook 87 14 Updated Sep 25, 2024

LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.

Python 2,761 186 Updated Nov 14, 2024

Reproduction of the LLaMA-Omni training code

Python 40 6 Updated Dec 11, 2024

An open-source multimodal large language model that can hear and talk while thinking. It features real-time end-to-end speech input and streaming audio output for conversation.

Python 3,099 270 Updated Nov 5, 2024

[EMNLP 2024] mDPO: Conditional Preference Optimization for Multimodal Large Language Models.

Python 56 1 Updated Nov 10, 2024

Metrics for evaluating Automated Audio Captioning systems, designed for PyTorch.

Python 40 3 Updated Jan 20, 2025

This is the code for the SpeechTokenizer presented in "SpeechTokenizer: Unified Speech Tokenizer for Speech Language Models". Samples are presented on

Python 521 45 Updated Jun 9, 2024

Code for "AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling"

Python 820 66 Updated Aug 27, 2024

Writing AI Conference Papers: A Handbook for Beginners

1,813 65 Updated Dec 23, 2024

EMO-SUPERB submission

Python 42 2 Updated Sep 4, 2024
Python 190 20 Updated Jan 31, 2024

Mental health large model, LLM, The Big Model of Mental Health, Finetune, InternLM2, InternLM2.5, Qwen, ChatGLM, Baichuan, DeepSeek, Mixtral, LLama3, GLM4, Qwen2, LLama3.1

Python 1,037 141 Updated Jan 16, 2025

Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verified research papers.

Python 1,899 148 Updated Jan 22, 2025

SGLang is a fast serving framework for large language models and vision language models.

Python 7,552 729 Updated Jan 22, 2025

Bilibili Downloader. A command-line Bilibili downloader.

C# 10,378 1,308 Updated Jan 21, 2025
Python 149 13 Updated Jul 9, 2024

A generative speech model for daily dialogue.

Python 33,831 3,669 Updated Jan 19, 2025

Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audi…

Python 8,325 630 Updated Jan 20, 2025

SOTA Open Source TTS

Python 18,557 1,403 Updated Jan 18, 2025

[NeurIPS'24 Spotlight] Speeds up long-context LLM inference by computing attention with approximate, dynamic sparsity, which reduces inference latency by up to 10x for pre-filling on an A100 whil…

Python 883 43 Updated Dec 28, 2024

RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.

Python 29,134 2,760 Updated Jan 22, 2025

Fast lexical search implementing BM25 in Python using Numpy, Numba and Scipy

Python 982 48 Updated Jan 16, 2025

[ACL 2024] Official PyTorch code for extracting features and training downstream models with emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation

Python 718 55 Updated Dec 23, 2024

[INTERSPEECH 2024] EmoBox: Multilingual Multi-corpus Speech Emotion Recognition Toolkit and Benchmark

Python 190 8 Updated Jun 17, 2024

Gorilla: Training and Evaluating LLMs for Function Calls (Tool Calls)

Python 11,709 1,035 Updated Jan 21, 2025

[NeurIPS 2024] Source code for xRAG: Extreme Context Compression for Retrieval-augmented Generation with One Token

Jupyter Notebook 111 9 Updated Jul 4, 2024

AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation

Python 4,787 598 Updated Jul 2, 2024

End-to-end stack for WebRTC. SFU media server and SDKs.

Go 11,262 973 Updated Jan 22, 2025