  • Fudan University
  • Shanghai, China

112 results for source starred repositories

Get your documents ready for gen AI

Python 19,586 1,049 Updated Jan 31, 2025

[ACL 2024] CPsyCoun: A Report-based Multi-turn Dialogue Reconstruction and Evaluation Framework for Chinese Psychological Counseling

Jupyter Notebook 87 14 Updated Sep 25, 2024

LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.

Python 2,780 187 Updated Nov 14, 2024

Reproduction of the LLaMA-Omni training code

Python 41 6 Updated Jan 23, 2025

Open-source multimodal large language model that can hear and talk while thinking. It features real-time end-to-end speech input and streaming audio output for conversation.

Python 3,106 270 Updated Nov 5, 2024

[EMNLP 2024] mDPO: Conditional Preference Optimization for Multimodal Large Language Models.

Python 60 1 Updated Nov 10, 2024

Metrics for evaluating Automated Audio Captioning systems, designed for PyTorch.

Python 40 3 Updated Jan 20, 2025

This is the code for the SpeechTokenizer presented in the SpeechTokenizer: Unified Speech Tokenizer for Speech Language Models. Samples are presented on

Python 525 45 Updated Jun 9, 2024

Code for "AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling"

Python 822 66 Updated Aug 27, 2024

Writing AI Conference Papers: A Handbook for Beginners

1,849 66 Updated Dec 23, 2024

EMO-SUPERB submission

Python 42 2 Updated Sep 4, 2024
Python 190 20 Updated Jan 31, 2024

Mental health large language model, LLM, The Big Model of Mental Health, Finetune, InternLM2, InternLM2.5, Qwen, ChatGLM, Baichuan, DeepSeek, Mixtral, LLama3, GLM4, Qwen2, LLama3.1

Python 1,074 144 Updated Jan 16, 2025

Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verified research papers.

Python 2,185 160 Updated Jan 30, 2025

SGLang is a fast serving framework for large language models and vision language models.

Python 8,300 811 Updated Feb 2, 2025

Bilibili Downloader. A command-line Bilibili downloader.

C# 10,444 1,313 Updated Jan 21, 2025
Python 149 13 Updated Jul 9, 2024

A generative speech model for daily dialogue.

Python 34,040 3,688 Updated Jan 25, 2025

Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audi…

Python 8,405 640 Updated Jan 23, 2025

SOTA Open Source TTS

Python 18,761 1,419 Updated Jan 26, 2025

[NeurIPS'24 Spotlight, ICLR'25] To speed up long-context LLM inference, computes attention with approximate and dynamic sparse methods, which reduces inference latency by up to 10x for pre-filling on an …

Python 894 43 Updated Jan 31, 2025

RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.

Python 30,699 2,874 Updated Feb 1, 2025

Fast lexical search implementing BM25 in Python using Numpy, Numba and Scipy

Python 992 49 Updated Jan 16, 2025

[ACL 2024] Official PyTorch code for extracting features and training downstream models with emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation

Python 721 55 Updated Dec 23, 2024

[INTERSPEECH 2024] EmoBox: Multilingual Multi-corpus Speech Emotion Recognition Toolkit and Benchmark

Python 190 8 Updated Jun 17, 2024

Gorilla: Training and Evaluating LLMs for Function Calls (Tool Calls)

Python 11,730 1,037 Updated Feb 2, 2025

[NeurIPS 2024] Source code for xRAG: Extreme Context Compression for Retrieval-augmented Generation with One Token

Jupyter Notebook 112 9 Updated Jul 4, 2024

AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation

Python 4,794 598 Updated Jul 2, 2024