Fudan University, Shanghai, China

Stars
[ACL 2024] CPsyCoun: A Report-based Multi-turn Dialogue Reconstruction and Evaluation Framework for Chinese Psychological Counseling
LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.
An open-source multimodal large language model that can hear and talk while thinking, featuring real-time end-to-end speech input and streaming audio output for conversation.
[EMNLP 2024] mDPO: Conditional Preference Optimization for Multimodal Large Language Models.
Metrics for evaluating Automated Audio Captioning systems, designed for PyTorch.
Code for SpeechTokenizer, presented in the paper "SpeechTokenizer: Unified Speech Tokenizer for Speech Language Models"; audio samples are presented on the project page.
Code for "AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling"
Writing AI Conference Papers: A Handbook for Beginners
Mental-health large language model (LLM): fine-tuning of InternLM2, InternLM2.5, Qwen, ChatGLM, Baichuan, DeepSeek, Mixtral, LLama3, GLM4, Qwen2, and LLama3.1.
Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verified research papers.
SGLang is a fast serving framework for large language models and vision language models.
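For orientation, a minimal sketch of querying an SGLang server through its OpenAI-compatible chat endpoint; the model path and port below are assumptions for illustration, not part of the repository's description.

```python
# Assumes a server was launched in another shell, e.g.:
#   python -m sglang.launch_server --model-path meta-llama/Llama-3.1-8B-Instruct --port 30000
import requests

resp = requests.post(
    "http://localhost:30000/v1/chat/completions",  # OpenAI-compatible chat endpoint
    json={
        "model": "meta-llama/Llama-3.1-8B-Instruct",
        "messages": [{"role": "user", "content": "Summarize BM25 in one sentence."}],
        "max_tokens": 64,
    },
    timeout=60,
)
print(resp.json()["choices"][0]["message"]["content"])
```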
A generative speech model for daily dialogue.
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research.
[NeurIPS'24 Spotlight] Speeds up long-context LLM inference with approximate, dynamic sparse attention, reducing pre-filling latency by up to 10x on an A100 while maintaining accuracy.
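To make the idea concrete, here is a toy sketch of dynamic sparse attention for a single query, attending only to the top-scoring keys; it illustrates the general principle, not MInference's actual sparse patterns.

```python
import numpy as np

def topk_sparse_attention(q, k, v, keep=64):
    """Toy single-query sparse attention: softmax over only the `keep` best-scoring keys."""
    scores = q @ k.T / np.sqrt(q.shape[-1])          # (n_keys,) attention logits
    top = np.argpartition(scores, -keep)[-keep:]     # indices of the kept keys
    weights = np.exp(scores[top] - scores[top].max())
    weights /= weights.sum()
    return weights @ v[top]                          # weighted sum over kept values only

# toy shapes: one query vector, 4096 keys/values of dimension 64
rng = np.random.default_rng(0)
q = rng.standard_normal(64)
k = rng.standard_normal((4096, 64))
v = rng.standard_normal((4096, 64))
print(topk_sparse_attention(q, k, v).shape)  # (64,)
```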
RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.
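As a generic illustration of the retrieve-then-generate pattern behind such engines (not RAGFlow's API), a toy sketch that ranks chunks by term overlap and builds a grounded prompt:

```python
def retrieve(question, chunks, top_k=2):
    """Rank chunks by naive term overlap with the question and keep the best few."""
    q_terms = set(question.lower().split())
    ranked = sorted(chunks, key=lambda c: len(q_terms & set(c.lower().split())), reverse=True)
    return ranked[:top_k]

def build_prompt(question, contexts):
    """Build a prompt that asks the LLM to answer only from the retrieved context."""
    ctx = "\n".join(f"- {c}" for c in contexts)
    return f"Answer using only the context below.\n\nContext:\n{ctx}\n\nQuestion: {question}\nAnswer:"

chunks = [
    "Deep document understanding splits PDFs and tables into layout-aware chunks.",
    "An SFU forwards media streams between WebRTC participants.",
]
question = "How are PDFs turned into chunks?"
prompt = build_prompt(question, retrieve(question, chunks))
print(prompt)  # this prompt would then be sent to an LLM for generation
```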
Fast lexical search implementing BM25 in Python using Numpy, Numba and Scipy
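For reference, a minimal NumPy sketch of the BM25 ranking function that such libraries implement; the toy corpus and parameter values are illustrative, not the repository's API.

```python
import numpy as np

def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Score each tokenized document in `docs` against `query_terms` with BM25."""
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    scores = np.zeros(n_docs)
    for term in query_terms:
        df = sum(1 for d in docs if term in d)          # document frequency
        if df == 0:
            continue
        idf = np.log((n_docs - df + 0.5) / (df + 0.5) + 1.0)  # smoothed, non-negative IDF
        for i, d in enumerate(docs):
            tf = d.count(term)                          # term frequency in this document
            denom = tf + k1 * (1 - b + b * len(d) / avgdl)
            scores[i] += idf * tf * (k1 + 1) / denom
    return scores

# toy usage with a hypothetical corpus
docs = [doc.split() for doc in ["the cat sat on the mat", "dogs chase cats", "a quick brown fox"]]
print(bm25_scores("cat mat".split(), docs))
```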
[ACL 2024] Official PyTorch code for extracting features and training downstream models with emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation
[INTERSPEECH 2024] EmoBox: Multilingual Multi-corpus Speech Emotion Recognition Toolkit and Benchmark
Gorilla: Training and Evaluating LLMs for Function Calls (Tool Calls)
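As a minimal illustration of the function-calling pattern this line refers to (not Gorilla's actual interface), a sketch in which a model's JSON tool call is parsed and dispatched to a registered Python function; all names here are hypothetical.

```python
import json

# Hypothetical tool registry; in a real system the LLM is shown these signatures
# and asked to emit a JSON tool call instead of free-form text.
def get_weather(city: str) -> str:
    return f"Sunny in {city}"

TOOLS = {"get_weather": get_weather}

# Pretend this string came back from the model.
model_output = '{"name": "get_weather", "arguments": {"city": "Shanghai"}}'

call = json.loads(model_output)
result = TOOLS[call["name"]](**call["arguments"])
print(result)  # Sunny in Shanghai
```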
[NeurIPS 2024] Source code for xRAG: Extreme Context Compression for Retrieval-augmented Generation with One Token
AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation
End-to-end stack for WebRTC. SFU media server and SDKs.