- Nagoya University
- Nagoya, Japan
- https://unilight.github.io/
- @unilightwf
- https://scholar.google.com/citations?user=g71mJO4AAAAJ
Stars
sampling frequency independent convolution for MOS prediction
Interpretable anonymizer based on kNN-VC
SCOREQ: Speech COntrastive REgression for Quality Assessment (NeurIPS 2024)
Metrics for evaluating music and audio generative models – with a focus on long-form, full-band, and stereo generations.
LLaSA: Scaling Train-time and Inference-time Compute for LLaMA-based Speech Synthesis
Official PyTorch implementation of BigVGAN (ICLR 2023)
FACodec: Speech Codec with Attribute Factorization used for NaturalSpeech 3
State-of-the-art deep learning based audio codec supporting both mono 24 kHz audio and stereo 48 kHz audio.
Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
Zero-Shot Foreign Accent Conversion without a Native Reference
[ASRU 2023] Code of paper SALT: Distinguishable Speaker Anonymization Through Latent Space Transformation
A Singing Style Conversion Framework Based On Audio Infilling
Codebase for 'Scaling Rich Style-Prompted Text-to-Speech Datasets'
Unified automatic quality assessment for speech, music, and sound.
Retrieval-Augmented MOS Prediction with Prior Knowledge Integration
Foundational Models for State-of-the-Art Speech and Text Translation
Baseline Recipe for VoicePrivacy Challenge 2024: anonymization systems and evaluation software
Modified transcriptions of YODAS dataset
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
Train no-reference speech quality estimators with multiple datasets via learned, per-dataset alignments.