Stars
Speech enhancement using Wiener filtering and pitch-synchronous STFT phase reconstruction
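As a rough sketch of the spectral Wiener filtering idea this entry names (the function name, STFT settings, and the noise-PSD estimate are illustrative assumptions, not this repo's API — the pitch-synchronous phase reconstruction step is not shown):

```python
import numpy as np
from scipy.signal import stft, istft

def wiener_enhance(noisy, noise_psd, fs=16000, nperseg=512):
    """Single-channel spectral Wiener filter (illustrative sketch).

    noisy: 1-D time-domain signal.
    noise_psd: per-frequency-bin noise power estimate, e.g. averaged
    over a noise-only segment analyzed with the same STFT settings.
    """
    _, _, X = stft(noisy, fs=fs, nperseg=nperseg)
    # Crude speech-power estimate via spectral subtraction, floored to stay positive.
    sig_psd = np.maximum(np.abs(X) ** 2 - noise_psd[:, None], 1e-10)
    gain = sig_psd / (sig_psd + noise_psd[:, None])  # Wiener gain H = S / (S + N)
    _, enhanced = istft(gain * X, fs=fs, nperseg=nperseg)
    return enhanced
```

The gain is applied to the noisy STFT and the signal is resynthesized with the overlap-add inverse STFT; a real system would track the noise PSD over time rather than assume it fixed.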
The MOS system combines components from DNSMOS, NISQA, MOSSSL, and SIGMOS, using the librosa library to process audio waveforms.
Denoising Diffusion Probabilistic Models
Score-based Generative Models (Diffusion Models) for Speech Enhancement and Dereverberation
DiffWave is a fast, high-quality neural vocoder and waveform synthesizer.
State-of-the-art audio codec with 90x compression factor. Supports 44.1kHz, 24kHz, and 16kHz mono/stereo audio.
chazo1994 / Amphion
Forked from open-mmlab/Amphion. Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation.
An Open-source Streaming High-fidelity Neural Audio Codec
HeCheng0625 / Amphion
Forked from open-mmlab/Amphion.
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation.
FunCodec is a research-oriented toolkit for audio quantization and downstream applications such as text-to-speech synthesis and music generation.
High-Resolution Image Synthesis with Latent Diffusion Models
A latent text-to-image diffusion model
Audio generation using diffusion models, in PyTorch.
A fully working PyTorch implementation of NaturalSpeech (Tan et al., 2022)
Implementation of NaturalSpeech 2, Zero-shot Speech and Singing Synthesizer, in PyTorch
iSTFTNet: Fast and Lightweight Mel-spectrogram Vocoder Incorporating Inverse Short-time Fourier Transform
Source code for "Taming Visually Guided Sound Generation" (Oral at the BMVC 2021)
On Variational Learning of Controllable Representations for Text without Supervision https://arxiv.org/abs/1905.11975
Audio Generation model working with GPT-2 and VQVAE compressed representation of MelSpectrograms
Automatic Speaker Recognition (ASR) system using Mel Frequency Cepstral Coefficients (MFCCs) and Vector Quantization (VQ)
Front-end speech processing aims at extracting proper features from short-term segments of a speech utterance, known as frames. It is a prerequisite step toward any pattern recognition problem em…
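The framing step this entry describes can be sketched in a few lines of NumPy (the frame length and hop below assume typical 25 ms / 10 ms windows at 16 kHz, not values taken from this repo):

```python
import numpy as np

def frame_signal(x, frame_len=400, hop=160):
    """Split a 1-D signal into overlapping short-term frames.

    Returns an array of shape (n_frames, frame_len); any tail samples
    that do not fill a whole frame are dropped. Each frame is tapered
    with a Hamming window before further analysis (e.g. MFCC extraction).
    """
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx] * np.hamming(frame_len)
```

Each row of the result is one frame, ready for a per-frame feature extractor.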
Implementation of "MOSNet: Deep Learning based Objective Assessment for Voice Conversion"