Convert the audio effects in your music into encoded representations suitable for audio effects processing and analysis tasks.
Fx-Encoder++ is an audio effects representation learning framework based on SimCLR. We adopt the CLAP codebase for this project.
pip install fxencoder_plusplus
Note: the input to Fx-Encoder++ must be stereo.
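If your source audio is mono, you can duplicate the channel to get the expected stereo layout; a minimal sketch (the wav tensor below is a synthetic placeholder):

import torch

# hypothetical mono signal, shape [seq_len]
wav = torch.randn(441000)
# duplicate the single channel -> [1, 2, seq_len] (batch, stereo channels, samples)
wav = wav.unsqueeze(0).unsqueeze(0).repeat(1, 2, 1)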
Initialize Models
from fxencoder_plusplus import load_model
# Load default base model (auto-downloads if needed)
DEVICE = 'cuda'
model = load_model(
    'default',
    device=DEVICE,
)
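If a GPU may not be available, the usual PyTorch fallback works here as well (a small sketch using standard PyTorch, not part of this library's API):

import torch
from fxencoder_plusplus import load_model

# pick the GPU when present, otherwise fall back to CPU
DEVICE = 'cuda' if torch.cuda.is_available() else 'cpu'
model = load_model('default', device=DEVICE)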
Extract audio effects representations from mixture tracks or stem tracks, where a single representation encodes the overall audio effects style of the entire input.
import torch
import librosa
audio_path = librosa.example('trumpet')
wav, sr = librosa.load(audio_path, sr=44100, mono=False)  # this example file is mono
wav = torch.from_numpy(wav).unsqueeze(0).unsqueeze(0).repeat(1, 2, 1).to(DEVICE)  # duplicate the mono channel -> [1, 2, seq_len]
fx_emb = model.get_fx_embedding(wav)
print(fx_emb.shape)  # [1, embed_dim] = [1, 128]
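Because the encoder is trained with a SimCLR-style contrastive objective, cosine similarity between two embeddings is a natural way to compare effects styles. A minimal sketch, assuming fx_emb_a and fx_emb_b were both produced by get_fx_embedding as above:

import torch.nn.functional as F

# fx_emb_a, fx_emb_b: [1, 128] embeddings from model.get_fx_embedding(...)
similarity = F.cosine_similarity(fx_emb_a, fx_emb_b, dim=-1)  # [1], values in [-1, 1]
print(similarity.item())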
Extract instrument-specific audio effects representations from mixture tracks. For example, extract the audio effects representation of just the vocals within a full mix.
- Audio Reference:
import torchaudio
import julius
mixture_path = "/path/to/mixture.wav"
mixture, sr = torchaudio.load(mixture_path, num_frames=441000)  # first 10 s at 44.1 kHz
mixture = mixture.unsqueeze(0).to(DEVICE)  # [1, channel, seq_len]
query_path = "/path/to/inst.wav"
query, sr = torchaudio.load(query_path, frame_offset=441000, num_frames=441000)  # a later 10 s segment of the stem
query = query.unsqueeze(0).to(DEVICE)  # [1, channel, seq_len]
query = julius.resample_frac(query, 44100, 48000)  # resample the query from 44.1 kHz to 48 kHz
_, fx_emb = model.get_fx_embedding_by_audio_query(mixture, query)
print(fx_emb.shape)  # [1, embed_dim] = [1, 128]
- Text Reference:
import torchaudio
mixture_path = "/path/to/mixture.wav"
mixture, sr = torchaudio.load(mixture_path, num_frames=441000)  # first 10 s at 44.1 kHz
mixture = mixture.unsqueeze(0).to(DEVICE) # [1, channel, seq_len]
query = "the sound of vocals"
_, fx_emb = model.get_fx_embedding_by_text_query(mixture, query)
print(fx_emb.shape)  # [1, embed_dim] = [1, 128]
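To collect the effects style of several instruments from the same mixture, you can loop over text queries. A short sketch, reusing model and mixture from the example above; the query strings are illustrative, not a fixed vocabulary:

# hypothetical per-instrument text queries
queries = ["the sound of vocals", "the sound of drums", "the sound of bass"]
inst_embs = {}
for q in queries:
    _, emb = model.get_fx_embedding_by_text_query(mixture, q)
    inst_embs[q] = emb  # each [1, 128]

The same pattern applies to get_fx_embedding_by_audio_query when stem audio is available as the reference.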
Training
- Create the environment with conda:
conda create --name fxenc python=3.10.14
- Install the dependencies:
pip install -r requirements.txt
Because the dataset has copyright restrictions, we unfortunately cannot share the preprocessed datasets directly.
- Download MUSDB and MoisesDB
- Check FxNorm-automix for preparing the audio-effects-normalized dataset
Then launch training:
bash scripts/train_proposed.sh
Evaluation
We develop a retrieval-based evaluation pipeline (using the MUSDB dataset as the example):
- Check FxNorm-automix for preparing the audio-effects-normalized dataset
- Synthesize the evaluation dataset: see build_musdb.py
- Run the retrieval-based evaluation: see eval_retrieval.py
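As a rough sketch of what such a retrieval evaluation measures (the actual pipeline lives in eval_retrieval.py; the function below and its inputs are hypothetical): given paired query and reference embeddings where row i of each tensor carries the same effects style, rank the references by cosine similarity and count how often the true match comes first.

import torch
import torch.nn.functional as F

def retrieval_accuracy(query_embs: torch.Tensor, ref_embs: torch.Tensor) -> float:
    """Top-1 retrieval accuracy; row i of both [N, 128] tensors shares an fx style."""
    q = F.normalize(query_embs, dim=-1)
    r = F.normalize(ref_embs, dim=-1)
    sims = q @ r.T                     # [N, N] cosine-similarity matrix
    top1 = sims.argmax(dim=-1)         # best-matching reference index per query
    return (top1 == torch.arange(len(q))).float().mean().item()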
This library is released under the CC BY-NC 4.0 license. Please refer to the LICENSE file for more details.