MEGAMI: Multitrack Embedding Generative Auto Mixing

Automatic Music Mixing Using a Generative Model of Effect Embeddings


Overview

MEGAMI (Multitrack Embedding Generative Auto MIxing) is a research framework for automatic music mixing based on a generative model of effect embeddings.
Unlike conventional regression-based methods that predict a single deterministic mix, MEGAMI employs conditional diffusion models to capture the diversity of professional mixing decisions, modeling the inherently one-to-many nature of the task.

The framework operates in an effect embedding space rather than directly in the audio domain, enabling realistic and flexible mixing without altering musical content.
Through domain adaptation in the CLAP embedding space, MEGAMI can train on both dry (unprocessed) and wet (professionally mixed) recordings. A permutation-equivariant transformer architecture allows operation on an arbitrary number of unlabeled tracks, and evaluations show performance approaching human-level quality across diverse musical genres.


🧩 Key Components

| Component | Description |
| --- | --- |
| FxGenerator | Generates per-track effect embeddings conditioned on raw track features. |
| FxProcessor | Neural effects processor that applies the generated embeddings to the audio. |
| CLAP Domain Adaptor | Adapts representations between the dry and wet domains using CLAP embeddings. |
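
At inference time these components form a simple pipeline: generate one effect embedding per track, render each track through the processor, and combine the results. The Python sketch below illustrates this flow; all names and signatures are hypothetical and do not match the repository's actual API, and the final plain sum of stems is an assumption for illustration.

# Illustrative MEGAMI-style pipeline; names and signatures are hypothetical.
import torch

def automix(dry_tracks: list[torch.Tensor], fx_generator, fx_processor) -> torch.Tensor:
    # 1. Sample one effect embedding per track from the conditional
    #    diffusion model (permutation-equivariant over the track set).
    embeddings = fx_generator.sample(dry_tracks)

    # 2. Apply the neural effects processor to each track, conditioned
    #    on its generated embedding.
    wet = [fx_processor(track, emb) for track, emb in zip(dry_tracks, embeddings)]

    # 3. Combine the processed stems (shown here as a plain sum).
    return torch.stack(wet).sum(dim=0)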

🧱 Repository Structure


├── train_CLAPDomainAdaptor.py          # Train domain adaptation model
├── train_FxGenerator.py                # Train generative embedding model
├── train_FxProcessor.py                # Train neural effects processor
│
├── train_CLAPDomainAdaptor_public.sh   # Example public training scripts
├── train_FxGenerator_public.sh
├── train_FxProcessor_public.sh
│
├── inference/                          # Inference and validation modules
│   ├── inference.py
│   └── sampler_euler_heun_multitrack.py
│
├── datasets/                           # Dataset loaders
│   ├── MoisesDB_MedleyDB_multitrack.py
│   ├── public_multidataset_singletrack.py
│   └── eval_benchmark.py
│
├── networks/                           # Network definitions
│   ├── MLP_CLAP_regressor.py
│   ├── blackbox_TCN.py
│   ├── dit_multitrack.py
│   └── transformer.py
│
├── utils/                              # Utility functions and feature extractors
│   ├── MSS_loss.py
│   ├── common_audioeffects.py
│   ├── fxencoder_plusplus/
│   ├── laion_clap/
│   ├── training_utils.py
│   └── feature_extractors/
│
├── examples/                           # Subdirectories of example dry track sets for inference
├── conf/                               # Hydra configuration files
├── checkpoints/                        # Expected location of pretrained model checkpoints
├── run_inference.sh                    # Runs inference on a single song from a directory of dry tracks
├── requirements.txt                    # Dependencies
└── README.md


⚙️ Installation

  1. Clone the repository

    git clone https://github.com/SonyResearch/MEGAMI.git
    cd MEGAMI
  2. Create and activate a Conda environment

    conda create -n automix python=3.13
    conda activate automix
  3. Install dependencies

    pip install -r requirements.txt
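
As an optional sanity check, you can confirm that PyTorch imports correctly and sees a GPU:

    python -c "import torch; print(torch.__version__, torch.cuda.is_available())"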

🔧 Configuration System (Hydra)

The codebase uses Hydra for modular configuration. Each training or inference script loads a YAML config from conf/ and allows runtime overrides.

Example public config (simplified): conf/conf_FxGenerator_Public.yaml

defaults:
  - dset: MoisesDB_MedleyDB_FxGenerator
  - tester: evaluate_FxGenerator
  - logging: base_logging_FxGenerator

model_dir: "experiments/example"

exp:
  exp_name: "example"
  optimizer:
    _target_: "torch.optim.AdamW"
    lr: 1e-4
  batch_size: 8
diff_params:
  type: "ve_karras"
  content_encoder_type: "CLAP"
  style_encoder_type: "FxEncoder++_DynamicFeatures"
  CLAP_args:
    ckpt_path: "checkpoints/music_audioset_epoch_15_esc_90.14.patched.pt"

To override parameters at runtime:

python train_FxGenerator.py model_dir=experiments/test_run exp.optimizer.lr=5e-5 exp.batch_size=16
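
Under the hood, each script is a standard Hydra entry point along these lines (a minimal sketch assuming the public config above, not the repository's exact code):

# Minimal Hydra entry-point sketch; not the repository's exact code.
import hydra
from omegaconf import DictConfig

@hydra.main(config_path="conf", config_name="conf_FxGenerator_Public", version_base=None)
def main(cfg: DictConfig) -> None:
    # Command-line overrides such as exp.optimizer.lr=5e-5 are already
    # merged into cfg at this point.
    print(cfg.model_dir, cfg.exp.batch_size)

if __name__ == "__main__":
    main()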

🧭 Logging (Weights & Biases)

Logging is handled through Weights & Biases (wandb). If logging.log=true in your config, a new run is created automatically and all training metrics and configuration values are logged.

To disable wandb:

python train_FxGenerator.py logging.log=false

You can also change the project or entity directly in the config:

logging:
  log: true
  wandb:
    project: "MEGAMI"
    entity: "your_wandb_username"
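
This config typically maps onto a wandb.init call along the following lines (an illustrative sketch assuming a helper named init_logging, not the repository's exact wiring):

# Illustrative sketch: wiring the logging config into wandb.
import wandb
from omegaconf import DictConfig, OmegaConf

def init_logging(cfg: DictConfig) -> None:
    if cfg.logging.log:
        wandb.init(
            project=cfg.logging.wandb.project,
            entity=cfg.logging.wandb.entity,
            config=OmegaConf.to_container(cfg, resolve=True),  # log the full config
        )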

🚀 Usage

Training

Example using the provided public scripts:

bash train_FxGenerator_public.sh
bash train_FxProcessor_public.sh
bash train_CLAPDomainAdaptor_public.sh

These scripts automatically create experiment directories under experiments/ and call:

python train_FxGenerator.py --config-name=conf_FxGenerator_Public.yaml

Logs and checkpoints are saved under experiments/<exp_name>/ unless otherwise specified.

Inference

To run inference on a single song, use the provided script:

bash run_inference.sh

The script expects a directory containing a set of dry tracks in .wav format, sampled at 44.1 kHz. Examples are provided in examples/.
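
For reference, an input directory might look like this (the song and stem names are purely illustrative):

my_song/
├── vocals.wav
├── guitar.wav
├── bass.wav
└── drums.wav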


📊 Checkpoints

To reproduce the results reported in the MEGAMI paper, the following pretrained checkpoints are required. All files should be placed in the checkpoints/ directory:

| File name | Description |
| --- | --- |
| CLAP_DA_public-100000.pt | Public CLAP-based domain adaptation checkpoint for effects removal. |
| FxGenerator_public-50000.pt | Public FxGenerator diffusion checkpoint operating in the embedding space. |
| FxProcessor_public_blackbox_TCN_340000.pt | Public FxProcessor checkpoint (black-box TCN model). |
| music_audioset_epoch_15_esc_90.14.patched.pt | LAION-CLAP (music) public checkpoint (Original link). |
| fxenc_plusplus_default.pt | FXEncoder++ public checkpoint (Original link). |
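
Once downloaded, the checkpoints/ directory should therefore contain:

checkpoints/
├── CLAP_DA_public-100000.pt
├── FxGenerator_public-50000.pt
├── FxProcessor_public_blackbox_TCN_340000.pt
├── music_audioset_epoch_15_esc_90.14.patched.pt
└── fxenc_plusplus_default.pt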

🧾 Citation

If you use this framework in your research, please cite:

@article{moliner2025megami,
  title={Automatic Music Mixing Using a Generative Model of Effect Embeddings},
  author={Moliner, Eloi and Martínez-Ramírez, Marco A. and Koo, Junghyun and Liao, Wei-Hsiang and Cheuk, Kin Wai and Serrà, Joan and Välimäki, Vesa and Mitsufuji, Yuki},
  journal={Preprint},
  year={2025}
}
