Accepted by the AAAI 2023 Summer Symposium, with the Best Paper Award.
- Generated conducting motion for the given music -- Tchaikovsky Piano Concerto No. 1:
  Tchaikovsky.Piano.Concerto.No.1.mp4
- Objective: We present Diffusion-Conductor, a novel DDIM-based approach for music-driven conducting motion generation.
- Contributions:
- First work to use a diffusion model for music-driven conducting motion generation.
- Modify the supervision signal from ε to x₀ to achieve better performance, which may inspire later research in the motion generation field (a brief sketch follows this list).
- Benchmark Performance: Outperforms state-of-the-art methods on all four metrics: MSE, FGD, BC, and Diversity.
- 18/07/2023: Our paper won the Best Paper Award at the AAAI 2023 Inaugural Summer Symposium!
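To make the x₀-supervision mentioned above concrete, here is a minimal, hedged sketch (not the released training code; the placeholder denoiser, noise schedule, and motion shape are illustrative assumptions) of a diffusion training step that can supervise either the clean sample x₀ or the noise ε:

```python
# Minimal sketch of x0- vs. epsilon-supervision in a diffusion training step.
# NOT the repository's implementation; the identity "denoiser", the linear beta
# schedule, and the (B, T, D) motion shape below are placeholders.
import torch

def diffusion_loss(model, x0, alphas_cumprod, predict_x0=True):
    """x0: clean motion batch of shape (B, T, D)."""
    B = x0.shape[0]
    t = torch.randint(0, alphas_cumprod.shape[0], (B,), device=x0.device)
    a_bar = alphas_cumprod[t].view(B, 1, 1)               # \bar{alpha}_t per sample
    eps = torch.randn_like(x0)                            # injected Gaussian noise
    x_t = a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * eps  # forward (noising) process
    pred = model(x_t, t)                                  # network prediction
    target = x0 if predict_x0 else eps                    # x0- vs. eps-supervision
    return ((pred - target) ** 2).mean()

if __name__ == "__main__":
    betas = torch.linspace(1e-4, 0.02, 1000)              # placeholder noise schedule
    alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)
    denoiser = lambda x_t, t: x_t                         # placeholder network (echoes input)
    x0 = torch.randn(4, 1800, 26)                         # e.g. 13 keypoints * 2 coords, flattened
    print(diffusion_loss(denoiser, x0, alphas_cumprod, predict_x0=True))
```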
Please refer to install.md for detailed installation instructions.
- The training set: https://pan.baidu.com/s/1Pmtr7V7-9ChJqQp04NOyZg?pwd=3209
- The validation set: https://pan.baidu.com/s/1B5JrZnFCFvI9ABkuJeWoFQ?pwd=3209
- The test set: https://pan.baidu.com/s/18ecHYk9b4YM5YTcBNn37qQ?pwd=3209
You can also access the dataset via Google Drive.
There are 3 splits of ConductorMotion100: train, val, and test, which correspond to 3 .rar files. After extracting them to the <Your Dataset Dir> folder, the file structure will be:
tree <Your Dataset Dir>
<Your Dataset Dir>
├───train
│   ├───0
│   │       mel.npy
│   │       motion.npy
│   ├───...
│   └───5268
│           mel.npy
│           motion.npy
├───val
│   ├───0
│   │       mel.npy
│   │       motion.npy
│   ├───...
│   └───290
│           mel.npy
│           motion.npy
└───test
    ├───0
    │       mel.npy
    │       motion.npy
    ├───...
    └───293
            mel.npy
            motion.npy
Each mel.npy and motion.npy corresponds to 60 seconds of Mel spectrogram and motion data, with sampling rates of 90 Hz and 30 Hz, respectively. The Mel spectrogram has 128 frequency bins, so mel.shape = (5400, 128). The motion data contains 13 2D keypoints, so motion.shape = (1800, 13, 2).
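As a quick sanity check (a minimal sketch; <Your Dataset Dir> is a placeholder for wherever you extracted the archives), you can load one sample and confirm the shapes described above:

```python
# Minimal sanity-check sketch: load one ConductorMotion100 sample and verify
# the shapes described above. "<Your Dataset Dir>" is the extraction folder.
import numpy as np

sample_dir = "<Your Dataset Dir>/train/0"
mel = np.load(f"{sample_dir}/mel.npy")        # Mel spectrogram, 90 Hz, 128 bins
motion = np.load(f"{sample_dir}/motion.npy")  # 2D keypoints, 30 Hz

assert mel.shape == (5400, 128)               # 60 s x 90 Hz, 128 frequency bins
assert motion.shape == (1800, 13, 2)          # 60 s x 30 Hz, 13 keypoints, (x, y)
print(mel.shape, motion.shape)
```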
# Contrastive stage: train M2SNet
cd Contrastive_Stage
python M2SNet_train.py --dataset_dir <Your Dataset Dir>
# Diffusion stage: train the conducting motion diffusion model
cd Diffusion_Stage
PYTHONPATH="$(dirname $0)/..":$PYTHONPATH \
python3 -u tools/train.py \
--name checkpoint_folder_name \
--batch_size 32 \
--times 25 \
--num_epochs 400 \
--dataset_name ConductorMotion100 \
--data_parallel \
--gpu_id 1 2
# Visualization: render generated conducting motion to a video
cd Diffusion_Stage
PYTHONPATH="$(dirname $0)/..":$PYTHONPATH \
python -u tools/visualization.py \
--motion_length 6 \
--gpu_id 5 \
--result_path "conduct_example.mp4"
For evaluation and inference, you may download the contrastive-stage pretrained model and the diffusion-stage pretrained model from Google Drive.
We would like to thank the great projects VirtualConductor and MotionDiffuse.
- Zhuoran Zhao, Jinbin Bai*, Delong Chen, Debang Wang, and Yubo Pan. Taming Diffusion Models for Music-driven Conducting Motion Generation.
@inproceedings{zhao2023taming,
  title={Taming diffusion models for music-driven conducting motion generation},
  author={Zhao, Zhuoran and Bai, Jinbin and Chen, Delong and Wang, Debang and Pan, Yubo},
  booktitle={Proceedings of the AAAI Symposium Series},
  volume={1},
  number={1},
  pages={40--44},
  year={2023}
}