forked from salu133445/musegan
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
1 parent 8833275
commit dbe4747
Showing 65 changed files with 302 additions and 0 deletions.
@@ -0,0 +1,85 @@
# MuseGAN

<font color=red><strong><i>Warning: this version is no longer maintained</i></strong></font>

[MuseGAN](https://salu133445.github.io/musegan/) is a project on music
generation. In essence, we aim to generate polyphonic music of multiple tracks
(instruments) with harmonic and rhythmic structure, multi-track interdependency
and temporal structure. To our knowledge, our work represents the first approach
that deals with these issues altogether.

The models are trained with the
[Lakh Pianoroll Dataset](https://salu133445.github.io/lakh-pianoroll-dataset/)
(LPD), a new [multi-track piano-roll](https://salu133445.github.io/musegan/data)
dataset, in an unsupervised approach. The proposed models are able to generate
music either from scratch or by accompanying a track given by the user.
Specifically, we use the model to generate pop song phrases consisting of bass,
drums, guitar, piano and strings tracks.

Sample results are available [here](https://salu133445.github.io/musegan/results).

## Papers

Hao-Wen Dong\*, Wen-Yi Hsiao\*, Li-Chia Yang and Yi-Hsuan Yang,
"**MuseGAN: Multi-track Sequential Generative Adversarial Networks for
Symbolic Music Generation and Accompaniment**,"
in *AAAI Conference on Artificial Intelligence* (AAAI), 2018.
[[arxiv](http://arxiv.org/abs/1709.06298)]
[[slides](https://salu133445.github.io/musegan/pdf/musegan-aaai2018-slides.pdf)]

Hao-Wen Dong\*, Wen-Yi Hsiao\*, Li-Chia Yang and Yi-Hsuan Yang,
"**MuseGAN: Demonstration of a Convolutional GAN Based Model for Generating
Multi-track Piano-rolls**,"
in *ISMIR Late-Breaking and Demo Session*, 2017.
(non-peer-reviewed two-page extended abstract)
[[paper](https://salu133445.github.io/musegan/pdf/musegan-ismir2017-lbd-paper.pdf)]
[[poster](https://salu133445.github.io/musegan/pdf/musegan-ismir2017-lbd-poster.pdf)]

\* *These authors contributed equally to this work.*

## Usage

```python
import tensorflow as tf
from musegan.core import MuseGAN
from musegan.components import NowbarHybrid
from input_data import *  # provides the InputData* classes
from config import *

# Initialize a tensorflow session
with tf.Session() as sess:

    # === Prerequisites ===
    # Step 1 - Initialize the training configuration
    t_config = TrainingConfig

    # Step 2 - Select the desired model
    model = NowbarHybrid(NowBarHybridConfig)

    # Step 3 - Initialize the input data object
    input_data = InputDataNowBarHybrid(model)

    # Step 4 - Load training data
    path_train = 'train.npy'
    input_data.add_data(path_train, key='train')

    # Step 5 - Initialize a museGAN object
    musegan = MuseGAN(sess, t_config, model)

    # === Training ===
    musegan.train(input_data)

    # === Load a Pretrained Model ===
    musegan.load(musegan.dir_ckpt)

    # === Generate Samples ===
    path_test = 'train.npy'
    input_data.add_data(path_test, key='test')
    musegan.gen_test(input_data, is_eval=True)
```

## Training Data

- [tra_phr.npy](https://drive.google.com/uc?id=1-bQCO6ZxpIgdMM7zXhNJViovHjtBKXde&export=download)
  (7.54 GB) contains 50,266 four-bar phrases. The shape is (50266, 384, 84, 5).
- [tra_bar.npy](https://drive.google.com/uc?id=1Xxj6WU82fcgY9UtBpXJGOspoUkMu58xC&export=download)
  (4.79 GB) contains 127,734 bars. The shape is (127734, 96, 84, 5).
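
The two shapes above appear to follow from the values in the configuration added by this commit: 96 time steps per bar (`output_w`), 84 pitches (`output_h`), 4 bars per phrase (`output_bar`) and 5 tracks. The sketch below recomputes the shapes and a rough size estimate as a sanity check. The helper names are hypothetical, and the one-byte-per-element storage is an assumption, not something the README states.

```python
from math import prod

# Values copied from the config in this commit.
STEPS_PER_BAR = 96   # ModelConfig.output_w
NUM_PITCHES = 84     # ModelConfig.output_h
NUM_TRACKS = 5       # bass, drums, guitar, piano, strings


def expected_shape(num_samples, bars_per_sample):
    """Return the (samples, time steps, pitches, tracks) shape of a data file."""
    return (num_samples, bars_per_sample * STEPS_PER_BAR, NUM_PITCHES, NUM_TRACKS)


def size_in_gib(shape, bytes_per_element=1):
    """Estimate the array size, ASSUMING one byte per element (e.g. bool)."""
    return prod(shape) * bytes_per_element / 2**30


phrase_shape = expected_shape(50266, bars_per_sample=4)
bar_shape = expected_shape(127734, bars_per_sample=1)
print(phrase_shape, round(size_in_gib(phrase_shape), 2))
print(bar_shape, round(size_in_gib(bar_shape), 2))
```

Under that assumption the estimates land close to the listed file sizes, which suggests the arrays are stored with a one-byte dtype; checking the `dtype` of a loaded array would confirm it.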
@@ -0,0 +1,170 @@
'''
Model Configuration
'''
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import numpy as np
from shutil import copyfile
import os
import SharedArray as sa
import tensorflow as tf
import glob

print('[*] config...')

# class Dataset:
TRACK_NAMES = ['bass', 'drums', 'guitar', 'piano', 'strings']

def get_colormap():
    # Five RGB rows, one per entry in TRACK_NAMES
    colormap = np.array([[1., 0., 0.],
                         [0., 1., 0.],
                         [0., 0., 1.],
                         [1., .5, 0.],
                         [0., .5, 1.]])
    return tf.constant(colormap, dtype=tf.float32, name='colormap')

###########################################################################
# Training
###########################################################################

class TrainingConfig:
    is_eval = True
    batch_size = 64
    epoch = 20
    iter_to_save = 100
    sample_size = 64
    print_batch = True
    drum_filter = np.tile([1,0.3,0,0,0,0.3], 16)
    scale_mask = [1., 0., 1., 0., 1., 1., 0., 1., 0., 1., 0., 1.]
    inter_pair = [(0,2), (0,3), (0,4), (2,3), (2,4), (3,4)]
    track_names = TRACK_NAMES
    track_dim = len(track_names)
    eval_map = np.array([
        [1, 1, 1, 1, 1],  # metric_is_empty_bar
        [1, 1, 1, 1, 1],  # metric_num_pitch_used
        [1, 0, 1, 1, 1],  # metric_too_short_note_ratio
        [1, 0, 1, 1, 1],  # metric_polyphonic_ratio
        [1, 0, 1, 1, 1],  # metric_in_scale
        [0, 1, 0, 0, 0],  # metric_drum_pattern
        [1, 0, 1, 1, 1]   # metric_num_chroma_used
    ])

    exp_name = 'exp'
    gpu_num = '1'


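`eval_map` reads as a metric-by-track mask: each row is one metric (named in the trailing comments) and each column is one track. The sketch below shows one way to query it, assuming the columns follow the order of `TRACK_NAMES`; that ordering and the helper `metrics_for_track` are assumptions for illustration, not part of this commit. Plain lists stand in for the NumPy array so the snippet has no dependencies.

```python
TRACK_NAMES = ['bass', 'drums', 'guitar', 'piano', 'strings']

# Row order taken from the comments next to eval_map.
METRIC_NAMES = [
    'metric_is_empty_bar',
    'metric_num_pitch_used',
    'metric_too_short_note_ratio',
    'metric_polyphonic_ratio',
    'metric_in_scale',
    'metric_drum_pattern',
    'metric_num_chroma_used',
]

EVAL_MAP = [
    [1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1],
    [1, 0, 1, 1, 1],
    [1, 0, 1, 1, 1],
    [1, 0, 1, 1, 1],
    [0, 1, 0, 0, 0],
    [1, 0, 1, 1, 1],
]


def metrics_for_track(track_name):
    """List the metrics whose mask entry is 1 for the given track.

    ASSUMPTION: column i of EVAL_MAP corresponds to TRACK_NAMES[i].
    """
    col = TRACK_NAMES.index(track_name)
    return [name for name, row in zip(METRIC_NAMES, EVAL_MAP) if row[col] == 1]


print(metrics_for_track('drums'))
print(metrics_for_track('piano'))
```

Read this way, the drums column enables only the empty-bar, pitch-count and drum-pattern metrics, while the pitched tracks skip the drum-pattern metric.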
###########################################################################
# Model Config
###########################################################################

class ModelConfig:
    output_w = 96
    output_h = 84
    lamda = 10
    batch_size = 64
    beta1 = 0.5
    beta2 = 0.9
    lr = 2e-4
    is_bn = True
    colormap = get_colormap()

# image
class MNISTConfig(ModelConfig):
    output_w = 28
    output_h = 28
    z_dim = 74
    output_dim = 1

# RNN
class RNNConfig(ModelConfig):
    track_names = ['All']
    track_dim = 1
    output_bar = 4
    z_inter_dim = 128
    output_dim = 5
    acc_idx = None
    state_size = 128

# onebar
class OneBarHybridConfig(ModelConfig):
    track_names = TRACK_NAMES
    track_dim = 5
    acc_idx = None
    z_inter_dim = 64
    z_intra_dim = 64
    output_dim = 1

class OneBarJammingConfig(ModelConfig):
    track_names = TRACK_NAMES
    track_dim = 5
    acc_idx = None
    z_intra_dim = 128
    output_dim = 1

class OneBarComposerConfig(ModelConfig):
    track_names = ['All']
    track_dim = 1
    acc_idx = None
    z_inter_dim = 128
    output_dim = 5

# nowbar

class NowBarHybridConfig(ModelConfig):
    track_names = TRACK_NAMES
    track_dim = 5
    acc_idx = 4
    z_inter_dim = 64
    z_intra_dim = 64
    output_dim = 1

class NowBarJammingConfig(ModelConfig):
    track_names = TRACK_NAMES
    track_dim = 5
    acc_idx = 4
    z_intra_dim = 128
    output_dim = 1

class NowBarComposerConfig(ModelConfig):
    track_names = ['All']
    track_dim = 1
    acc_idx = 4
    z_inter_dim = 128
    output_dim = 5

# Temporal
class TemporalHybridConfig(ModelConfig):
    track_names = TRACK_NAMES
    track_dim = 5
    output_bar = 4
    z_inter_dim = 32
    z_intra_dim = 32
    acc_idx = None
    output_dim = 1

class TemporalJammingConfig(ModelConfig):
    track_names = TRACK_NAMES
    track_dim = 5
    output_bar = 4
    z_intra_dim = 64
    output_dim = 1

class TemporalComposerConfig(ModelConfig):
    track_names = ['All']
    track_dim = 1
    output_bar = 4
    z_inter_dim = 64
    acc_idx = None
    output_dim = 5

class NowBarTemporalHybridConfig(ModelConfig):
    track_names = TRACK_NAMES
    acc_idx = 4
    track_dim = 5
    output_bar = 4
    z_inter_dim = 32
    z_intra_dim = 32
    output_dim = 1
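
Each model config above is a plain class that inherits shared defaults from `ModelConfig` and overrides or adds class attributes. The sketch below illustrates how such a hierarchy resolves, using trimmed stand-in copies of two classes (the TensorFlow `colormap` is left out) and a hypothetical helper, `config_to_dict`, that is not part of this commit. It may be handy for logging the effective settings of an experiment.

```python
class ModelConfig:
    # Trimmed stand-in for the real base class.
    output_w = 96
    output_h = 84
    batch_size = 64


class NowBarHybridConfig(ModelConfig):
    track_dim = 5
    acc_idx = 4
    z_inter_dim = 64
    z_intra_dim = 64
    output_dim = 1


def config_to_dict(config_cls):
    """Collect public class attributes, letting subclasses override base values."""
    merged = {}
    # Walk the MRO from the most generic class to the most specific one,
    # so later (more specific) classes overwrite earlier entries.
    for cls in reversed(config_cls.__mro__):
        for name, value in vars(cls).items():
            if not name.startswith('_') and not callable(value):
                merged[name] = value
    return merged


settings = config_to_dict(NowBarHybridConfig)
print(settings['output_w'], settings['acc_idx'])
```

Because the configs are used as classes rather than instances (for example `t_config = TrainingConfig`), assigning to an attribute such as `t_config.exp_name` changes the class itself and is visible everywhere that class is referenced.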
File renamed without changes.
File renamed without changes.
@@ -0,0 +1,47 @@
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import os
import scipy.misc
import numpy as np
import tensorflow as tf
from pprint import pprint
import SharedArray as sa

from musegan.core import *
from musegan.components import *
from input_data import *
from config import *


if __name__ == '__main__':

    # Create TensorFlow session

    t_config = TrainingConfig

    # Assign GPU
    os.environ['CUDA_VISIBLE_DEVICES'] = t_config.gpu_num
    config = tf.ConfigProto()
    config.gpu_options.allow_growth = True

    with tf.Session(config=config) as sess:

        path_x_train_phr = 'tra_X_phrase_all'  # (50266, 384, 84, 5)

        # Temporal
        # hybrid
        t_config.exp_name = 'exps/temporal_hybrid'
        model = TemporalHybrid(TemporalHybridConfig)
        input_data = InputDataTemporalHybrid(model)
        input_data.add_data_sa(path_x_train_phr, 'train')

        musegan = MuseGAN(sess, t_config, model)
        musegan.train(input_data)

        musegan.load(musegan.dir_ckpt)
        musegan.gen_test(input_data, is_eval=True)

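The script selects a GPU by writing `TrainingConfig.gpu_num` to the `CUDA_VISIBLE_DEVICES` environment variable before the session is created. Environment values must be strings, which is presumably why `gpu_num` is stored as `'1'` and not as an integer. A minimal sketch of a helper that accepts either form follows; the name `restrict_visible_gpus` is hypothetical and not part of this commit.

```python
import os


def restrict_visible_gpus(gpu_num):
    """Expose only the given GPU id(s) to CUDA; call before creating a session."""
    if isinstance(gpu_num, (list, tuple)):
        # Several ids are passed to CUDA as a comma-separated string, e.g. '0,1'.
        gpu_num = ','.join(str(i) for i in gpu_num)
    else:
        # os.environ only accepts strings, so convert integers such as 1 to '1'.
        gpu_num = str(gpu_num)
    os.environ['CUDA_VISIBLE_DEVICES'] = gpu_num
    return gpu_num


restrict_visible_gpus(1)
print(os.environ['CUDA_VISIBLE_DEVICES'])
```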
File renamed without changes. (59 files)
Empty file. (1 file)