This repository contains a Keras and TensorFlow based implementation of speech-driven gesture generation by a neural network.
The [project website](https://svito-zar.github.io/audio2gestures/) contains all the information about this project, including [video](https://youtu.be/Iv7UBe92zrw) explanation of the method and the [paper](https://www.researchgate.net/publication/331645229_Analyzing_Input_and_Output_Representations_for_Speech-Driven_Gesture_Generation).
## Demo on another dataset
This model has been applied to an English dataset.
The [demo video](https://youtu.be/tQLVyTVtsSU) as well as the [code](https://github.com/Svito-zar/speech-driven-hand-gesture-generation-demo) to run the pre-trained model are online.
This branch contains the implementation of the IVA '19 paper "Analyzing Input and Output Representations for Speech-Driven Gesture Generation" for [GENEA Challenge 2020](https://genea-workshop.github.io/2020/#gesture-generation-challenge).
All the parameters that need to be specified by the user are written in CAPSLOCK.
## 1. Obtain raw data
- Clone this repository
- Download the dataset from KTH Box using the link you obtained after signing the license agreement
## 2. Pre-process the data

```sh
cd data_processing
python split_dataset.py
python process_dataset.py
cd ..
```
By default, the model expects the dataset in the `<repository>/dataset/raw` folder, and the processed dataset will be available in the `<repository>/dataset/processed` folder. If your dataset is elsewhere, please provide the correct paths with the `--raw_data_dir` and `--proc_data_dir` command-line arguments. You can also use the `--help` argument to see more details about the scripts.
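For example, assuming the repository root as the working directory, the default layout and a run with explicit paths might look like this (the folder paths are illustrative; the flag names are the ones described above):

```sh
# Create the default folder layout the scripts expect (sketch; the raw
# folder should contain the downloaded dataset, not be empty).
mkdir -p dataset/raw dataset/processed

# Then run the pre-processing with explicit paths (shown commented out,
# since the raw data must be in place first):
# cd data_processing
# python split_dataset.py --raw_data_dir ../dataset/raw --proc_data_dir ../dataset/processed
# python process_dataset.py --raw_data_dir ../dataset/raw --proc_data_dir ../dataset/processed
```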
As a result of running these scripts:
- numpy binary files `X_train.npy`, `Y_train.npy` (training dataset files) are created under `--proc_data_dir`
- test audio files, such as `X_test_audio1168.npy`, are created under the `/test_inputs/` subfolder of the processed dataset folder
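As a quick sanity check, the produced `.npy` files can be inspected with NumPy. The snippet below is a self-contained sketch: it writes stand-in arrays instead of real processed data, and the shapes follow the original paper's setup (61 context frames of 26 audio features per timestep, 384 gesture values per frame), which may differ for your configuration.

```python
import os
import tempfile

import numpy as np

# Stand-in for the processed dataset folder (illustration only; the real
# files are produced by the data_processing scripts above).
proc_dir = tempfile.mkdtemp()

# Placeholder shapes from the original paper's setup; yours may differ.
np.save(os.path.join(proc_dir, "X_train.npy"),
        np.zeros((100, 61, 26), dtype=np.float32))
np.save(os.path.join(proc_dir, "Y_train.npy"),
        np.zeros((100, 384), dtype=np.float32))

# Load them back the way a training script would.
X_train = np.load(os.path.join(proc_dir, "X_train.npy"))
Y_train = np.load(os.path.join(proc_dir, "Y_train.npy"))
print(X_train.shape, Y_train.shape)  # (100, 61, 26) (100, 384)
```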
## 3. Learn motion representation by AutoEncoder and encode the dataset
Create a directory to save training checkpoints, such as `chkpt/`, and use it as the CHKPT_DIR parameter.
#### Learn dataset encoding and encode the training and validation datasets
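The idea of this step can be sketched in Keras as a small autoencoder over per-frame motion vectors. Everything below is a hypothetical, minimal illustration, not the repository's actual model: the dimensions, layer sizes, and dummy data are made up for the example.

```python
import numpy as np
from tensorflow import keras

MOTION_DIM = 45    # assumption: per-frame motion vector size (illustrative)
ENCODING_DIM = 8   # assumption: size of the learned representation

# Encoder: compress a motion frame into a low-dimensional code.
encoder = keras.Sequential([
    keras.Input(shape=(MOTION_DIM,)),
    keras.layers.Dense(128, activation="relu"),
    keras.layers.Dense(ENCODING_DIM, name="code"),
])
# Decoder: reconstruct the motion frame from the code.
decoder = keras.Sequential([
    keras.Input(shape=(ENCODING_DIM,)),
    keras.layers.Dense(128, activation="relu"),
    keras.layers.Dense(MOTION_DIM),
])
autoencoder = keras.Sequential([encoder, decoder])
autoencoder.compile(optimizer="adam", loss="mse")

# Train on (dummy) motion frames, then encode the dataset.
Y_train = np.random.randn(256, MOTION_DIM).astype("float32")
autoencoder.fit(Y_train, Y_train, epochs=1, batch_size=64, verbose=0)
codes = encoder.predict(Y_train, verbose=0)
print(codes.shape)  # (256, 8)
```

Once trained, only the encoder is needed to produce the encoded dataset that the gesture-generation network is trained on.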