
Commit edf0208

Update README.md
1 parent c285d88 commit edf0208


README.md

Lines changed: 18 additions & 78 deletions
@@ -3,15 +3,7 @@

![ImageOfIdea](visuals/SpeechReprMotion.png?raw=true "Idea")

- This repository contains a Keras- and TensorFlow-based implementation of speech-driven gesture generation by a neural network.
- The [project website](https://svito-zar.github.io/audio2gestures/) contains all the information about this project, including a [video](https://youtu.be/Iv7UBe92zrw) explanation of the method and the [paper](https://www.researchgate.net/publication/331645229_Analyzing_Input_and_Output_Representations_for_Speech-Driven_Gesture_Generation).
- ## Demo on another dataset
- This model has been applied to an English-language dataset.
- The [demo video](https://youtu.be/tQLVyTVtsSU) as well as the [code](https://github.com/Svito-zar/speech-driven-hand-gesture-generation-demo) to run the pre-trained model are online.
+ This branch contains the implementation of the IVA '19 paper "Analyzing Input and Output Representations for Speech-Driven Gesture Generation" for the [GENEA Challenge 2020](https://genea-workshop.github.io/2020/#gesture-generation-challenge).

## Requirements

@@ -32,20 +24,6 @@ pip install tensorflow==1.15.2
pip install -r requirements.txt
```

- ### Install ffmpeg
- ```sh
- # macOS
- brew install ffmpeg
- ```
- ```sh
- # Ubuntu
- sudo add-apt-repository ppa:jonathonf/ffmpeg-4
- sudo apt-get update
- sudo apt-get install ffmpeg
- ```
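A quick sanity check of the environment (assuming the installs above completed; the commands below only verify versions):

```sh
# should print 1.15.2 if the pinned TensorFlow was installed
python -c "import tensorflow as tf; print(tf.__version__)"
# only relevant if you installed ffmpeg as above
ffmpeg -version
```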
____________________________________________________________________________________________________________
@@ -56,68 +34,41 @@ ________________________________________________________________________________

We write all the parameters that need to be specified by the user in CAPSLOCK.

- ## 1. Download raw data
+ ## 1. Obtain raw data

- Clone this repository
- - Download a dataset from `https://www.dropbox.com/sh/j419kp4m8hkt9nd/AAC_pIcS1b_WFBqUp5ofBG1Ia?dl=0`
- - Create a directory named `dataset` and put two directories `motion/` and `speech/` under `dataset/`
- ## 2. Split dataset
+ - Download a dataset from KTH Box using the link you obtained after signing the license agreement
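Once the dataset is downloaded, it has to end up where the pre-processing scripts expect it (the default layout is described in step 2 below); one possible way, with a purely illustrative archive name:

```sh
# the archive name is hypothetical; by default the raw data is expected under dataset/raw
mkdir -p dataset/raw
unzip genea_challenge_2020_data.zip -d dataset/raw
```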

- - Put the folder with the dataset in the `data_processing` directory of this repo, next to the script `prepare_data.py`
- - Run the following command

- ```sh
- python data_processing/prepare_data.py DATA_DIR
- # DATA_DIR = directory to save data such as 'data/'
+ ## 2. Pre-process the data
```
- Note: DATA_DIR is not the directory where the raw data is stored (the folder with the data, "dataset", has to be in the root folder of this repo). DATA_DIR is the directory where the post-processed data should be saved. After this step you don't need to keep "dataset" in the root folder any more.
- You should use the same DATA_DIR in all the following scripts.
- After this command:
- - `train/`, `test/` and `dev/` are created under `DATA_DIR/`
- - in `inputs/` inside each directory, audio(id).wav files are stored
- - in `labels/` inside each directory, gesture(id).bvh files are stored
- - under `DATA_DIR/`, three csv files `gg-train.csv`, `gg-test.csv`, `gg-dev.csv` are created; these files contain paths to the actual data
- ## 3. Convert the dataset into vectors
- ```sh
- python data_processing/create_vector.py DATA_DIR N_CONTEXT
- # N_CONTEXT = length of the context window; in our experiments it was set to '60'
- # (this means 30 steps backwards and forwards)
+ cd data_processing
+ python split_dataset.py
+ python process_dataset.py
+ cd ..
```

- Note: if you change the N_CONTEXT value, you need to update it in the `train.py` script as well.
- (You are likely to get a warning like this: "WARNING:root:frame length (5513) is greater than FFT size (512), frame will be truncated. Increase NFFT to avoid.")
+ By default, the model expects the dataset in the `<repository>/dataset/raw` folder, and the processed dataset will be available in the `<repository>/dataset/processed` folder. If your dataset is elsewhere, please provide the correct paths with the `--raw_data_dir` and `--proc_data_dir` command line arguments. You can also use the `--help` argument to see more details about the scripts.
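For example, if the raw data was unpacked somewhere else, the scripts can be pointed at it explicitly; a minimal sketch (paths are illustrative, and both flags are assumed to be accepted by both scripts, as per the note above):

```sh
cd data_processing
python split_dataset.py --raw_data_dir /path/to/raw --proc_data_dir /path/to/processed
python process_dataset.py --raw_data_dir /path/to/raw --proc_data_dir /path/to/processed
cd ..
```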

As a result of running this script:
- - numpy binary files `X_train.npy`, `Y_train.npy` (vectorized dataset) are created under `DATA_DIR`
- - under `DATA_DIR/test_inputs/`, test audios, such as `X_test_audio1168.npy`, are created
- - when N_CONTEXT = 60, the audio vector's shape is (num of timesteps, 61, 26)
- - the gesture vector's shape is (num of timesteps, 384)
- - 384 = 64 joints × (x,y,z positions + x,y,z velocities)
+ - numpy binary files `X_train.npy`, `Y_train.npy` (training dataset files) are created under `--proc_data_dir`
+ - under the `/test_inputs/` subfolder of the processed dataset folder, test audio files, such as `X_test_audio1168.npy`, are created
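A quick way to confirm this step worked is to load the produced arrays and print their shapes; a minimal check, assuming the default `dataset/processed` output location (adjust the path if you used `--proc_data_dir`):

```sh
python -c "import numpy as np; X = np.load('dataset/processed/X_train.npy'); Y = np.load('dataset/processed/Y_train.npy'); print('audio:', X.shape, 'gestures:', Y.shape)"
```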

- ## If you don't want to customize anything, you can skip reading about steps 4-7 and just use the already prepared scripts in the `example_scripts` folder
- &nbsp;

- ## 4. Learn motion representation by AutoEncoder and encode the dataset
+ ## 3. Learn motion representation by AutoEncoder and encode the dataset

Create a directory to save training checkpoints, such as `chkpt/`, and use it as the CHKPT_DIR parameter.
#### Learn dataset encoding and encode the training and validation datasets
```sh
python motion_repr_learning/ae/learn_ae_n_encode_dataset.py DATA_DIR -chkpt_dir=CHKPT_DIR -layer1_width=DIM
```

- The optimal dimensionality (DIM) in our experiment was 325.
+ The optimal dimensionality (DIM) in our experiment was 40.
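For concreteness, a possible invocation with the values mentioned above (`chkpt/` as the checkpoint directory, DIM = 40); `data/` here stands in for your DATA_DIR:

```sh
mkdir -p chkpt
python motion_repr_learning/ae/learn_ae_n_encode_dataset.py data/ -chkpt_dir=chkpt/ -layer1_width=40
```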

More information can be found in the `motion_repr_learning` folder.


- ## 5. Learn speech-driven gesture generation model
+ ## 4. Learn speech-driven gesture generation model

```sh
python train.py MODEL_NAME EPOCHS DATA_DIR N_INPUT ENCODE DIM
@@ -129,7 +80,7 @@ python train.py MODEL_NAME EPOCHS DATA_DIR N_INPUT ENCODE DIM
# DIM = how many dimensions the encoding has (ignored if you don't encode)
```
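For orientation only, an illustrative call with placeholder values (these are not recommendations):

```sh
# placeholders: model.hdf5 = MODEL_NAME, 100 = EPOCHS, data/ = DATA_DIR, True = ENCODE, 40 = DIM;
# replace N_INPUT with the dimensionality of your input feature vectors
python train.py model.hdf5 100 data/ N_INPUT True 40
```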

- ## 6. Predict gesture
+ ## 5. Predict gesture

```sh
python predict.py MODEL_NAME INPUT_SPEECH_FILE OUTPUT_GESTURE_FILE
@@ -146,22 +97,11 @@ python motion_repr_learning/ae/decode.py DATA_DIR ENCODED_PREDICTION_FILE DECODE
```

- Note: This can be used in a for loop over all the test sequences. Examples are provided in the `example_scripts` folder of this directory.
- ```sh
- # The network produces both coordinates and velocities,
- # so we need to remove the velocities
- python helpers/remove_velocity.py -g PATH_TO_GESTURES
- ```
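A minimal sketch of such a loop over the test inputs (model and file names are illustrative; each encoded prediction would then be decoded with `decode.py` as shown above):

```sh
# illustrative only: iterate over the encoded test audio files produced during pre-processing
for f in dataset/processed/test_inputs/X_test_audio*.npy; do
  python predict.py model.hdf5 "$f" "predicted_$(basename "$f" .npy).txt"
done
```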
- ## 7. Quantitative evaluation
+ ## 6. Quantitative evaluation
Use scripts in the `evaluation` folder of this directory.

- Examples are provided in the `example_scripts` folder of this repository.
- ## 8. Qualitative evaluation
- Use the [animation server](https://secret-meadow-14164.herokuapp.com/coordinates.html)
+ ## 7. Qualitative evaluation
+ Use the animation server provided on the GENEA Challenge GitHub page to visualize your gestures from the BVH format.

&nbsp;
