
Commit edf0208

Update README.md
1 parent c285d88 commit edf0208


README.md

Lines changed: 18 additions & 78 deletions
@@ -3,15 +3,7 @@

![ImageOfIdea](visuals/SpeechReprMotion.png?raw=true "Idea")

- This repository contains a Keras- and TensorFlow-based implementation of speech-driven gesture generation by a neural network.
- The [project website](https://svito-zar.github.io/audio2gestures/) contains all the information about this project, including a [video](https://youtu.be/Iv7UBe92zrw) explanation of the method and the [paper](https://www.researchgate.net/publication/331645229_Analyzing_Input_and_Output_Representations_for_Speech-Driven_Gesture_Generation).
- ## Demo on another dataset
- This model has been applied to an English-language dataset.
- The [demo video](https://youtu.be/tQLVyTVtsSU) as well as the [code](https://github.com/Svito-zar/speech-driven-hand-gesture-generation-demo) to run the pre-trained model are online.
+ This branch contains the implementation of the IVA '19 paper "Analyzing Input and Output Representations for Speech-Driven Gesture Generation" for the [GENEA Challenge 2020](https://genea-workshop.github.io/2020/#gesture-generation-challenge).

## Requirements

@@ -32,20 +24,6 @@ pip install tensorflow==1.15.2
pip install -r requirements.txt
```

- ### Install ffmpeg
- ```sh
- # macOS
- brew install ffmpeg
- ```
- ```sh
- # Ubuntu
- sudo add-apt-repository ppa:jonathonf/ffmpeg-4
- sudo apt-get update
- sudo apt-get install ffmpeg
- ```
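A quick sanity check of the environment (assuming the installs above completed; the commands below only verify versions):

```sh
# should print 1.15.2 if the pinned TensorFlow was installed
python -c "import tensorflow as tf; print(tf.__version__)"
# only relevant if you installed ffmpeg as above
ffmpeg -version
```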
____________________________________________________________________________________________________________
@@ -56,68 +34,41 @@ ________________________________________________________________________________

We write all the parameters that need to be specified by the user in CAPSLOCK.

- ## 1. Download raw data
+ ## 1. Obtain raw data

- Clone this repository
- - Download a dataset from `https://www.dropbox.com/sh/j419kp4m8hkt9nd/AAC_pIcS1b_WFBqUp5ofBG1Ia?dl=0`
- - Create a directory named `dataset` and put two directories `motion/` and `speech/` under `dataset/`
- ## 2. Split dataset
+ - Download a dataset from KTH Box using the link you obtained after signing the license agreement
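Once the dataset is downloaded, it has to end up where the pre-processing scripts expect it (the default layout is described in step 2 below); one possible way, with a purely illustrative archive name:

```sh
# the archive name is hypothetical; by default the raw data is expected under dataset/raw
mkdir -p dataset/raw
unzip genea_challenge_2020_data.zip -d dataset/raw
```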

- - Put the folder with the dataset in the `data_processing` directory of this repo, next to the script `prepare_data.py`
- - Run the following command

- ```sh
- python data_processing/prepare_data.py DATA_DIR
- # DATA_DIR = directory to save data such as 'data/'
+ ## 2. Pre-process the data
```
- Note: DATA_DIR is not the directory where the raw data is stored (the folder with the data, "dataset", has to be in the root folder of this repo). DATA_DIR is the directory where the post-processed data should be saved. After this step you don't need to keep "dataset" in the root folder any more.
- You should use the same DATA_DIR in all the following scripts.
- After this command:
- - `train/`, `test/` and `dev/` are created under `DATA_DIR/`
- - in `inputs/` inside each directory, audio(id).wav files are stored
- - in `labels/` inside each directory, gesture(id).bvh files are stored
- - under `DATA_DIR/`, three csv files `gg-train.csv`, `gg-test.csv`, `gg-dev.csv` are created; these files contain paths to the actual data
- ## 3. Convert the dataset into vectors
- ```sh
- python data_processing/create_vector.py DATA_DIR N_CONTEXT
- # N_CONTEXT = length of the context window; in our experiments it was set to '60'
- # (this means 30 steps backwards and forwards)
+ cd data_processing
+ python split_dataset.py
+ python process_dataset.py
+ cd ..
```

- Note: if you change the N_CONTEXT value, you need to update it in the `train.py` script as well.
- (You are likely to get a warning like this: "WARNING:root:frame length (5513) is greater than FFT size (512), frame will be truncated. Increase NFFT to avoid.")
+ By default, the model expects the dataset in the `<repository>/dataset/raw` folder, and the processed dataset will be available in the `<repository>/dataset/processed` folder. If your dataset is elsewhere, please provide the correct paths with the `--raw_data_dir` and `--proc_data_dir` command line arguments. You can also use the `--help` argument to see more details about the scripts.
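For example, if the raw data was unpacked somewhere else, the scripts can be pointed at it explicitly; a minimal sketch (paths are illustrative, and both flags are assumed to be accepted by both scripts, as per the note above):

```sh
cd data_processing
python split_dataset.py --raw_data_dir /path/to/raw --proc_data_dir /path/to/processed
python process_dataset.py --raw_data_dir /path/to/raw --proc_data_dir /path/to/processed
cd ..
```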

As a result of running this script:
- - numpy binary files `X_train.npy`, `Y_train.npy` (vectorized dataset) are created under `DATA_DIR`
- - under `DATA_DIR/test_inputs/`, test audios, such as `X_test_audio1168.npy`, are created
- - when N_CONTEXT = 60, the audio vector's shape is (num of timesteps, 61, 26)
- - the gesture vector's shape is (num of timesteps, 384)
- - 384 = 64 joints × (x,y,z positions + x,y,z velocities)
+ - numpy binary files `X_train.npy`, `Y_train.npy` (training dataset files) are created under `--proc_data_dir`
+ - under the `/test_inputs/` subfolder of the processed dataset folder, test audio files, such as `X_test_audio1168.npy`, are created
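A quick way to confirm this step worked is to load the produced arrays and print their shapes; a minimal check, assuming the default `dataset/processed` output location (adjust the path if you used `--proc_data_dir`):

```sh
python -c "import numpy as np; X = np.load('dataset/processed/X_train.npy'); Y = np.load('dataset/processed/Y_train.npy'); print('audio:', X.shape, 'gestures:', Y.shape)"
```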

- ## If you don't want to customize anything, you can skip reading about steps 4-7 and just use the already prepared scripts in the `example_scripts` folder
- &nbsp;

- ## 4. Learn motion representation by AutoEncoder and encode the dataset
+ ## 3. Learn motion representation by AutoEncoder and encode the dataset

Create a directory to save training checkpoints, such as `chkpt/`, and use it as the CHKPT_DIR parameter.
#### Learn dataset encoding and encode the training and validation datasets
```sh
python motion_repr_learning/ae/learn_ae_n_encode_dataset.py DATA_DIR -chkpt_dir=CHKPT_DIR -layer1_width=DIM
```

- The optimal dimensionality (DIM) in our experiment was 325.
+ The optimal dimensionality (DIM) in our experiment was 40.
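For concreteness, a possible invocation with the values mentioned above (`chkpt/` as the checkpoint directory, DIM = 40); `data/` here stands in for your DATA_DIR:

```sh
mkdir -p chkpt
python motion_repr_learning/ae/learn_ae_n_encode_dataset.py data/ -chkpt_dir=chkpt/ -layer1_width=40
```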

More information can be found in the `motion_repr_learning` folder.


- ## 5. Learn speech-driven gesture generation model
+ ## 4. Learn speech-driven gesture generation model

```sh
python train.py MODEL_NAME EPOCHS DATA_DIR N_INPUT ENCODE DIM
@@ -129,7 +80,7 @@ python train.py MODEL_NAME EPOCHS DATA_DIR N_INPUT ENCODE DIM
# DIM = how many dimensions the encoding has (ignored if you don't encode)
```
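For orientation only, an illustrative call with placeholder values (these are not recommendations):

```sh
# placeholders: model.hdf5 = MODEL_NAME, 100 = EPOCHS, data/ = DATA_DIR, True = ENCODE, 40 = DIM;
# replace N_INPUT with the dimensionality of your input feature vectors
python train.py model.hdf5 100 data/ N_INPUT True 40
```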

- ## 6. Predict gesture
+ ## 5. Predict gesture

```sh
python predict.py MODEL_NAME INPUT_SPEECH_FILE OUTPUT_GESTURE_FILE
@@ -146,22 +97,11 @@ python motion_repr_learning/ae/decode.py DATA_DIR ENCODED_PREDICTION_FILE DECODE
```

- Note: This can be used in a for loop over all the test sequences. Examples are provided in the `example_scripts` folder of this directory.
- ```sh
- # The network produces both coordinates and velocities,
- # so we need to remove the velocities
- python helpers/remove_velocity.py -g PATH_TO_GESTURES
- ```
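A minimal sketch of such a loop over the test inputs (model and file names are illustrative; each encoded prediction would then be decoded with `decode.py` as shown above):

```sh
# illustrative only: iterate over the encoded test audio files produced during pre-processing
for f in dataset/processed/test_inputs/X_test_audio*.npy; do
  python predict.py model.hdf5 "$f" "predicted_$(basename "$f" .npy).txt"
done
```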
- ## 7. Quantitative evaluation
+ ## 6. Quantitative evaluation
Use scripts in the `evaluation` folder of this directory.

- Examples are provided in the `example_scripts` folder of this repository.
- ## 8. Qualitative evaluation
- Use the [animation server](https://secret-meadow-14164.herokuapp.com/coordinates.html)
+ ## 7. Qualitative evaluation
+ Use the animation server provided on the GENEA Challenge GitHub page to visualize your gestures from the BVH format.

&nbsp;
