
Analyzing Input and Output Representations for Speech-Driven Gesture Generation

Taras Kucherenko, Dai Hasegawa, Gustav Eje Henter, Naoshi Kaneko, Hedvig Kjellström


This branch contains the implementation of the IVA '19 paper Analyzing Input and Output Representations for Speech-Driven Gesture Generation, adapted for the GENEA Challenge 2020.

Requirements

  • Python 3

Initial setup

install packages

# if you have GPU
pip install tensorflow-gpu==1.15.2

# if you don't have GPU
pip install tensorflow==1.15.2

pip install -r requirements.txt
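
To verify that the intended TensorFlow version was installed, you can run:

python -c "import tensorflow as tf; print(tf.__version__)"  # should print 1.15.2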

How to use this repository?

0. Notation

Whenever a parameter is written in caps (such as DATA_DIR), it has to be specified by the user on the command line as a positional argument.

1. Obtain raw data

  • Clone this repository
git clone git@github.com:GestureGeneration/Speech_driven_gesture_generation_with_autoencoder.git
  • Switch branch to 'GENEA_2020'
git checkout GENEA_2020
  • Download a dataset from KTH Box using the link you obtained after signing the license agreement

2. Pre-process the data

By default, the model expects the dataset in the <repository>/dataset/raw folder, and the processed dataset will be written to the <repository>/dataset/processed folder. If your dataset is stored elsewhere, provide the correct paths with the --raw_data_dir and --proc_data_dir command line arguments (see the example after the commands below). You can also use the --help argument to see more details about the scripts.

cd data_processing

# encode motion from BVH files into the exponential map representation
python bvh2features.py -orig <path/to/motion/folder> -dest <path/to/motion/folder>

# Split the dataset into training and validation
python split_dataset.py

# Encode all the features
python process_dataset.py

cd ..
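
If your data lives elsewhere, the paths can be passed explicitly. A minimal sketch, assuming split_dataset.py and process_dataset.py both accept the --raw_data_dir and --proc_data_dir arguments mentioned above (the paths are only illustrative):

# run from the data_processing folder, with custom data locations
python split_dataset.py --raw_data_dir /path/to/raw --proc_data_dir /path/to/processed
python process_dataset.py --raw_data_dir /path/to/raw --proc_data_dir /path/to/processed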

As a result of running these scripts, the dataset is created in --proc_data_dir:

  • the training dataset files X_train.npy, Y_train.npy and the validation dataset files X_dev.npy, Y_dev.npy are binary numpy files
  • the audio inputs for testing (such as X_test_NaturalTalking_04.npy) are under the /test_inputs/ subfolder

The rest of the folders in --proc_data_dir (e.g. /dev_inputs/ or /train/) can be ignored (they are a side effect of the preprocessing scripts).
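
As a quick sanity check, you can load the generated arrays and print their shapes (assuming the default dataset/processed location):

# inspect the processed training data
python -c "import numpy as np; print(np.load('dataset/processed/X_train.npy').shape)"
python -c "import numpy as np; print(np.load('dataset/processed/Y_train.npy').shape)"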

3. Learn motion representation by AutoEncoder and encode the training and validation datasets

python motion_repr_learning/ae/learn_ae_n_encode_dataset.py --layer1_width DIM

There are several parameters that can be modified in the config.yaml file or through the command line; see config.py for details. The optimal dimensionality (DIM) in our experiment was 40.
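
For example, with the dimensionality used in our experiments:

python motion_repr_learning/ae/learn_ae_n_encode_dataset.py --layer1_width 40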

More information can be found in the motion_repr_learning folder.

4. Learn speech-driven gesture generation model

python train.py MODEL_NAME EPOCHS DATA_DIR N_INPUT ENCODE DIM
# MODEL_NAME = file name for the model
# EPOCHS = number of epochs to train the model for (recommended: 500)
# DATA_DIR = directory with the data (should be the same as before)
# N_INPUT = dimensionality of the speech input (default: 26)
# ENCODE = True (because we use the autoencoder)
# DIM = dimensionality of the encoding (should be the same as above; recommended: 40)
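
For example, a 500-epoch training run with the settings above (the model name and the data directory are only illustrative; adjust them to your setup):

python train.py model 500 dataset/processed 26 True 40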

5. Predict gesture

python predict.py MODEL_NAME.hdf5 INPUT_SPEECH_FILE OUTPUT_GESTURE_FILE
# Usage example
python predict.py model.hdf5 data/test_inputs/X_test_NaturalTalking_04.npy data/test_inputs/predict_04_20fps.npy

The predicted gestures have to be decoded with decode.py, which reuses the config from step 3.

python motion_repr_learning/ae/decode.py -input_file INPUT_FILE -output_file OUTPUT_FILE --layer1_width DIM --batch_size=8
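
For example, decoding the prediction from the usage example above (the output file name is illustrative; --layer1_width must match the DIM chosen in step 3):

python motion_repr_learning/ae/decode.py -input_file data/test_inputs/predict_04_20fps.npy -output_file data/test_inputs/decoded_04.npy --layer1_width 40 --batch_size=8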

Finally, convert the motion from exponential maps to Euler angles and write it into a BVH file:

cd data_processing
python features2bvh.py

6. Qualitative evaluation

Use the animation server provided on the GENEA Challenge GitHub page to visualize your gestures from the BVH files.

 

Citation

If you use this code in your research, please cite the paper:

@inproceedings{kucherenko2019analyzing,
  title={Analyzing Input and Output Representations for Speech-Driven Gesture Generation},
  author={Kucherenko, Taras and Hasegawa, Dai and Henter, Gustav Eje and Kaneko, Naoshi and Kjellstr{\"o}m, Hedvig},
  booktitle={International Conference on Intelligent Virtual Agents (IVA ’19)},
  year={2019},
  publisher = {ACM},
}

Contact

If you encounter any problems or bugs, or have questions or suggestions, please contact me on GitHub or by email at tarask@kth.se. I prefer questions and bug reports on GitHub, as that gives visibility to others who might be encountering the same issues or have the same questions.
