
The goal of this project is to create a multi-modal implementation of the Transformer architecture in Swift.


jn-sidao/language2motion

 
 


Language2motion

The goal of this project is to create a multi-modal implementation of the Transformer architecture in Swift. It's a learning exercise for me, so I've taken it slowly, starting from a simple image classifier and building up from there.

It's also an attempt to answer the question of whether Swift for TensorFlow is ready for non-trivial work.

The use case is based on the paper "Learning a bidirectional mapping between human whole-body motion and natural language using deep recurrent neural networks" by Matthias Plappert. He created a nice dataset of a few thousand motions, "The KIT Motion-Language Dataset" (paper, website).

The Motion2Language Transformer, which kind of works, is already there. I'm working towards completing the language2motion solution.

I'm using a modified version of the Swift Transformer implementation by Andre Carrera.
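For readers new to the architecture, here is a minimal sketch of scaled dot-product attention, the core operation of a Transformer. It is written in plain Swift on nested arrays and is not taken from Andre Carrera's implementation or from this repository; the function names (`softmax`, `dot`, `attention`) and the toy inputs are assumptions made for illustration only.

```swift
import Foundation

/// Numerically stable softmax over one row of scores.
func softmax(_ row: [Double]) -> [Double] {
    let maxValue = row.max() ?? 0.0
    let exps = row.map { exp($0 - maxValue) }
    let total = exps.reduce(0.0, +)
    return exps.map { $0 / total }
}

/// Dot product of two equal-length vectors.
func dot(_ a: [Double], _ b: [Double]) -> Double {
    var sum = 0.0
    for (x, y) in zip(a, b) { sum += x * y }
    return sum
}

/// Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V.
/// queries: [n][d], keys: [m][d], values: [m][dv] -> output: [n][dv]
func attention(queries: [[Double]], keys: [[Double]], values: [[Double]]) -> [[Double]] {
    precondition(keys.count == values.count && !keys.isEmpty, "keys and values must align")
    let scale = 1.0 / Double(keys[0].count).squareRoot()
    var outputs: [[Double]] = []
    for query in queries {
        // Similarity of this query to every key, scaled by sqrt(d).
        let scores = keys.map { dot(query, $0) * scale }
        let weights = softmax(scores)
        // Weighted sum of the value vectors.
        var output = [Double](repeating: 0.0, count: values[0].count)
        for (weight, value) in zip(weights, values) {
            for i in 0..<value.count { output[i] += weight * value[i] }
        }
        outputs.append(output)
    }
    return outputs
}

// Toy example (made-up data): one query that matches the first key more closely.
let attended = attention(
    queries: [[1.0, 0.0]],
    keys: [[1.0, 0.0], [0.0, 1.0]],
    values: [[1.0, 0.0], [0.0, 1.0]]
)
print(attended)
```

Because the toy values are one-hot, each output row is simply the attention weights, so the first component comes out larger than the second. A real model would add learned projections, multiple heads, and masking on top of this.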

The plan

  • something 2 label
    • image 2 label
      • build image2label dataset with images representing motions
      • assign 5 dummy(ish) classes with PCA and k-means on motion annotations
      • classify motion images (+in fastai, +in swift)
    • language 2 label
      • Transformer encoder on annotation + classifier
      • batched prediction
      • Use BERT classifier to assign better labels - didn't work
      • manually assign better labels
    • motion 2 label
      • 1-channel ResNet on motion + classifier
      • ResNet feature extractor + Transformer encoder on motion features + classifier - didn't work
      • Transformer encoder on motion + classifier
  • language 2 language
    • Transformer seq2seq from annotation to label text
    • Transformer seq2seq from annotation to (same) annotation
  • motion 2 language
    • Transformer from motion to annotation
  • language 2 motion
    • Transformer encoder on annotation
    • * Transformer decoder on motion
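
The "assign 5 dummy(ish) classes with PCA and k-means" step in the plan above can be illustrated with a small k-means sketch. This is plain Swift with no TensorFlow dependency and is not the code used in this project; it assumes each annotation has already been reduced to a short feature vector (for example by PCA), and the function names, the toy data, and the choice of seeding centroids with the first `k` points are all assumptions made for illustration.

```swift
/// Squared Euclidean distance between two equal-length vectors.
func squaredDistance(_ a: [Double], _ b: [Double]) -> Double {
    var sum = 0.0
    for (x, y) in zip(a, b) {
        let d = x - y
        sum += d * d
    }
    return sum
}

/// Index of the centroid closest to `point`.
func nearestCentroid(of point: [Double], in centroids: [[Double]]) -> Int {
    var best = 0
    var bestDistance = Double.infinity
    for (index, centroid) in centroids.enumerated() {
        let distance = squaredDistance(point, centroid)
        if distance < bestDistance {
            best = index
            bestDistance = distance
        }
    }
    return best
}

/// Lloyd's algorithm. The first `k` points seed the centroids so that
/// the sketch stays deterministic; real code would seed more carefully.
func kMeans(points: [[Double]], k: Int, iterations: Int = 20) -> [Int] {
    precondition(k > 0 && points.count >= k, "need at least k points")
    var centroids = Array(points.prefix(k))
    var labels = [Int](repeating: 0, count: points.count)
    for _ in 0..<iterations {
        // Assignment step: give each point the label of its nearest centroid.
        labels = points.map { nearestCentroid(of: $0, in: centroids) }
        // Update step: move each centroid to the mean of its members.
        for cluster in 0..<k {
            let members = zip(points, labels).filter { $0.1 == cluster }.map { $0.0 }
            guard !members.isEmpty else { continue }
            var mean = [Double](repeating: 0.0, count: members[0].count)
            for member in members {
                for d in 0..<mean.count { mean[d] += member[d] }
            }
            centroids[cluster] = mean.map { $0 / Double(members.count) }
        }
    }
    return labels
}

// Toy 2-D features standing in for PCA-reduced annotation vectors (made-up data).
let features: [[Double]] = [[0.0, 0.1], [5.0, 5.1], [0.2, 0.0], [5.2, 4.9]]
let clusterLabels = kMeans(points: features, k: 2)
print(clusterLabels) // prints [0, 1, 0, 1]
```

The toy example uses `k: 2` to stay readable; the plan's five classes would correspond to `k: 5` on the real annotation features.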

Dataset files

Motion player

Runtime env
