A lite BERT-like representation model for IMU sensor data

dapowan/LIMU-BERT-Public

LIMU-BERT

LIMU-BERT is a novel representation learning model that makes use of unlabeled IMU data and extracts generalized rather than task-specific features. It adopts the principle of the natural language model BERT to capture temporal relations and feature distributions in IMU sensor data. With the representations learned via LIMU-BERT, task-specific models trained with limited labeled samples can achieve superior performance. The designed models are lightweight and easily deployable on mobile devices.

Please check our paper LIMU-BERT for more details.

File Overview

This project contains the following folders and files.

  • config : config json files of models and training hyper-parameters.
  • dataset : the scripts for preprocessing four open datasets and a config file with the key attributes of those datasets.
  • benchmark.py : run DCNN, DeepSense, and R-GRU.
  • classifier.py : run LIMU-GRU, which takes representations learned by LIMU-BERT as input and outputs labels for target applications.
  • classifier_bert.py : run LIMU-GRU, which takes raw IMU readings as input and outputs labels for target applications.
  • config.py : some helper functions for loading settings.
  • embedding.py : generate representations (embeddings) for raw IMU readings given a pre-trained LIMU-BERT.
  • models.py : the implementations of LIMU-BERT, LIMU-GRU, and other baseline models.
  • plot.py : some helper functions for plotting IMU sensor data or learned representations.
  • pretrain.py : pretrain LIMU-BERT.
  • statistic.py : some helper functions for evaluation.
  • train.py : several helper functions for training models.
  • utils.py : some helper functions for preprocessing data or splitting datasets.

Setup

This repository has been tested with Python 3.7.7/3.8.5 and PyTorch 1.5.1/1.7.1. To install all dependencies, use the following command:

$ pip install -r requirements.txt

Prepare dataset

In the dataset folder, we provide four scripts that preprocess the corresponding datasets, which are widely adopted in previous studies. On the command line, these datasets are referred to as hhar, motion, uci, and shoaib.

Each script has a kernel function that transforms the raw IMU data and outputs the preprocessed data and labels. You can set the sampling rate and window size (sequence length).

  • Data: a numpy array with the shape (N, W, F), where N is the number of samples, W is the window size, and F is the number of features (6 or 9).
  • Label: a numpy array with the shape (N, W, L), where N is the number of samples, W is the window size, and L is the number of label types (e.g., activity and user label). The detailed label information is provided in data_config.json.
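The array layout above can be illustrated with synthetic data. This is a minimal sketch, not the repository's preprocessing code: the values are made up, the helper name select_task_labels is hypothetical, and it assumes the label is constant within a window so that the first time step can stand in for the whole window.

```python
import numpy as np

# Synthetic stand-ins for the preprocessed arrays (all values are made up).
N, W, F, L = 8, 120, 6, 2  # samples, window size, features, label types
data = np.zeros((N, W, F), dtype=np.float32)
label = np.zeros((N, W, L), dtype=np.int64)
label[:, :, 0] = np.arange(N)[:, None] % 3  # e.g. an activity id per window
label[:, :, 1] = np.arange(N)[:, None] % 2  # e.g. a user id per window


def select_task_labels(label, label_index):
    """Pick one label type and reduce it to a single label per window.

    Assumption: the label does not change inside a window, so the
    first time step is representative of the whole window.
    """
    return label[:, 0, label_index]


activity = select_task_labels(label, 0)
print(data.shape, label.shape, activity.shape)
```

Here the last axis of the label array plays the role of the label index that selects the target task.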

The two numpy arrays are saved as "data_X_Y.npy" and "label_X_Y.npy" in each dataset folder, where X is the sampling rate and Y is the window size. For example, in our experiments the data and labels are saved as "data_20_120.npy" and "label_20_120.npy", and the data and label arrays of the HHAR dataset are saved in the dataset/hhar folder.
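The naming convention can be written down as a small helper. This is only a sketch: the function name dataset_files is hypothetical and not part of the repository, and it assumes each dataset lives in a subfolder of dataset named after it, as in the HHAR example.

```python
from pathlib import Path


def dataset_files(root, dataset, sampling_rate, window_size):
    """Build the data/label file paths for one dataset version."""
    version = f"{sampling_rate}_{window_size}"  # e.g. "20_120"
    folder = Path(root) / dataset
    return folder / f"data_{version}.npy", folder / f"label_{version}.npy"


data_path, label_path = dataset_files("dataset", "hhar", 20, 120)
print(data_path.as_posix())
print(label_path.as_posix())
```

The resulting paths can be passed to numpy.load once the preprocessing script has produced the files.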

Framework

In our framework, there are two phases:

  • Self-supervised training phase: train LIMU-BERT with unlabeled IMU data.
  • Supervised training phase: train LIMU-GRU based on the learned representations.

In practice, running the code takes three steps:

  • pretrain.py : pretrain LIMU-BERT.
  • embedding.py : generate and save the representations learned by LIMU-BERT.
  • classifier.py : load the representations and train a task-specific classifier.
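The three steps can be sketched as a list of command lines. The helper pipeline_commands below is hypothetical and only assembles argument lists; the flags it uses (-s, -f, -l) are the ones documented in the Usage section, and prefixing each script with the python interpreter is an assumption.

```python
def pipeline_commands(dataset, version, bert_version, clf_version,
                      bert_name, clf_name, label_index):
    """Return the three commands of the pipeline as argument lists."""
    return [
        # Step 1: self-supervised pretraining of LIMU-BERT.
        ["python", "pretrain.py", bert_version, dataset, version,
         "-s", bert_name],
        # Step 2: generate and save representations with the pretrained model.
        ["python", "embedding.py", bert_version, dataset, version,
         "-f", bert_name],
        # Step 3: train a task-specific classifier on the representations.
        ["python", "classifier.py", clf_version, dataset, version,
         "-f", bert_name, "-s", clf_name, "-l", str(label_index)],
    ]


commands = pipeline_commands("uci", "20_120", "v1", "v2",
                             "limu_v1", "limu_gru_v1", 0)
for cmd in commands:
    print(" ".join(cmd))
```

Each list could be handed to subprocess.run(cmd, check=True) from the repository root to run the steps in order.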

For other baseline models, directly run benchmark.py.

Usage

pretrain.py, embedding.py, classifier.py, benchmark.py, and classifier_bert.py share the same usage pattern.

usage: xxx.py [-h] [-g GPU] [-f MODEL_FILE] [-t TRAIN_CFG] [-a MASK_CFG]
                   [-l LABEL_INDEX] [-s SAVE_MODEL]
                   model_version {hhar,motion,uci,shoaib} {10_100,20_120}

positional arguments:
  model_version         Model config, e.g. v1
  {hhar,motion,uci,shoaib}
                        Dataset name
  {10_100,20_120}       Dataset version

optional arguments:
  -h, --help            show this help message and exit
  -g GPU, --gpu GPU     Set specific GPU
  -f MODEL_FILE, --model_file MODEL_FILE
                        Pretrain model file, default: None
  -t TRAIN_CFG, --train_cfg TRAIN_CFG
                        Training config json file path
  -a MASK_CFG, --mask_cfg MASK_CFG
                        Mask strategy json file path, default: config/mask.json
  -l LABEL_INDEX, --label_index LABEL_INDEX
                        Label Index setting the task, default: -1
  -s SAVE_MODEL, --save_model SAVE_MODEL
                        The saved model name, default: 'model'
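For readers who want to see how such an interface maps onto argparse, the following is a minimal sketch that mirrors the usage text above. It is not the repository's actual parser: the function name build_parser, the destination names dataset and dataset_version, and the None defaults for --gpu and --train_cfg are assumptions.

```python
import argparse


def build_parser(prog="xxx.py"):
    """Build a parser that mirrors the documented usage (illustrative only)."""
    parser = argparse.ArgumentParser(prog=prog)
    parser.add_argument("model_version", help="Model config, e.g. v1")
    parser.add_argument("dataset", choices=["hhar", "motion", "uci", "shoaib"],
                        help="Dataset name")
    parser.add_argument("dataset_version", choices=["10_100", "20_120"],
                        help="Dataset version")
    # Defaults for --gpu and --train_cfg are not documented; None is assumed.
    parser.add_argument("-g", "--gpu", default=None, help="Set specific GPU")
    parser.add_argument("-f", "--model_file", default=None,
                        help="Pretrain model file")
    parser.add_argument("-t", "--train_cfg", default=None,
                        help="Training config json file path")
    parser.add_argument("-a", "--mask_cfg", default="config/mask.json",
                        help="Mask strategy json file path")
    parser.add_argument("-l", "--label_index", type=int, default=-1,
                        help="Label index setting the task")
    parser.add_argument("-s", "--save_model", default="model",
                        help="The saved model name")
    return parser


args = build_parser().parse_args(["v1", "uci", "20_120", "-s", "limu_v1"])
print(args.model_version, args.dataset, args.dataset_version, args.save_model)
```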

Example:

pretrain.py v1 uci 20_120 -s limu_v1 

For this command, we will train a LIMU-BERT, whose settings are defined in the based_v1 of limu_bert.json, with the UCI dataset "data_20_120.npy" and "label_20_120.npy". The trained model will be saved as "limu_v1.pt" in the saved/pretrain_base_uci_20_120 folder. The mask and train settings are defined in the mask.json and pretrain.json, respectively.

In the main function of pretrain.py, you can set the following parameter:

  • training_rate: float, defines the proportion of unlabeled training data we want to use. The default value is 0.8.
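The effect of training_rate can be sketched as a shuffled index split. This is an illustration under assumptions, not the repository's splitting code: the helper split_indices, the fixed seed, and the way the remaining samples are held out are all hypothetical.

```python
import random


def split_indices(num_samples, training_rate=0.8, seed=0):
    """Shuffle sample indices and keep a training_rate share for training."""
    indices = list(range(num_samples))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility
    cut = int(num_samples * training_rate)
    return indices[:cut], indices[cut:]


train_idx, held_out_idx = split_indices(100, training_rate=0.8)
print(len(train_idx), len(held_out_idx))
```

With 100 samples and the default rate, 80 indices go to the training portion and the remaining 20 are held out.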

Example:

embedding.py v1 uci 20_120 -f limu_v1

For this command, we will load the pretrained LIMU-BERT from the file "limu_v1.pt" in the saved/pretrain_base_uci_20_120 folder. embedding.py will then save the learned representations as "embed_limu_v1_uci_20_120.npy" in the embed folder.
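The file locations in the two examples above follow a visible pattern, which can be sketched as below. Both helper names are hypothetical, and the patterns are inferred only from the example paths in this README; the repository may construct them differently.

```python
from pathlib import Path


def pretrain_model_path(dataset, version, name):
    """Path of a pretrained model, following the example in this README."""
    return Path("saved") / f"pretrain_base_{dataset}_{version}" / f"{name}.pt"


def embedding_path(dataset, version, name):
    """Path of the saved representations, following the example above."""
    return Path("embed") / f"embed_{name}_{dataset}_{version}.npy"


model_file = pretrain_model_path("uci", "20_120", "limu_v1")
embed_file = embedding_path("uci", "20_120", "limu_v1")
print(model_file.as_posix())
print(embed_file.as_posix())
```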

Example:

classifier.py v2 uci 20_120 -f limu_v1 -s limu_gru_v1 -l 0

For this command, we will load the embeddings (representations) from "embed_limu_v1_uci_20_120.npy" and train the GRU classifier, whose settings are defined in the gru_v2 of classifier.json. The trained GRU classifier will be saved as "limu_gru_v1.pt" in the saved/classifier_base_uci_20_120 folder. The target task corresponds to the first label in "label_20_120.npy" of the UCI dataset, which is a human activity recognition task defined in data_config.json. The training settings are defined in train.json.

In the main function of classifier.py, you can set the following parameters:

  • training_rate: float, defines the proportion of unlabeled data that the pretrained LIMU-BERT uses. The default value is 0.8. Note that this value must equal the training_rate in pretrain.py.
  • label_rate: float, defines the ratio of labeled data to the unlabeled training data that the pretrained LIMU-BERT uses.
  • balance: bool, defines whether to use a balanced number of labeled samples across the classes. Default: True.
  • method: str, defines the classifier type from {gru, lstm, cnn1, cnn2, attn}. Default: gru.

If any of the above settings are unclear, please see Section 4.1.1 of our paper for more details.
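As a rough illustration of how label_rate and balance interact, the sketch below picks a labeled subset from the training portion. It is not the repository's sampling code: the helper pick_labeled is hypothetical, and it assumes that a balanced selection means the same number of labeled samples per class.

```python
import random
from collections import defaultdict


def pick_labeled(train_labels, label_rate, balance=True, seed=0):
    """Choose indices of the training samples whose labels are used."""
    rng = random.Random(seed)
    budget = int(round(len(train_labels) * label_rate))
    if not balance:
        # Unbalanced: draw uniformly from all training samples.
        return sorted(rng.sample(range(len(train_labels)), budget))
    by_class = defaultdict(list)
    for index, y in enumerate(train_labels):
        by_class[y].append(index)
    per_class = budget // len(by_class)  # equal share for every class
    chosen = []
    for y in sorted(by_class):
        pool = by_class[y]
        chosen.extend(rng.sample(pool, min(per_class, len(pool))))
    return sorted(chosen)


train_labels = [i % 4 for i in range(200)]  # synthetic labels, 4 classes
labeled = pick_labeled(train_labels, label_rate=0.1)
counts = {y: sum(1 for i in labeled if train_labels[i] == y) for y in range(4)}
print(len(labeled), counts)
```

With 200 training samples and a label rate of 0.1, the budget is 20 labeled samples, which the balanced mode spreads evenly as 5 per class.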

Example:

classifier_bert.py v1_v2 uci 20_120 -f limu_v1 -s limu_gru_v1 -l 0

For this command, we will train a composite classifier consisting of the pretrained LIMU-BERT and a GRU classifier, whose settings are defined in the gru_v2 of classifier.json. The trained LIMU-GRU classifier will be saved as "limu_gru_v1.pt" in the saved/bert_classifier_base_uci_20_120 folder. The training settings are defined in bert_classifier_train.json. Note that "v1_v2" defines two model versions, which correspond to the LIMU-BERT and the GRU classifier, respectively.
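The composite version string can be taken apart as follows. This is a minimal sketch with hypothetical helper names; building the classifier config key as the method name plus the version (giving gru_v2) is inferred from the example above and is an assumption about how the key is formed.

```python
def split_versions(model_version):
    """Split a composite version such as "v1_v2" into its two parts."""
    bert_version, classifier_version = model_version.split("_", 1)
    return bert_version, classifier_version


def classifier_config_key(method, classifier_version):
    """Assumed naming of the classifier entry in classifier.json."""
    return f"{method}_{classifier_version}"


bert_version, classifier_version = split_versions("v1_v2")
print(bert_version, classifier_config_key("gru", classifier_version))
```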

Example:

benchmark.py v1 uci 20_120 -s dcnn_v1 -l 0

For this command, we will train a DCNN model, whose settings are defined in the dcnn_v1 of classifier.json. The trained DCNN classifier will be saved as "dcnn_v1.pt" in the saved/bench_dcnn_uci_20_120 folder.

In the main function of benchmark.py, the parameters are the same as those in classifier.py.

Citation

LIMU-BERT: Unleashing the Potential of Unlabeled Data for IMU Sensing Applications

@inproceedings{xu2021limu,
  title={LIMU-BERT: Unleashing the Potential of Unlabeled Data for IMU Sensing Applications},
  author={Xu, Huatao and Zhou, Pengfei and Tan, Rui and Li, Mo and Shen, Guobin},
  booktitle={Proceedings of the 19th ACM Conference on Embedded Networked Sensor Systems},
  pages={220--233},
  year={2021}
}

Contact

huatao001@e.ntu.edu.sg (preferred)

735820057@qq.com
