LIMU-BERT is a novel representation learning model that can make use of unlabeled IMU data and extract generalized rather than task-specific features. It adopts the principles of the natural language model BERT to effectively capture temporal relations and feature distributions in IMU sensor data. With the representations learned via LIMU-BERT, task-specific models trained with limited labeled samples can achieve superior performance. The designed models are lightweight and easily deployable on mobile devices.
Please check our paper LIMU-BERT for more details.
This project contains the following folders and files.
- config: config json files of models and training hyper-parameters.
- dataset: scripts for preprocessing four open datasets and a config file of key attributes of those datasets.
- benchmark.py: runs DCNN, DeepSense, and R-GRU.
- classifier.py: runs LIMU-GRU, which takes representations learned by LIMU-BERT as input and outputs labels for target applications.
- classifier_bert.py: runs LIMU-GRU, which takes raw IMU readings as input and outputs labels for target applications.
- config.py: helper functions for loading settings.
- embedding.py: generates representations (embeddings) for raw IMU readings given a pre-trained LIMU-BERT.
- models.py: implementations of LIMU-BERT, LIMU-GRU, and other baseline models.
- plot.py: helper functions for plotting IMU sensor data or learned representations.
- pretrain.py: pretrains LIMU-BERT.
- statistic.py: helper functions for evaluation.
- train.py: helper functions for training models.
- utils.py: helper functions for preprocessing data or splitting datasets.
This repository has been tested with Python 3.7.7/3.8.5 and PyTorch 1.5.1/1.7.1. To install all dependencies, use the following command:
$ pip install -r requirements.txt
In the dataset folder, we provide four scripts that preprocess the corresponding datasets. Those datasets (HHAR, UCI, MotionSense, and Shoaib) are widely adopted in previous studies.
Each script has a kernel function that transforms the raw IMU data and outputs the preprocessed data and labels. You can set the sampling rate and window size (sequence length).
- Data: a numpy array with shape (N, W, F), where N is the number of samples, W is the window size, and F is the number of features (6 or 9).
- Label: a numpy array with shape (N, W, L), where N is the number of samples, W is the window size, and L is the number of label types (e.g., activity and user labels). The detailed label information is provided in data_config.json.
The two numpy arrays are saved as "data_X_Y.npy" and "label_X_Y.npy" in each dataset folder, where X is the sampling rate and Y is the window size. For example, in our experiments all data and labels are saved as "data_20_120.npy" and "label_20_120.npy", and the data and label arrays of the HHAR dataset are stored in the dataset/hhar folder.
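If you want to sanity-check the preprocessed output, the arrays can be loaded directly with numpy. The sketch below assumes the HHAR example above (20 Hz sampling rate, window size 120); adjust the paths for other datasets or versions:

```python
import numpy as np

# Load the preprocessed IMU windows and their labels (paths follow the HHAR example above).
data = np.load("dataset/hhar/data_20_120.npy")     # expected shape: (N, 120, 6 or 9)
labels = np.load("dataset/hhar/label_20_120.npy")  # expected shape: (N, 120, L)

print("data shape:", data.shape)
print("label shape:", labels.shape)
```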
In our framework, there are two phases:
- Self-supervised training phase: train LIMU-BERT with unlabeled IMU data.
- Supervised training phase: train LIMU-GRU based on the learned representations.
In the implementation, there are three steps to run the code, as shown in the example sequence below:
1. pretrain.py: pretrains LIMU-BERT.
2. embedding.py: generates and saves the representations learned by LIMU-BERT.
3. classifier.py: loads the representations and trains a task-specific classifier.
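Putting the three steps together, an end-to-end run on the UCI dataset looks like the following (each command is explained in the examples further below):
$ python pretrain.py v1 uci 20_120 -s limu_v1
$ python embedding.py v1 uci 20_120 -f limu_v1
$ python classifier.py v2 uci 20_120 -f limu_v1 -s limu_gru_v1 -l 0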
For the other baseline models, directly run benchmark.py.
pretrain.py, embedding.py, classifier.py, benchmark.py, and classifier_bert.py share the same usage pattern.
usage: xxx.py [-h] [-g GPU] [-f MODEL_FILE] [-t TRAIN_CFG] [-a MASK_CFG]
[-l LABEL_INDEX] [-s SAVE_MODEL]
model_version {hhar,motion,uci,shoaib} {10_100,20_120}
positional arguments:
model_version Model config, e.g. v1
{hhar,motion,uci,shoaib}
Dataset name
{10_100,20_120} Dataset version
optional arguments:
-h, --help show this help message and exit
-g GPU, --gpu GPU Set specific GPU
-f MODEL_FILE, --model_file MODEL_FILE
Pretrain model file, default: None
-t TRAIN_CFG, --train_cfg TRAIN_CFG
Training config json file path
-a MASK_CFG, --mask_cfg MASK_CFG
Mask strategy json file path, default: config/mask.json
-l LABEL_INDEX, --label_index LABEL_INDEX
Label Index setting the task, default: -1
-s SAVE_MODEL, --save_model SAVE_MODEL
The saved model name, default: 'model'
Example:
pretrain.py v1 uci 20_120 -s limu_v1
For this command, we will train a LIMU-BERT whose settings are defined in the base_v1 entry of limu_bert.json, with the UCI dataset "data_20_120.npy" and "label_20_120.npy". The trained model will be saved as "limu_v1.pt" in the saved/pretrain_base_uci_20_120 folder. The mask and training settings are defined in mask.json and pretrain.json, respectively.
In the main function of pretrain.py, you can set the following parameters:
- training_rate: float, defines the proportion of unlabeled training data we want to use. The default value is 0.8.
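For intuition, training_rate = 0.8 simply means that 80% of the windows are used for pre-training and the rest are held out. The helper below is a minimal, hypothetical sketch of such a split (it is not a function from this repository):

```python
import numpy as np

def split_by_training_rate(data, training_rate=0.8, seed=0):
    """Illustrative only: keep a training_rate fraction of shuffled windows for pre-training."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(data))
    n_train = int(len(data) * training_rate)
    return data[idx[:n_train]], data[idx[n_train:]]

# pretrain_data, held_out = split_by_training_rate(data, training_rate=0.8)
```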
Example:
embedding.py v1 uci 20_120 -f limu_v1
For this command, we will load the pretrained LIMU-BERT from the file "limu_v1.pt" in the saved/pretrain_base_uci_20_120 folder. embedding.py will then save the learned representations as "embed_limu_v1_uci_20_120.npy" in the embed folder.
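The saved representations are a plain numpy array, so they can be inspected directly. The exact feature dimension depends on the LIMU-BERT config, so the shape comment below is only indicative:

```python
import numpy as np

# Load the representations produced by embedding.py for the UCI example above.
embeddings = np.load("embed/embed_limu_v1_uci_20_120.npy")
print(embeddings.shape)  # roughly (N, window_size, hidden_dim); the last dimension depends on the model config
```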
Example:
classifier.py v2 uci 20_120 -f limu_v1 -s limu_gru_v1 -l 0
For this command, we will load the embeddings (representations) from "embed_limu_v1_uci_20_120.npy" and train the GRU classifier, whose settings are defined in the gru_v2 entry of classifier.json. The trained GRU classifier will be saved as "limu_gru_v1.pt" in the saved/classifier_base_uci_20_120 folder. The target task corresponds to the first label in "label_20_120.npy" of the UCI dataset, which is a human activity recognition task defined in data_config.json. The training settings are defined in train.json.
In the main function of classifier.py, you can set the following parameters:
- training_rate: float, defines the proportion of unlabeled data that the pretrained LIMU-BERT uses. The default value is 0.8. Note that this value must equal the training_rate in pretrain.py.
- label_rate: float, defines the proportion of labeled data relative to the unlabeled training data that the pretrained LIMU-BERT uses.
- balance: bool, defines whether to use balanced labeled samples among the multiple classes. Default: True.
- method: str, defines the classifier type, chosen from {gru, lstm, cnn1, cnn2, attn}. Default: gru.
If you are confused about the above settings, please check Section 4.1.1 in our paper for more details.
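To make the -l/--label_index option concrete: since the label array has shape (N, W, L), choosing label index 0 amounts to slicing out the first label type (the activity label for UCI). The snippet below is only an illustration and assumes the UCI arrays live in dataset/uci, analogous to the dataset/hhar example above:

```python
import numpy as np

labels = np.load("dataset/uci/label_20_120.npy")  # shape (N, W, L)
label_index = 0                                   # -l 0: first label type, i.e., activity for UCI
activity_labels = labels[:, :, label_index]       # shape (N, W)
```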
Example:
classifier_bert.py v1_v2 uci 20_120 -f limu_v1 -s limu_gru_v1 -l 0
For this command, we will train a composite classifier consisting of the pretrained LIMU-BERT and a GRU classifier, whose settings are defined in the gru_v2 entry of classifier.json. The trained LIMU-GRU classifier will be saved as "limu_gru_v1.pt" in the saved/bert_classifier_base_uci_20_120 folder. The training settings are defined in bert_classifier_train.json. Note that "v1_v2" specifies two model versions, which correspond to the LIMU-BERT and the GRU classifier, respectively.
Example:
benchmark.py v1 uci 20_120 -s dcnn_v1 -l 0
For this command, we will train a DCNN model, whose settings are defined in the dcnn_v1 entry of classifier.json. The trained DCNN classifier will be saved as "dcnn_v1.pt" in the saved/bench_dcnn_uci_20_120 folder. In the main function of benchmark.py, the parameters are the same as those in classifier.py.
LIMU-BERT: Unleashing the Potential of Unlabeled Data for IMU Sensing Applications
@inproceedings{xu2021limu,
title={LIMU-BERT: Unleashing the Potential of Unlabeled Data for IMU Sensing Applications},
author={Xu, Huatao and Zhou, Pengfei and Tan, Rui and Li, Mo and Shen, Guobin},
booktitle={Proceedings of the 19th ACM Conference on Embedded Networked Sensor Systems},
pages={220--233},
year={2021}
}
huatao001@e.ntu.edu.sg (preferred)