
Creating and Training on Custom Dataset

Ashwin A Nayar edited this page May 11, 2023 · 3 revisions

The project includes tools to record, prepare, train and test on custom datasets and CSI recordings. At the time of writing, however, these are limited to captures from ESP32 modules that follow the official SDK output formats.

Espressif officially recommends the following:

  • Use ESP32-C3 / ESP32-S3: these are the best RF chips at present
  • Use an external antenna: a PCB antenna has poor directivity and is easily affected by interference from the motherboard
  • Keep the two devices more than one meter apart

Even though these are the official recommendations, we were able to pull this off using an ESP32-WROOM-32 + on-board antenna :)

Configuring ESP32 modules

Flash two ESP32s, one with csi_send and the other with csi_recv:

# csi_send
cd csi_send
idf.py set-target esp32
idf.py flash -b 921600 -p /dev/ttyUSB0 monitor

# csi_recv
cd csi_recv
idf.py set-target esp32
idf.py flash -b 921600 -p /dev/ttyUSB1

Connect the CSI receiver to the system acquiring data. The sender only needs a power source.

Capturing raw CSI data

We used picocom to read and log data from the serial port.

sudo apt install picocom

Once connected, start logging raw CSI data using the logserial.sh script.

./tools/logserial.sh -d /dev/ttyUSB0 -b 921600 -l activity-name.csi

When done, stop logging with Ctrl + A followed by Ctrl + X.

You may need to edit the newly created CSI log file by hand and delete the first and last few lines, which might contain incomplete CSI records. A valid CSI record starts with CSI_DATA and ends with ]", which marks the end of the CSI data array. The data is in CSV format.
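The manual cleanup can also be scripted. The snippet below is a minimal sketch, not part of the project's tools: it keeps only lines that start with CSI_DATA and end with ]", and parses the quoted array. The helper names are hypothetical, and it assumes the array is the last CSV field and holds comma-separated integers.

```python
import csv
import io
import json


def is_complete_record(line: str) -> bool:
    """Return True if the line starts with CSI_DATA and ends with ]"."""
    stripped = line.strip()
    return stripped.startswith("CSI_DATA") and stripped.endswith(']"')


def clean_csi_lines(lines):
    """Keep only complete records, dropping truncated or unrelated lines."""
    return [line.strip() for line in lines if is_complete_record(line)]


def parse_csi_array(line: str):
    """Parse the quoted CSI array into a list of ints.

    Assumption: the array is the final CSV field and contains
    comma-separated integers, e.g. "[1,-2,3]".
    """
    fields = next(csv.reader(io.StringIO(line)))
    return json.loads(fields[-1])


# Usage with made-up records (the real field layout may differ):
sample = [
    '-40,"[9,9]"',                        # tail of a record cut at the start
    'CSI_DATA,1,aa:bb,-40,"[1,-2,3]"',    # complete record
    'CSI_DATA,2,aa:bb,-41,"[4,5',         # record cut off at the end
]
records = clean_csi_lines(sample)
arrays = [parse_csi_array(r) for r in records]
```

Filtering by both the prefix and the suffix also removes any boot or log messages that the serial port may have interleaved with the CSI output, not only the truncated first and last lines.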

Preparing the dataset

For convenience, the raw data is first processed into a MATLAB-style .mat file, ready to be used for training. This is done with the genmat.py script.

Create a recipe.yaml file in the following format:

data_dir: ... # directory where raw CSI files are stored
dest_dir: ... # generated targets will be saved here

targets:
    name_of_dataset_1.mat:
        max_samples_per_class: 100  # -1 to use all the available data
        winsize: 256  # chunk size for each sample
        classes:
            class_name_1:
                - [source_file_1.csi, 5, 3]  # use `source_file_1.csi` but discard data from first 5 seconds and last 3 seconds
                - [source_file_2.csi, 6, 8]  # use `source_file_2.csi` but discard data from first 6 seconds and last 8 seconds
                - ...
            class_name_2:
                - ...
    
    name_of_dataset_2.mat:
        max_samples_per_class: 100  # -1 to use all the available data
        winsize: 256  # chunk size for each sample
        classes:
            class_name_1:
                - [source_file_3.csi, 3, 3]
                - [source_file_4.csi, 5, 5]
                - ...
            class_name_2:
                - ...
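To make the recipe fields concrete, here is a rough sketch of how the trim values, winsize and max_samples_per_class could be applied to one source file. It is an illustration only and not the genmat.py implementation. The function names are hypothetical, and it assumes timestamps in seconds, non-overlapping windows, and keeping the first windows when a cap is set.

```python
def trim_by_seconds(timestamps, rows, skip_start, skip_end):
    """Drop rows from the first skip_start and last skip_end seconds.

    Assumption: timestamps are in seconds and sorted in ascending order.
    """
    if not rows:
        return []
    t_first, t_last = timestamps[0], timestamps[-1]
    return [
        row
        for t, row in zip(timestamps, rows)
        if t_first + skip_start <= t <= t_last - skip_end
    ]


def make_windows(rows, winsize, max_samples=-1):
    """Cut rows into non-overlapping windows of winsize rows each.

    Leftover rows that do not fill a whole window are discarded.
    A max_samples of -1 keeps every window (assumed to mirror the recipe).
    """
    n_windows = len(rows) // winsize
    windows = [rows[i * winsize:(i + 1) * winsize] for i in range(n_windows)]
    if max_samples >= 0:
        windows = windows[:max_samples]
    return windows


# Usage: ten one-second samples, discard the first 2 s and the last 3 s,
# then cut what remains into windows of 2 rows.
timestamps = [float(i) for i in range(10)]
rows = list(range(10))
kept = trim_by_seconds(timestamps, rows, skip_start=2, skip_end=3)
windows = make_windows(kept, winsize=2, max_samples=-1)
```

Under these assumptions, a recording shorter than winsize rows after trimming contributes no samples at all.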

Once done, save the file and generate the datasets.

./scripts/genmat.py --recipe recipe.yaml

To summarise what the final data will look like without generating the datasets, use a dry run.

./scripts/genmat.py --recipe recipe.yaml --dry-run

The .mat files can now be used to train and test the HAR pipeline.
