Creating and Training on Custom Dataset
The project includes tools to record, prepare, train, and test on custom datasets and CSI recordings. However, at the time of writing, these are limited to captures from ESP32 modules that comply with the official SDK output formats.
Espressif officially recommends the following:
- Use ESP32-C3 / ESP32-S3: ESP32-C3 / ESP32-S3 is the best RF chip at present
- Use an external antenna: a PCB antenna has poor directivity and is easily interfered with by the motherboard
- The distance between the two devices is more than one meter
Even though these are the official recommendations, we were able to pull this off using an ESP32-WROOM-32 + on-board antenna :)
Flash two ESP32s, one with csi_send and the other with csi_recv.
# csi_send
cd csi_send
idf.py set-target esp32
idf.py flash -b 921600 -p /dev/ttyUSB0 monitor
# csi_recv
cd csi_recv
idf.py set-target esp32
idf.py flash -b 921600 -p /dev/ttyUSB1
The CSI receiver is connected to the system acquiring data and the sender is provided with a power source.
We used picocom to read and log data from the serial port.
sudo apt install picocom
Once connected, start logging raw CSI data using the logserial.sh script.
./tools/logserial.sh -d /dev/ttyUSB0 -b 921600 -l activity-name.csi
When done, stop logging with Ctrl + A followed by Ctrl + X.
You might want to manually edit the newly created CSI log file and delete the first and last few lines, which might contain incomplete CSI records. A valid CSI record starts with CSI_DATA and ends with a ]" denoting the end of the CSI data array. The data is in CSV format.
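The manual clean-up described above could also be scripted. The following is a minimal sketch, not part of the project's tooling: the function names and file paths are hypothetical, and it assumes each record occupies exactly one line of the log. It relies only on the two markers stated above (a leading CSI_DATA and a trailing ]"). Note that it drops every non-matching line, including any header or boot messages, not just the first and last few.

```python
from typing import Iterable, List

RECORD_PREFIX = "CSI_DATA"  # a valid record starts with this tag
RECORD_SUFFIX = ']"'        # and ends with the close of the quoted CSI array


def is_complete_record(line: str) -> bool:
    """Return True if the line looks like a whole CSI record."""
    stripped = line.strip()
    return stripped.startswith(RECORD_PREFIX) and stripped.endswith(RECORD_SUFFIX)


def keep_complete_records(lines: Iterable[str]) -> List[str]:
    """Drop truncated or non-CSI lines, keeping complete records in order."""
    return [line.strip() for line in lines if is_complete_record(line)]


def clean_log(src_path: str, dst_path: str) -> int:
    """Copy only complete records from src_path to dst_path (hypothetical helper).

    Returns the number of records kept.
    """
    with open(src_path, "r", errors="replace") as src:
        records = keep_complete_records(src)
    with open(dst_path, "w") as dst:
        dst.writelines(record + "\n" for record in records)
    return len(records)
```

For example, `clean_log("activity-name.csi", "activity-name.clean.csi")` would write a filtered copy and leave the original log untouched, so the result can be compared against a manual edit.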
For convenience, the raw data is first processed and a MATLAB-style .mat file is generated, ready to be used for training. This is accomplished using the genmat.py script.
Create a recipe.yaml file in the following format:
data_dir: ... # directory where raw CSI files are stored
dest_dir: ... # generated targets will be saved here
targets:
  name_of_dataset_1.mat:
    max_samples_per_class: 100 # -1 to use all the available data
    winsize: 256 # chunk size for each sample
    classes:
      class_name_1:
        - [source_file_1.csi, 5, 3] # use `source_file_1.csi` but discard data from first 5 seconds and last 3 seconds
        - [source_file_2.csi, 6, 8] # use `source_file_2.csi` but discard data from first 6 seconds and last 8 seconds
        - ...
      class_name_2:
        - ...
  name_of_dataset_2.mat:
    max_samples_per_class: 100 # -1 to use all the available data
    winsize: 256 # chunk size for each sample
    classes:
      class_name_1:
        - [source_file_3.csi, 3, 3]
        - [source_file_4.csi, 5, 5]
        - ...
      class_name_2:
        - ...
Once done, save the file and generate the datasets.
./scripts/genmat.py --recipe recipe.yaml
It is also possible to skip generating the datasets and only summarise what the final data will look like, using a dry run.
./scripts/genmat.py --recipe recipe.yaml --dry-run
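To make the recipe fields more concrete, here is a rough sketch of how the two trim values, winsize, and max_samples_per_class might interact for a single source file. This is an illustration only and not how genmat.py is necessarily implemented: the function name, the `sample_rate_hz` parameter (CSI records per second), and the choice of non-overlapping windows taken in order are all assumptions.

```python
from typing import List, Sequence


def trim_and_window(
    rows: Sequence,
    head_s: float,
    tail_s: float,
    winsize: int,
    sample_rate_hz: float,  # assumed CSI records per second (not from the project)
    max_samples: int = -1,  # -1 keeps every window, mirroring the recipe comment
) -> List[list]:
    """Discard the first/last seconds, then cut the rest into fixed-size chunks."""
    head = int(head_s * sample_rate_hz)        # records to drop at the start
    tail = int(tail_s * sample_rate_hz)        # records to drop at the end
    end = len(rows) - tail
    kept = rows[head:end] if end > head else []

    # Assumption: non-overlapping windows; a trailing partial window is dropped.
    n_windows = len(kept) // winsize
    windows = [list(kept[i * winsize:(i + 1) * winsize]) for i in range(n_windows)]

    if max_samples >= 0:
        windows = windows[:max_samples]
    return windows
```

Under these assumptions, a recording of 1000 records at 100 records per second with an entry of `[file, 1, 2]` and `winsize: 256` would keep 700 records and yield two samples; the remaining 188 records would not fill a third window.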
The .mat files can now be used to train and test the HAR pipeline.