In this project you can evaluate the MNIST database or your own hand-written digits (using the included Jupyter notebook) on the STM32F746. This example is tested on the STM32F7 discovery kit. If you have a different board, you need to adapt the project accordingly.
The base project is derived from my CMake template for the STM32F7xx here.
Note: This project is derived from this blog post here, which is an update of this post here. The whole series starts from here.
This repo has the following tags:
- v1.14.0: using TensorFlow Lite for Microcontrollers v1.14.0
- v2.1.0: using TensorFlow Lite for Microcontrollers v2.1.0
First you need to build the project and upload it to the STM32F7. To do that, follow the instructions in the build section. After that you can use the Jupyter notebook to hand-draw a digit, upload the digit to the STM32F7 and get the prediction back. Please follow the guide inside the notebook.
In order to run the notebook, you need python3, tensorflow and PySerial. I've used Ubuntu 18.04 and miniconda, but conda is not really needed. In any case, it's good to run the following commands in a virtual environment.
Example for conda:
conda create -n stm32f7-nn-env python
conda activate stm32f7-nn-env
conda install -c conda-forge numpy
conda install -c conda-forge jupyter
conda install -c conda-forge tensorflow-gpu
jupyter notebook
Then browse to jupyter_notebook/MNIST-TensorFlow.ipynb and run/use the notebook.
To select which libraries you want to use, you need to provide cmake with the proper options. By default all the options are set to OFF. The supported options are:
- USE_CORTEX_NN: If set to ON, the project is built using the DSP/NN libs
- USE_HAL_DRIVER: If set to ON, enables the HAL Driver library
- USE_FREERTOS: If set to ON, enables FreeRTOS
If you don't use the docker-build.sh script to build the code, then you also need to provide the path of the toolchain to use in CMAKE_TOOLCHAIN.
You can build two different versions of this code. One uses the default depthwise_conv function and the other uses the cmsis-nn version. You can select which version to build with the USE_CORTEX_NN cmake option.
To build the binary with CMSIS-NN and CMSIS-DSP, run the following command:
CLEANBUILD=true USE_CORTEX_NN=ON SRC=src ./build.sh
Finally, I've added two models. The first is the default model without optimized weights, which is located in source/src/inc/model_data_uncompressed.h and is 2.1MB! The other is the model with the compressed weights, which is located in source/src/inc/model_data_compressed.h and is ~614KB. You can select which model to use while building the code with the USE_COMP_MODEL cmake flag like this:
CLEANBUILD=true USE_OVERCLOCK=OFF USE_CMSIS_NN=OFF USE_COMP_MODEL=ON SRC=src ./build.sh
The default option is set to OFF.
Warning: for some reason the compressed model doesn't work properly and the MCU hangs.
Note: CLEANBUILD=true is only needed if you want a clean build; otherwise you can skip it. When it's used, the build takes quite some time, depending on your machine, because all the DSP and NN lib files are rebuilt.
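The build commands above all follow the same pattern: a few environment variables in front of ./build.sh. As a rough sketch only, the same thing can be driven from Python. The helper names build_env and run_build are hypothetical and not part of this repo, and the sketch uses the USE_CMSIS_NN spelling from the later commands (the options list spells it USE_CORTEX_NN, so check which one your checkout expects).

```python
import os
import subprocess


def build_env(cmsis_nn=False, comp_model=False, overclock=False,
              clean=False, src="src"):
    """Return the environment variables that the commands above pass to build.sh."""
    def on_off(flag):
        return "ON" if flag else "OFF"

    env = {
        "USE_CMSIS_NN": on_off(cmsis_nn),      # spelling taken from the later commands
        "USE_COMP_MODEL": on_off(comp_model),
        "USE_OVERCLOCK": on_off(overclock),
        "SRC": src,
    }
    if clean:
        # Only set for a clean build, which rebuilds all the DSP/NN libs.
        env["CLEANBUILD"] = "true"
    return env


def run_build(script="./build.sh", **options):
    """Run the build script with the selected options (hypothetical helper)."""
    env = dict(os.environ, **build_env(**options))
    return subprocess.run([script], env=env, check=True)
```

For example, run_build(comp_model=True, clean=True) would be roughly equivalent to the USE_COMP_MODEL=ON command shown earlier.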
If you want to have the same build environment as the one I've used, then you can use my CDE image for stm32 with docker like this:
./docker-build.sh "CLEANBUILD=true USE_OVERCLOCK=OFF USE_CMSIS_NN=OFF USE_COMP_MODEL=ON SRC=src ./build.sh"
I've added an overclocking flag that overclocks the CPU @ 280. That may be too high for some chips, while yours may be able to run even higher. To control the overclocking amount, look for these lines in source/src/main.cpp:
#ifdef OVERCLOCK
RCC_OscInitStruct.PLL.PLLN = 288; // Overclock
#endif
You can change that number to the frequency you like. Then you need to build with the USE_OVERCLOCK flag, like this:
CLEANBUILD=true USE_OVERCLOCK=ON USE_HAL_DRIVER=ON USE_CMSIS_NN=ON ./build.sh
Warning: Any overclocking may be the source of unknown issues. In my case I was able to OC up to 285MHz, but sometimes the flatbuffers API was failing at that high frequency! In particular, avoid developing with OC enabled.
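To see how the PLLN value relates to the core clock, here is a small worked calculation. On the STM32F7 the main PLL output used for SYSCLK is (PLL input / PLLM) * PLLN / PLLP. The input clock, PLLM and PLLP values below are assumptions for illustration and are not taken from this project's main.cpp; with them, PLLN happens to equal the clock in MHz, which would match the "change that number to the frequency you like" advice. For reference, the STM32F746 is rated for up to 216 MHz.

```python
def sysclk_mhz(pll_input_mhz, pllm, plln, pllp):
    """SYSCLK in MHz when clocked from the main PLL: (input / PLLM) * PLLN / PLLP."""
    return pll_input_mhz / pllm * plln / pllp


# Assumed values, NOT read from this project: 16 MHz PLL input, PLLM = 8, PLLP = 2.
ASSUMED_INPUT_MHZ = 16
ASSUMED_PLLM = 8
ASSUMED_PLLP = 2

for plln in (216, 288):
    mhz = sysclk_mhz(ASSUMED_INPUT_MHZ, ASSUMED_PLLM, plln, ASSUMED_PLLP)
    print(f"PLLN = {plln} -> SYSCLK = {mhz} MHz")
```

If your clock setup uses a different input or dividers, plug in your own values to see what a given PLLN really means.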
It's usually more convenient to create your project with CubeMX. After you set up all the hardware and peripherals, you can generate the code (I prefer SW4STM32, but it doesn't really matter in this case). After the code is exported, you just need to copy the files that CubeMX customizes for your setup.
The files that you usually need to copy into your source/src folder are:
- main.h
- main.c
- stm32f7xx_hal_conf.h
- stm32f7xx_hal_msp.c
- stm32f7xx_it.h
- stm32f7xx_it.c
- system_stm32f7xx.c (in case you have custom clocks)
In your case there might be more files. Usually these are the files in the exported Inc and Src folders.
The code uses 2 serial ports, UART6 and UART7. UART6 is the debug port that you can use to run commands from the terminal. UART7 is used by the Jupyter notebook for transferring serialized data with flatbuffers.
For the STM32F7-disco board this is the UART6 and UART7 pinout:
UART | Tx | Rx |
---|---|---|
UART6 | D0 | D1 |
UART7 | A5 | A4 |
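If you want to poke at one of these ports from Python outside the notebook, a minimal PySerial sketch could look like the following. This is not the notebook's STM32F7Comm client. The baud rate (115200), the 8N1 framing and the device path are assumptions, so check the firmware and your USB-to-serial adapter for the real values.

```python
def serial_settings(device, baudrate=115200, timeout=1.0):
    """Keyword arguments for serial.Serial; baud rate and 8N1 framing are assumed."""
    return {
        "port": device,
        "baudrate": baudrate,   # assumption, verify against the firmware
        "bytesize": 8,          # same value as serial.EIGHTBITS
        "parity": "N",          # same value as serial.PARITY_NONE
        "stopbits": 1,          # same value as serial.STOPBITS_ONE
        "timeout": timeout,     # read timeout in seconds
    }


def open_uart(device, **overrides):
    """Open the port with PySerial (imported here so the helper above has no dependency)."""
    import serial
    return serial.Serial(**serial_settings(device, **overrides))


if __name__ == "__main__":
    # "/dev/ttyUSB0" is a hypothetical path for the adapter wired to the debug UART.
    with open_uart("/dev/ttyUSB0") as port:
        print(port.readline())
```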
Because this repo has dependencies on other submodules, in order to fetch the repo use the following command:
git clone --recursive -j8 git@bitbucket.org:dimtass/stm32f746-tflite-micro-mnist.git
# or for http
git clone --recursive -j8 https://dimtass@bitbucket.org/dimtass/stm32f746-tflite-micro-mnist.git
To flash the firmware in Linux you need the texane/stlink tool. Then you can use the flash script like this:
./flash.sh
Otherwise you can build the firmware and then use any programmer you like.
The elf, hex and bin firmwares are located in the build-stm32 folder:
./build-stm32/*/stm32f7-mnist-tflite.bin
./build-stm32/*/stm32f7-mnist-tflite.hex
./build-stm32/*/stm32f7-mnist-tflite.elf
To flash the HEX file in Windows, use the ST-LINK Utility like this:
"C:\Program Files (x86)\STMicroelectronics\STM32 ST-LINK Utility\ST-LINK Utility\ST-LINK_CLI.exe" -c SWD -p build-stm32\src_\stm32f7-mnist-tflite.hex -Rst
To flash the bin in Linux:
st-flash --reset write build-stm32/src/stm32f7-mnist-tflite.bin 0x8000000
You might need to use Google's flatbuffers in case you want to experiment with the serial commands from the python notebook to the stm32f7. These are the commands if you want to build flatbuffers from source and install them (I've used Ubuntu 18.04).
git clone https://github.com/google/flatbuffers.git
cd flatbuffers
cmake -G "Unix Makefiles" .
make -j8
sudo make install
The schema file is located in source/schema. To build it, run:
source/schema/create-header.sh
The Python serial port client is in the jupyter_notebook/STM32F7Comm folder.
In order to build the schema for Python, run:
flatc --python -o jupyter_notebook/ ./source/schema/schema.fbs
- CMSIS version: 5.0.4
- CMSIS-NN version: V.1.0.0
- CMSIS-DSP version: V1.7.0
- HAL Driver Library version: 1.2.6
The license is MIT and you can use the code however you like.
Dimitris Tassopoulos dimtass@gmail.com