Training a Custom YOLO v4 Darknet Model on Azure and Running with Azure Live Video Analytics on IoT Edge
- Train a custom YOLO v4 model
- TensorFlow Lite conversion for fast inferencing
- Azure Live Video Analytics on IoT Edge
- Links/References
- SSH client or command line tool - for Windows try putty.exe
- SCP client or command line tool - for Windows try pscp.exe
- Azure Subscription - a Free Trial is available for new customers.
- Familiarity with Unix commands - e.g. `vim`, `nano`, `wget`, `curl`, etc.
- Visual Object Tagging Tool - VoTT
- Set up an N-series Virtual Machine by using the michhar/darknet-azure-vm-ubuntu-18.04 project VM setup.
- SSH into the Ubuntu DSVM with your username and password (or, if you set up an SSH key, use that)
- If this is a corporate subscription, you may need to delete an inbound port rule under "Networking" in the Azure Portal (delete Cleanuptool-Deny-103)
- Test the Darknet executable by running the following.
  - Get the YOLO v4 tiny weights:

    ```
    wget https://github.com/AlexeyAB/darknet/releases/download/darknet_yolo_v4_pre/yolov4-tiny.weights
    ```

  - Run a test on a static image from the repository. Run the following command and then give the path to a test image (look in the `data` folder for sample images, e.g. `data/giraffe.jpg`). The `coco.data` file gives the links to the other necessary files. The `yolov4-tiny.cfg` specifies the architecture and settings for tiny YOLO v4.

    ```
    ./darknet detector test ./cfg/coco.data ./cfg/yolov4-tiny.cfg ./yolov4-tiny.weights
    ```

  - Check `predictions.jpg` for the bounding boxes overlaid on the image. You may secure copy (SCP) this file down to your machine to view it, or alternatively remote desktop into the machine with a program like X2Go.
- Label some test data locally (aim for about 500-1000 bounding boxes drawn, noting that fewer boxes will result in less accurate results for those classes)
  - Label data with VoTT and export as `json`.
  - Convert the `json` files to YOLO `.txt` files by running the following script (`vott2.0_to_yolo.py`). In this script, one change must be made: update line 13 (`LABELS = {'helmet': 0, 'no_helmet': 1}`) to reflect your classes. Running this script should result in one `.txt` file per `.json` VoTT annotation file. The `.txt` files are the YOLO format that `darknet` can use. Run this conversion script as follows, for example.

    ```
    python vott2.0_to_yolo.py --annot-folder path_to_folder_with_json_files --out-folder new_folder_for_txt_annotations
    ```

  - Darknet will need a specific folder structure. Structure the data folder as follows, where the `data/img` folder holds each image along with its `.txt` annotation file.

    ```
    data/
      img/
        image1.jpg
        image1.txt
        image2.jpg
        image2.txt
        ...
      train.txt
      valid.txt
      obj.data
      obj.names
    ```

    `obj.data` is a general file to direct `darknet` to the other data-related files and the model folder. It looks similar to the following, with the necessary change to `classes` for your scenario.

    ```
    classes = 2
    train = build/darknet/x64/data/train.txt
    valid = build/darknet/x64/data/valid.txt
    names = build/darknet/x64/data/obj.names
    backup = backup/
    ```

    `obj.names` contains the class names, one per line. `train.txt` and `valid.txt` should look as follows, for example. Note, `train.txt` lists the training images and is a different subset from the smaller list found in `valid.txt`. As a general rule, 5-10% of the image paths should be placed in `valid.txt`, and these should be randomly distributed (a small helper sketch for generating this split follows this list).

    ```
    build/darknet/x64/data/img/image1.jpg
    build/darknet/x64/data/img/image5.jpg
    ...
    ```

  - These instructions may also be found in How to train on your own data.
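A small script can generate the `train.txt`/`valid.txt` split mentioned above. The following is a minimal sketch (not part of the Darknet repo); it assumes your images sit in `data/img` and uses the `build/darknet/x64/data/img` prefix shown above; adjust the paths and split fraction for your setup.

```python
# make_split.py - hypothetical helper (not part of the Darknet repo).
# Randomly assigns roughly 10% of the images to valid.txt and the rest to train.txt.
import random
from pathlib import Path

IMG_DIR = Path("data/img")                    # folder holding the .jpg/.txt pairs
PREFIX = "build/darknet/x64/data/img"         # path prefix darknet expects in the list files
VALID_FRACTION = 0.1                          # 5-10% recommended above

images = sorted(p.name for p in IMG_DIR.glob("*.jpg"))
random.seed(42)                               # fixed seed for a reproducible split
random.shuffle(images)

n_valid = max(1, int(len(images) * VALID_FRACTION))
valid, train = images[:n_valid], images[n_valid:]

Path("data/valid.txt").write_text("".join(f"{PREFIX}/{name}\n" for name in valid))
Path("data/train.txt").write_text("".join(f"{PREFIX}/{name}\n" for name in train))
print(f"Wrote {len(train)} training and {len(valid)} validation image paths.")
```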
- Upload data to the DSVM as follows.
  - Zip the `data` folder (`zip -r data.zip data` if using the command line) and copy it up to the VM with `scp data.zip <username>@<public IP or DNS name>:~/darknet/build/darknet/x64/` (use `pscp.exe` on Windows). You may need to delete the networking rule Cleanuptool-Deny-103 again if this gives a timeout error. Note that `data.zip` is placed in the `darknet/build/darknet/x64` folder. This is where `darknet` will look for the data.
  - Log in to the DSVM with SSH.
  - On the DSVM, unzip the compressed `data.zip` found, now, in the folder `darknet/build/darknet/x64`.
- Read through How to train on your own data from the Darknet repo, mainly on updating the `.cfg` file. We will be using the tiny architecture of YOLO v4, so we will calculate anchors and update the config accordingly (the `cfg/yolov4-tiny-custom.cfg`). The following summarizes the changes for reference, but please refer to the Darknet repo for more information/clarification.
  - Calculate anchor boxes (especially important if you have very big or very small objects on average). We use `-num_of_clusters 6` because of the tiny architecture configuration. IMPORTANT: make note of these anchors (darknet creates a file for you called `anchors.txt`); you will need them later in the section on converting the model to TFLite.

    ```
    ./darknet detector calc_anchors build/darknet/x64/data/obj.data -num_of_clusters 6 -width 416 -height 416
    ```

  - Configure the cfg file (you will see a file called `cfg/yolov4-tiny-custom.cfg`). Open the file with an editor like `vim` or `nano`. Modify the following to your scenario. For example, this header (`net` block):

    ```
    [net]
    # Testing
    #batch=1
    #subdivisions=1
    # Training
    batch=16
    subdivisions=2
    ...
    learning_rate=0.00261
    burn_in=1000
    max_batches = 4000
    policy=steps
    steps=3200,3600
    ...
    ```

  - Info for the `yolo` blocks (in each YOLO block or just before - there are two blocks in the tiny architecture):
    - Class number - change to your number of classes (each YOLO block)
    - Filters - (5 + num_classes)*3 (neural net layer before each YOLO block); see the quick arithmetic check after this list
    - Anchors - these are also known as anchor boxes (each YOLO block) - use the calculated anchors from the previous step.
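As a quick arithmetic check of the `filters` formula above, here is the value for the two-class (helmet/no_helmet) example used earlier; only the class count changes per project.

```python
# Sanity check for the "filters" value in the convolutional layer before each [yolo] block.
num_classes = 2                  # e.g., helmet and no_helmet
filters = (5 + num_classes) * 3  # 5 = 4 box coordinates + 1 objectness score; 3 anchors per YOLO block
print(filters)                   # 21 -> set filters=21 in yolov4-tiny-custom.cfg
```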
- Train the model with the following two commands.
  - This will download the base model weights:

    ```
    wget https://github.com/AlexeyAB/darknet/releases/download/darknet_yolo_v4_pre/yolov4-tiny.conv.29
    ```

  - This will run the training experiment (where `-clear` means it will start training from the base model just downloaded rather than from already-present weights in the `backup` folder; the `backup` folder is where the weights will show up after training).

    ```
    ./darknet detector train build/darknet/x64/data/obj.data cfg/yolov4-tiny-custom.cfg yolov4-tiny.conv.29 -map -dont_show -clear
    ```
- If using the michhar/darknet-azure-vm-ubuntu-18.04 GitHub VM setup as instructed above, the hunglc007/tensorflow-yolov4-tflite project will have already been cloned and the correct Python environment set up with TensorFlow 2.
- You can use an editor like VSCode for the following; any other text editor will also work.
  - Change `coco.names` to `obj.names` in `core/config.py`.
  - Update the anchors on line 17 of `core/config.py` to match the anchor sizes used to train the model (a small helper for this follows the list), e.g.:

    ```
    __C.YOLO.ANCHORS_TINY = [81, 27, 28, 80, 58, 51, 76, 100, 109, 83, 95, 246]
    ```

  - Place the `obj.names` file from your Darknet project in the `data/classes` folder.
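If you would rather not retype the anchors, a small helper can turn darknet's `anchors.txt` into the Python list format that `core/config.py` expects. This is a hypothetical sketch that assumes `anchors.txt` holds only the comma-separated numbers written by the `calc_anchors` step earlier.

```python
# anchors_to_config.py - hypothetical helper, assuming anchors.txt contains
# the comma-separated values written by ./darknet detector calc_anchors.
import re

with open("anchors.txt") as f:
    anchors = [int(round(float(v))) for v in re.findall(r"\d+\.?\d*", f.read())]

# Paste the printed line over the ANCHORS_TINY entry in core/config.py.
print(f"__C.YOLO.ANCHORS_TINY = {anchors}")
# e.g. __C.YOLO.ANCHORS_TINY = [81, 27, 28, 80, 58, 51, 76, 100, 109, 83, 95, 246]
```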
- Change
- Convert from Darknet to TensorFlow Lite (with quantization) with the two steps as follows (a quick check of the converted model is sketched after these steps). Use the weights from your Darknet experiment (as found in the `~/darknet/backup/` folder).
  - In the `tensorflow-yolov4-tflite` folder, activate the Python environment with `source env/bin/activate`.
  - Save the model to the TensorFlow protobuf intermediate format.

    ```
    python save_model.py --weights yolov4-tiny-custom_best.weights --output ./checkpoints/yolov4-tiny-416-tflite2 --input_size 416 --model yolov4 --framework tflite --tiny
    ```

  - Convert the protobuf model weights to TFLite format with quantization.

    ```
    python convert_tflite.py --weights ./checkpoints/yolov4-tiny-416-tflite2 --output ./checkpoints/yolov4-tiny-416-fp16.tflite --quantize_mode float16
    ```
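Before moving on, you can sanity-check the converted model by loading it with the TensorFlow Lite interpreter and printing its input/output details. A minimal sketch, assuming you run it inside the same `tensorflow-yolov4-tflite` Python environment and used the output path above:

```python
# Quick check that the converted TFLite model loads and reports sensible tensor shapes.
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="./checkpoints/yolov4-tiny-416-fp16.tflite")
interpreter.allocate_tensors()

for detail in interpreter.get_input_details():
    print("input :", detail["shape"], detail["dtype"])   # expect something like [1 416 416 3] float32
for detail in interpreter.get_output_details():
    print("output:", detail["shape"], detail["dtype"])
```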
- [Optional] Run the video test on remote desktop (recommended to use the X2Go client for Windows or Mac with your VM user and IP address) to check that everything is ok. Once X2Go has connected and you have a remote desktop instance running, open a terminal window (the "terminal emulator" program).
  - Navigate to the project folder.

    ```
    cd tensorflow-yolov4-tflite
    ```

  - Run the video demo.

    ```
    python detectvideo.py --framework tflite --weights ./checkpoints/yolov4-tiny-416-fp16.tflite --size 416 --tiny --model yolov4 --video <name of your video file> --output <new name for output result video> --score 0.4
    ```

  - You can then navigate to the output video file and play it with VLC in the remote desktop environment, or download the video to play locally.
- If you wish to start from this point (you do not have a trained model), please refer to the releases (v0.1) for the `.tflite` model, the `obj.names` file, the anchors (in the notes), and a sample video (`.mkv` file) to create your RTSP server for simulation: https://github.com/michhar/yolov4-darknet-notes/releases/tag/v0.1.
On your development machine you will need the following.
- `git` command line tool or a client such as GitHub Desktop
- SCP client or command line tool - for Windows try pscp.exe
- A sample video in `.mkv` format (only some audio formats are supported, so you may see an error regarding audio format - you may wish to strip audio in this case for the simulator)
- Your `.tflite` model, anchors and `obj.names` files
- Docker - such as Docker Desktop
- VSCode and the Azure IoT Tools extension (search "Azure IoT Tools" in extensions within VSCode)
- .NET Core 3.1 SDK - download
- Azure CLI - download and install
- `curl` command line tool - download curl
On Azure:
- Have gone through this Live Video Analytics quickstart and the Live Video Analytics cloud to device sample console app to set up the necessary Azure resources and learn how to use VSCode to see the results with the .NET app.
- OR have the following Azure resources provisioned:
- Create a custom RTSP simulator with your video for inferencing with LVA, using the live555 media server (a quick stream check is sketched after these steps)
  - Clone the official Live Video Analytics GitHub repo: `git clone https://github.com/Azure/live-video-analytics.git`
  - Open the repository folder in VSCode to make it easier to modify files.
  - Go to the RTSP simulator instructions: `cd utilities/rtspsim-live555/`
  - Replace line 21 with your `.mkv` file (you can use the ffmpeg command line tool to convert from other formats like `.mp4` to `.mkv`).
  - Copy your `.mkv` video file to the same folder as the Dockerfile.
  - Build the docker image according to the Readme.
  - Push the docker image to your ACR according to the Readme.
    - Login to ACR: `az acr login --name myregistry`
    - Use docker to push: `docker push myregistry.azurecr.io/my-rtsp-sim:latest`
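Optionally, you can smoke-test the simulator before deploying it. The sketch below is illustrative and rests on assumptions: that you started the image locally with the container's RTSP port 554 mapped to host port 5001 (mirroring the `createOptions` used later in this document) and that the video is served under `/media/`; adjust the URL to your file name and mapping.

```python
# Hypothetical smoke test for the RTSP simulator; requires opencv-python with FFmpeg support.
import cv2

URL = "rtsp://localhost:5001/media/my_video.mkv"   # assumed host port mapping 5001 -> 554
cap = cv2.VideoCapture(URL)
ok, frame = cap.read()                             # try to pull a single frame

print("stream opened:", cap.isOpened())
print("frame received:", ok, "| shape:", frame.shape if ok else None)
cap.release()
```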
- To prepare the ML model wrapper code, from the base of the live-video-analytics folder:
  - Go to the Docker container building instructions: `cd utilities/video-analysis/yolov4-tflite-tiny`
  - Copy your `.tflite` model into the `app` folder.
  - Perform the following changes to the files for your custom scenario:
    - In `app/core/config.py`:
      - Update the `__C.YOLO.ANCHORS_TINY` line to be the same as used to train with Darknet
      - Update `__C.YOLO.CLASSES` to be `./data/classes/obj.names`
    - In the `app/data/classes` folder:
      - Add your file called `obj.names` (with your class names, one per line)
    - In `app/yolov4-tf-tiny-app.py`:
      - Update line 31 to use the name of your model
      - Update line 45 to be `obj.names` instead of `coco.names`
    - In the `Dockerfile`:
      - We do not need to pull down the yolov4 base tflite model, so delete line 19
  - Follow the instructions here to build, test, and push the docker image to ACR. A quick local check of the scoring endpoint is sketched just below.
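If you run the built container locally while testing it, a short script can spot-check the scoring endpoint that LVA will later call as `http://yolov4/score`. This is a sketch under assumptions: the local port mapping (here 8080) and the exact request/response shape should be taken from the sample's own README, and the test image name is a placeholder.

```python
# Hypothetical local test of the /score endpoint exposed by the yolov4-tflite-tiny container.
import json
import requests

with open("test.jpg", "rb") as f:             # any local test image (placeholder name)
    image_bytes = f.read()

resp = requests.post(
    "http://localhost:8080/score",            # assumed host port mapped to the container
    data=image_bytes,
    headers={"Content-Type": "image/jpeg"},
)
print(resp.status_code)
print(json.dumps(resp.json(), indent=2))      # expect JSON describing the detections
```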
- To run the sample app and view your inference results:
  - Clone the official Live Video Analytics CSharp sample app: `git clone https://github.com/Azure-Samples/live-video-analytics-iot-edge-csharp.git`
  - In the `src/edge` folder, update `yolov3.template.json` as follows.
    - Rename it to `yolov4.template.json`.
    - Update (or ensure this is the case) the `runtime` section at the beginning of the file so it looks like:

      ```
      "runtime": {
        "type": "docker",
        "settings": {
          "minDockerVersion": "v1.25",
          "loggingOptions": "",
          "registryCredentials": {
            "$CONTAINER_REGISTRY_USERNAME_myacr": {
              "username": "$CONTAINER_REGISTRY_USERNAME_myacr",
              "password": "$CONTAINER_REGISTRY_PASSWORD_myacr",
              "address": "$CONTAINER_REGISTRY_USERNAME_myacr.azurecr.io"
            }
          }
        }
      }
      ```

      This section will ensure the deployment can find your custom `rtspsim` and `yolov4` images in your ACR.
    - Change the `yolov3` name to `yolov4` as in the following modules section (the image location is an example), pointing the yolov4 module to the correct image location in your ACR.

      ```
      "yolov4": {
        "version": "1.0",
        "type": "docker",
        "status": "running",
        "restartPolicy": "always",
        "settings": {
          "image": "myacr.azurecr.io/my-awesome-custom-yolov4:latest",
          "createOptions": {}
        }
      }
      ```

    - For the `rtspsim` module, ensure the image points to your image in ACR (the image location is an example) and ensure the `createOptions` look as follows:

      ```
      "rtspsim": {
        "version": "1.0",
        "type": "docker",
        "status": "running",
        "restartPolicy": "always",
        "settings": {
          "image": "myacr.azurecr.io/my-rtsp-sim:latest",
          "createOptions": {
            "PortBindings": {
              "554/tcp": [
                {
                  "HostPort": "5001"
                }
              ]
            }
          }
        }
      }
      ```

    - Also, in the `rtspsim` module `createOptions`, make sure to delete the folder bindings, so delete any section like:

      ```
      "HostConfig": {
        "Binds": [
          "$INPUT_VIDEO_FOLDER_ON_DEVICE:/live/mediaServer/media"
        ]
      }
      ```

      This will ensure that LVA looks in the `rtspsim` module for the video rather than on the IoT Edge device.
  - Make the appropriate changes to the `.env` file (this should be located in the `src/edge` folder):
    - Update the `CONTAINER_REGISTRY_USERNAME_myacr` and `CONTAINER_REGISTRY_PASSWORD_myacr` values.
    - Recall that the `.env` file (you can modify it in VSCode) should have the following format (fill in the missing parts for your Azure resources):

      ```
      SUBSCRIPTION_ID=
      RESOURCE_GROUP=
      AMS_ACCOUNT=
      IOTHUB_CONNECTION_STRING=
      AAD_TENANT_ID=
      AAD_SERVICE_PRINCIPAL_ID=
      AAD_SERVICE_PRINCIPAL_SECRET=
      INPUT_VIDEO_FOLDER_ON_DEVICE="/live/mediaServer/media"
      OUTPUT_VIDEO_FOLDER_ON_DEVICE="/var/media"
      APPDATA_FOLDER_ON_DEVICE="/var/lib/azuremediaservices"
      CONTAINER_REGISTRY_USERNAME_myacr=
      CONTAINER_REGISTRY_PASSWORD_myacr=
      ```

    - When you create the manifest template file in VSCode, it will use these values to create the actual deployment manifest file.
  - In the `src/cloud-to-device-console-app` folder, make the appropriate changes to `operations.json`.
    - In the `"opName": "GraphTopologySet"` operation, update the `topologyUrl` to be the http extension topology as follows.

      ```
      {
        "opName": "GraphTopologySet",
        "opParams": {
          "topologyUrl": "https://raw.githubusercontent.com/Azure/live-video-analytics/master/MediaGraph/topologies/httpExtension/topology.json"
        }
      }
      ```

    - In the `"opName": "GraphInstanceSet"` operation, update the `rtspUrl` value to have your video file name (here `my_video.mkv`) and the `inferencingUrl` with `"value": "http://yolov4/score"`, as in:

      ```
      {
        "opName": "GraphInstanceSet",
        "opParams": {
          "name": "Sample-Graph-1",
          "properties": {
            "topologyName": "InferencingWithHttpExtension",
            "description": "Sample graph description",
            "parameters": [
              { "name": "rtspUrl", "value": "rtsp://rtspsim:554/media/my_video.mkv" },
              { "name": "rtspUserName", "value": "testuser" },
              { "name": "rtspPassword", "value": "testpassword" },
              { "name": "imageEncoding", "value": "jpeg" },
              { "name": "inferencingUrl", "value": "http://yolov4/score" }
            ]
          }
        }
      },
      ```
- Make the appropriate changes to the
appsettings.json, a file that you may need to create if you haven't done the quickstarts. It should look as follows and be located in thesrc/cloud-to-device-console-appfolder.{ "IoThubConnectionString" : "connection_string_of_iothub", "deviceId" : "name_of_your_edge_device_in_iot_hub", "moduleId" : "lvaEdge" }- The IoT Hub connection string may be found in the Azure Portal under your IoT Hub -> Settings -> Shared access policies blade -> iothubowner Policy -> Connection string—primary key
  - Build the app with `dotnet build` from the `src/cloud-to-device-console-app` folder.
  - Run the app with `dotnet run`.
- Darknet Azure DSVM
- Visual Object Tagging Tool (VoTT)
- Darknet on GitHub
- Python virtual environments
- Conversion of Darknet model to TFLite on GitHub
- Create a movie simulator docker container with a test video for LVA
- TensorFlow Lite Darknet Python AI container sample for LVA
- Run LVA sample app locally