The task was to autonomously detect a square window and fly through its center in a given environment.
The catch: the detection model had to be trained entirely on simulated data.
Check Report.pdf for an in-depth explanation.
- Data Generation
- Data Augmentation and Training a DL network
- Deployment and Testing
- Window Pose Estimation
1. Data Generation
- For this task, I created a model of the window in Blender. By randomly spawning the window in different poses, I could capture images to build my training dataset.
- Since the model was created in Blender and I had the 3D coordinates of the window's four corners, I could easily project them into each image and use them as my ground-truth labels and masks.
- Since the window has a square-shaped hole through which the background is visible, I used the "Domain Randomization" technique, compositing a different background into each image to create a diverse dataset. For this, I used the "Flying Chairs" dataset and created 3000 images for the training set (a minimal compositing sketch follows this list).
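The compositing step can be sketched roughly as below. This is an illustration, not the exact pipeline from the report: the paths, file patterns, and the assumption that the Blender renders are RGBA images with a transparent background and hole are mine.

```python
import random
from pathlib import Path
from PIL import Image

# Hypothetical locations; the real pipeline's paths and naming differ.
RENDERS = list(Path("renders").glob("*.png"))            # RGBA Blender renders
BACKGROUNDS = list(Path("flying_chairs").glob("*.ppm"))  # Flying Chairs frames
OUT = Path("dataset/images")
OUT.mkdir(parents=True, exist_ok=True)

for i, render_path in enumerate(RENDERS):
    window = Image.open(render_path).convert("RGBA")
    # Domain randomization: a different random background per image.
    bg = Image.open(random.choice(BACKGROUNDS)).convert("RGBA").resize(window.size)
    # The render's alpha channel masks the window onto the background,
    # so the square hole and the surroundings show the random scene.
    composite = Image.alpha_composite(bg, window)
    composite.convert("RGB").save(OUT / f"{i:05d}.jpg")
```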
2. Data Augmentation and Training a DL Network
- For data augmentation, I used Roboflow to add noise, rotations, and random crops, expanding the training dataset from 3000 to 24000 images.
- Using PyTorch and the YOLOv8 instance segmentation model, I trained the network to segment windows in the real world (a training sketch follows below).
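Training with Ultralytics' YOLOv8 is a short script. This is a generic sketch, not the exact configuration used; the dataset YAML path, epoch count, and other hyperparameters here are placeholders.

```python
from ultralytics import YOLO

# Start from the pretrained nano segmentation checkpoint.
model = YOLO("yolov8n-seg.pt")

# "window_seg.yaml" is a placeholder dataset config pointing at the
# augmented images and segmentation labels exported from Roboflow.
model.train(data="window_seg.yaml", epochs=100, imgsz=640, batch=16)

# Quick sanity check on a held-out frame.
results = model.predict("test_frame.jpg")
results[0].show()
```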
3. Deployment and Testing
- The drone was a DJI Tello EDU, with an NVIDIA Jetson Orin Nano as the companion computer. Even though I was using the YOLOv8-nano model, it was still too heavy because multiple networks were running in parallel, so I converted the segmentation model with the TensorRT SDK to optimize inference for real-time use.
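Ultralytics can export a model directly to a TensorRT engine, which is one way to do this conversion (the exact export route used in the project may differ; the half-precision flag and paths below are assumptions). The export must be run on the Jetson itself, since TensorRT engines are built for the target GPU.

```python
from ultralytics import YOLO

model = YOLO("runs/segment/train/weights/best.pt")  # trained checkpoint (path is illustrative)

# Export to a TensorRT engine; half=True requests FP16, which typically
# gives a large speedup on the Orin Nano's GPU.
model.export(format="engine", half=True)

# The resulting .engine file loads back through the same API:
trt_model = YOLO("runs/segment/train/weights/best.engine")
results = trt_model.predict("frame.jpg")
```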
4. Window Pose Estimation
- After testing the model on frames captured from the drone's camera, I got accurate segmentation masks and the four corners of the windows in the environment. The next step was to use OpenCV's solvePnP function to estimate the window's pose in the real world, so the drone could be commanded to fly through its center.
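A minimal solvePnP sketch, assuming a square window of known side length and a calibrated camera; the window size, intrinsics, and example corner coordinates below are placeholders, not values from the project.

```python
import cv2
import numpy as np

WINDOW_SIDE = 0.8  # metres; placeholder for the real window's side length

# 3D corners of the window in its own frame (centred at the origin),
# ordered top-left, top-right, bottom-right, bottom-left as required
# by the SOLVEPNP_IPPE_SQUARE solver.
object_points = np.array([
    [-WINDOW_SIDE / 2,  WINDOW_SIDE / 2, 0],
    [ WINDOW_SIDE / 2,  WINDOW_SIDE / 2, 0],
    [ WINDOW_SIDE / 2, -WINDOW_SIDE / 2, 0],
    [-WINDOW_SIDE / 2, -WINDOW_SIDE / 2, 0],
], dtype=np.float64)

# 2D corners extracted from the segmentation mask, same order (example values).
image_points = np.array([
    [412.0, 188.0],
    [688.0, 195.0],
    [681.0, 463.0],
    [405.0, 455.0],
], dtype=np.float64)

# Camera intrinsics from calibration (placeholder values).
K = np.array([[921.0,   0.0, 480.0],
              [  0.0, 921.0, 360.0],
              [  0.0,   0.0,   1.0]])
dist = np.zeros(5)  # assume negligible distortion after calibration

ok, rvec, tvec = cv2.solvePnP(object_points, image_points, K, dist,
                              flags=cv2.SOLVEPNP_IPPE_SQUARE)
if ok:
    # tvec is the window centre in the camera frame: steer so that the
    # x and y components go to zero, then fly forward along z.
    print("window centre in camera frame (m):", tvec.ravel())
```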