Skip to content

ChoiSeongHo-h/Aligner

Repository files navigation

Aligner

Overview

Circuit Aligner with CUDA

  • Rotation-invariant template matching considering GPU architecture
  • Providing a web frontend with Django
  • Available at 1/10 the price of traditional products

1 Rotation-invariant template matching considering GPU architecture

Accelerating rotation-invariant template matching with CUDA parallel algorithms while considering GPU architecture

1.1 Motivation

image

  • Red crosshairs: Characteristic parts of aligned circuits saved in the aligner
  • Blue crosshairs: Characteristic parts of the misaligned circuit that entered the aligner

Rotation-invariant template matching is necessary to compute the amount of rotation and translation of a circuit.

The user wants to align the misaligned circuit by rotating and translating it in two dimensions. The aligned circuit already has the characteristic patches of the aligned circuit stored. The aligner finds the same patches in the misaligned circuit and calculates the amount of rotation and translation of the circuit based on the degree of misalignment between the patches.

1.2 Template matching overview

image

Template matching performs the following coarse to fine search:

  1. Coarse search: Fast search based on pixel statistics, taking advantage of the GPU architecture.
  2. Fine search: Precise search using circular sum

1.3 Step 1: Coarse Search

Fast search based on pixel statistics, taking advantage of the GPU architecture. A coarse search is performed as follows:

  1. Find the 1st-4th order moments of a reference patch.
  2. Construct an ROI around a reference patch in a misaligned circuit image.
  3. Traverse all pixels in the ROI to calculate the 1st-4th order moments. The region where the moments are calculated is centred on the pixel being traversed and has the same size as the reference patch.
  4. Compare the reference patch moment vector to the moment vector of all patches in the ROI.
  5. Extract candidate regions using an adaptive threshold.

1.3.1 1st to 4th Order Moments

image

The 1st to 4th order moments represent the mean, variance, skewness, and kurtosis of the pixels in the patch. These features are invariant to rotation or translation of the patch. For example, a patch with a mean of 3 will still have a mean of 3 if you move it slightly to the right.

1.3.2 Grid, Block, and Thread Structures and Shared Memory

image

The unit for performing kernels (functions) on the GPU is a grid. A grid is made up of blocks. Blocks are composed of threads. Threads within the same block can access shared memory and share data. Shared memory is very fast, so it should be actively used to improve performance.

image

In Step 1, the aligner's grid consists of (columns/16 x rows/16) blocks. Each block consists of (16 x 16) threads. Each thread is responsible for one patch to compute moments.

Within each block, threads share a pixel area of (16+margin x 16+margin). Due to the limitations of the Jetson Nano, it is not possible to allocate a large amount of shared memory.

For speed, all GPU memory, including shared memory, is pre-allocated and the image is uploaded.

1.4 Step 2: Fine Search

image

The aligner accumulates pixel values along a circular path. Each radius of the circle has a cumulative value. With n radiuses, an n-dimensional vector of accumulated values can be constructed. This vector encodes the rotation-invariant features of the image.

  1. Find the circular sum of the reference patches.
  2. Perform a circular sum in parallel similar to Step1 on the candidates that passed the adaptive threshold in Step1. Each thread is responsible for two circular paths.
  3. Apply NCC between the reference and candidate vectors to pick the pixel with the maximum value.

1.4.1 Algorithms for accelerating parallel computing

As mentioned in Step 2, each thread is responsible for two circular paths. One thread is responsible for the k th circle and the n-k th circle to reduce bottlenecks between threads.

Additionally, a reduction algorithm is used to extract maxima and sums to accelerate the process.

2 Providing a web frontend with Django

image

Django provides a web interface. Users can access the aligner from any device, not just a PC. In addition to PC, the responsive web is also conveniently accessible on mobile.

The sorter backend and Django frontend send and receive data through shared memory. For image sharing, a reference-only object is used to minimise latency.

3 Available at 1/10 the price of traditional products

At the time the aligner was developed, previous circuit alignment PCs cost around $1,000. However, this aligner, based on Jetson Nano, costs around $100. Despite the low price, there is no reduction in performance (search time < 50ms).

4 File Description

  • AlignerLauncher: main()
  • Aligner: sets up and tears down shared memory, sets up and tears down cameras, processes images, and communicates with the web.
  • PatternMatching: Manages GPU memory, runs the CUDA kernel, and performs image processing using OpenCV.
  • CudaSupporter: Image processing, finding the moment vector and circular sum vector for each pixel using CUDA.
  • Grabber: Pylon camera classes, setting up and destroying cameras and grabbers
  • MemorySharing: Sharing memory with the web.
  • Webcam: Enables the webcam if there is no Pylon camera.
  • AlignerConsts: Constants
  • kbhit: checks for key presses

5 Presentation slides

슬라이드1 슬라이드2 슬라이드3 슬라이드4 슬라이드5 슬라이드6 슬라이드7 슬라이드8 슬라이드9 슬라이드10 슬라이드11 슬라이드12 슬라이드13 슬라이드14 슬라이드15 슬라이드16 슬라이드17 슬라이드18 슬라이드19 슬라이드20 슬라이드21 슬라이드22 슬라이드23 슬라이드24 슬라이드25 슬라이드26 슬라이드27 슬라이드28 슬라이드29

About

circuit alignment software with CUDA

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published