This is a project for the CS4186 Computer Vision and Image Processing course at City University of Hong Kong.
The project is a simple instance search system that uses multiple image retrieval methods (including SIFT, LBP, CNN features, and ensemble methods) to search for objects in a database of images.
```bash
pip install -r requirements.txt
```
If you are using a GPU, make sure to install the correct version of PyTorch. You can find the installation instructions [here](https://pytorch.org/get-started/locally/), or check the [previous versions](https://pytorch.org/get-started/previous-versions/).
If you are on a server without a display, you may encounter the following error when trying to import OpenCV:
File "/usr/local/python/3.12.1/lib/python3.12/site-packages/cv2/__init__.py", line 153, in bootstrap
native_module = importlib.import_module("cv2")
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/python/3.12.1/lib/python3.12/importlib/__init__.py", line 90, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ImportError: libGL.so.1: cannot open shared object file: No such file or directory
You can fix it by installing the following package:
```bash
pip install opencv-python-headless
```
```
.
├── dataset
│   ├── gallery_4186 (.jpg * 5000)
│   ├── query_img_4186 (.jpg * 20)
│   └── query_img_box_4186 (.txt * 20)
├── output
│   ├── brisk/rankList.txt
│   ├── color_histogram/rankList.txt
│   ├── ...
│   └── demo
│       ├── brisk (.png * 20)
│       ├── color_histogram (.png * 20)
│       └── ...
├── brisk.py
├── color_histogram.py
├── ...
├── main.py
├── utils.py
├── README.md
└── requirements.txt
```
`dataset/`
: Contains the dataset used for the project. The gallery set contains 5000 images and the query set contains 20 images.

`query_img_box_4186/`
: Contains the bounding boxes of the query images. Each `.txt` file corresponds to the image with the same name in `query_img_4186/` and stores the bounding-box coordinates in the format `x1 y1 w h` (a loading sketch follows this list).

`output/`
: Contains the output of the project. Each method has its own folder.

`rankList.txt`
: For each query image, all 5000 gallery images are ranked by similarity in descending order.

`demo/`
: Each method has a subfolder. For each query image, the query image and its top 10 matches are aggregated into one image for visualization.

`utils.py`
: Contains file I/O and image-processing functions. Check this file if you wish to change the dataset directory.
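For illustration, a query region can be cropped from its bounding box like this (the file names below are hypothetical; the actual paths are configured in `utils.py`):

```python
import cv2

# Hypothetical example files; real paths are configured in utils.py.
query = cv2.imread("dataset/query_img_4186/01.jpg")
with open("dataset/query_img_box_4186/01.txt") as f:
    x1, y1, w, h = (int(float(v)) for v in f.read().split())

# Crop the instance region given in x1 y1 w h format.
query_roi = query[y1:y1 + h, x1:x1 + w]
```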
- Input: RGB or HSV image
- Output: $\text{bins} \times \text{bins} \times \text{bins}$ histogram
Color histogram computes the distribution of colors in an image. Each pixel is assigned to a bin based on its RGB values.
To compare two histograms, cosine similarity is used:

$$\text{sim}(h_1, h_2) = \frac{h_1 \cdot h_2}{\lVert h_1 \rVert \, \lVert h_2 \rVert},$$

where $h_1$ and $h_2$ are the flattened histogram vectors of the two images.
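As a minimal sketch (the bin count, color space, and function names here are assumptions, not the project's exact implementation):

```python
import cv2
import numpy as np

def color_histogram(image_bgr, bins=8):
    # bins x bins x bins histogram over the three color channels
    hist = cv2.calcHist([image_bgr], [0, 1, 2], None,
                        [bins, bins, bins], [0, 256, 0, 256, 0, 256])
    hist = hist.flatten()
    return hist / (np.linalg.norm(hist) + 1e-8)  # L2-normalize

def cosine_similarity(h1, h2):
    # Histograms are already unit-normalized, so the dot product suffices.
    return float(np.dot(h1, h2))
```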
- Input: Grayscale image
- Output: 256-dimensional LBP histogram
Local Binary Pattern (LBP) is a texture descriptor that encodes the local structure of an image. It compares each pixel with its neighbors and assigns a binary value based on whether the neighbor is greater than the center pixel.
$$\text{LBP}_{P,R} = \sum_{p=0}^{P-1} s(g_p - g_c)\, 2^p, \qquad s(x) = \begin{cases} 1 & \text{if } x \ge 0 \\ 0 & \text{otherwise,} \end{cases}$$

where $g_c$ is the gray value of the center pixel and $g_p$ ($p = 0, \dots, P-1$) are the gray values of its neighbors sampled on a circle of radius $R$.

In practice, a radius of 8 with 24 neighbors is used.
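As a rough sketch using scikit-image, shown here with the classic 8-neighbor setting that yields exactly $2^8 = 256$ codes (the project's radius and neighbor count above would require a different bin count):

```python
import numpy as np
from skimage.feature import local_binary_pattern

def lbp_histogram(gray, P=8, R=1):
    # Each pixel gets a P-bit code comparing it to P neighbors at radius R.
    codes = local_binary_pattern(gray, P, R, method="default")
    hist, _ = np.histogram(codes.ravel(), bins=2 ** P, range=(0, 2 ** P))
    hist = hist.astype(np.float64)
    return hist / (hist.sum() + 1e-8)  # normalize to a probability distribution
```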
- Input: RGB image (preprocessed according to ResNet50)
- Output: 2048-dimensional feature vector
Convolutional Neural Networks (CNNs) are a class of deep learning models that are particularly effective for image classification and object detection tasks. In this project, we use a pre-trained CNN model, ResNet50, to extract features from the images. The features are then used to compute the similarity between the query and gallery images.
The last fully connected layer of the ResNet50 model is removed, and the globally average-pooled output of the last convolutional stage (a 2048-dimensional vector) is used as the feature for each image. The feature vectors are then compared using cosine similarity.
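A minimal sketch of this extraction with torchvision (the weights choice and the standard ImageNet preprocessing values are assumptions; the project's exact code may differ):

```python
import torch
from torchvision import models, transforms

# Keep everything up to (and including) global average pooling; drop the FC layer.
resnet = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
extractor = torch.nn.Sequential(*list(resnet.children())[:-1]).eval()

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def cnn_feature(pil_image):
    x = preprocess(pil_image).unsqueeze(0)  # (1, 3, 224, 224)
    return extractor(x).flatten()           # 2048-dimensional vector
```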
- Input: Grayscale image
- Output: 128-dimensional float descriptor
Scale-Invariant Feature Transform (SIFT) is a feature detection algorithm that extracts keypoints and descriptors from an image. The keypoints are invariant to scale and rotation, making them suitable for object recognition.
To compute the similarity between two images, the number of matched keypoints is used as a measure of similarity. The more keypoints that match, the more similar the images are.
In detail, KNN matching (k = 2) is used to find the two best matches for each query keypoint. To filter out false matches, Lowe's ratio test is applied: a match is kept only if the distance of the best match is smaller than a fixed ratio of the distance of the second-best match.
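A sketch of this matching scheme with OpenCV (the 0.75 ratio is the commonly used value, assumed here rather than taken from the project):

```python
import cv2

sift = cv2.SIFT_create()

def sift_score(gray_query, gray_gallery, ratio=0.75):
    # Similarity = number of keypoint matches surviving the ratio test.
    _, des_q = sift.detectAndCompute(gray_query, None)
    _, des_g = sift.detectAndCompute(gray_gallery, None)
    if des_q is None or des_g is None:
        return 0
    pairs = cv2.BFMatcher(cv2.NORM_L2).knnMatch(des_q, des_g, k=2)
    return sum(1 for p in pairs
               if len(p) == 2 and p[0].distance < ratio * p[1].distance)
```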
- Input: Grayscale image
- Output: 64-byte (512-bit) binary descriptor
BRISK (Binary Robust Invariant Scalable Keypoints) is a feature detection and description algorithm that is similar to SIFT but is designed to be faster and more efficient.
Similar to SIFT, KNN matching with the ratio test is used, except that Hamming distance replaces L2 distance because BRISK descriptors are binary.
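The matching step might look like the SIFT sketch above, with the distance metric swapped (again, the 0.75 ratio is an assumed value):

```python
import cv2

brisk = cv2.BRISK_create()

def brisk_score(gray_query, gray_gallery, ratio=0.75):
    _, des_q = brisk.detectAndCompute(gray_query, None)
    _, des_g = brisk.detectAndCompute(gray_gallery, None)
    if des_q is None or des_g is None:
        return 0
    # NORM_HAMMING instead of NORM_L2: BRISK descriptors are bit strings.
    pairs = cv2.BFMatcher(cv2.NORM_HAMMING).knnMatch(des_q, des_g, k=2)
    return sum(1 for p in pairs
               if len(p) == 2 and p[0].distance < ratio * p[1].distance)
```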
- Input: Grayscale image
- Output: Both SIFT and BRISK descriptors (handled separately)
The ensemble method combines the scores from SIFT and BRISK to improve the overall performance of the instance search system.
However, they give descriptors of different dimensions and score scales. To combine them, we first normalize the scores of each method to the range $[0, 1]$. Then, we compute the final score as the weighted sum of the normalized scores:

$$s = w_{\text{SIFT}}\,\hat{s}_{\text{SIFT}} + w_{\text{BRISK}}\,\hat{s}_{\text{BRISK}}, \qquad w_{\text{SIFT}} + w_{\text{BRISK}} = 1,$$

where $\hat{s}$ denotes a score normalized to $[0, 1]$.
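A sketch of this combination (min-max normalization and an equal weight of 0.5 are assumptions; the project may use different choices):

```python
import numpy as np

def ensemble_scores(sift_scores, brisk_scores, w=0.5):
    def minmax(s):
        # Rescale one method's scores over the whole gallery to [0, 1].
        s = np.asarray(s, dtype=np.float64)
        lo, hi = s.min(), s.max()
        return (s - lo) / (hi - lo) if hi > lo else np.zeros_like(s)
    # Weighted sum of the two normalized score lists.
    return w * minmax(sift_scores) + (1 - w) * minmax(brisk_scores)
```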
It can be observed that dark images are often ranked as similar to almost any query image: their features are vague, which produces spurious matches.
CLAHE (Contrast Limited Adaptive Histogram Equalization) is used to enhance the contrast of the images. It divides the image into small regions and applies histogram equalization to each region. This helps to improve the visibility of the features in the images, making it easier to match them.
In detail, for grayscale images, CLAHE can be applied directly:
```python
import cv2

# clipLimit caps contrast amplification per tile; tileGridSize sets the region grid.
clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
clahe_image = clahe.apply(gray)
```
For color images, the image is first converted to Lab color space, and CLAHE is applied to the L (lightness) channel only:
```python
# Convert BGR -> Lab and split; the L channel holds lightness.
l, a, b = cv2.split(cv2.cvtColor(image, cv2.COLOR_BGR2Lab))
clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
l = clahe.apply(l)  # equalize only the lightness channel
clahe_image = cv2.merge((l, a, b))
clahe_image = cv2.cvtColor(clahe_image, cv2.COLOR_Lab2BGR)  # back to BGR
```