Skip to content

An advanced and comprehensive implementation of the Closest Pair of Points problem using the Divide and Conquer algorithm in computational geometry. This project provides an in-depth exploration of efficient algorithms for solving proximity problems in a two-dimensional plane, focusing on optimizing performance for large-scale datasets.

License

Notifications You must be signed in to change notification settings

IRedDragonICY/cb24153-closestpairofpoints

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

26 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Closest Pair of Points Algorithm in Computational Geometry

Closest Pair of Points Algorithm

Note: Example problem and solution based on Dr. Mohm Azwan Bin Mohammad Hamza's assignment "Algorithm & Complexity"

An advanced and comprehensive implementation of the Closest Pair of Points problem using the Divide and Conquer algorithm in computational geometry. This project provides an in-depth exploration of efficient algorithms for solving proximity problems in a two-dimensional plane, focusing on optimizing performance for large-scale datasets.

Table of Contents

Introduction

Computational geometry is a foundational field in computer science that deals with algorithms and data structures for solving geometric problems. One of the classic challenges in this domain is determining the Closest Pair of Points in a two-dimensional plane. Efficiently solving this problem is crucial for applications such as GPS navigation systems, machine learning clustering algorithms, image processing, wireless network optimization, and robotics.

This project implements the Divide and Conquer algorithm to address the Closest Pair of Points problem effectively, reducing the time complexity from $O(n^2)$ (as seen in the brute-force method) to $O(n \log n)$. This optimization is significant when handling large datasets where performance is a critical factor.

Problem Statement

Closest Pair of Points Problem

The Closest Pair of Points problem involves finding the pair of points with the minimum Euclidean distance between them within a set of points on a 2D plane.

Euclidean Distance Calculation

For any two points $P_1 (x_1, y_1)$ and $P_2 (x_2, y_2)$, the Euclidean distance $d$ is calculated as:

$$ d = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2} $$

This formula measures the "straight line" distance between two points in Euclidean space.

Inefficiency of Brute-Force Approach

The brute-force method compares every possible pair of points to find the minimum distance, resulting in a time complexity of $O(n^2)$. This quadratic growth becomes computationally expensive and impractical for large datasets due to the substantial number of comparisons required.

Case Study Questions

Detailed answers to the following questions are available in the /doc folder.

  1. Problem Understanding

    • Definition of the problem.
    • Euclidean distance calculation.
    • Inefficiency of the brute-force approach.
  2. Divide and Conquer Approach

    • Process explanation.
    • The three main steps: Divide, Conquer, Combine.
    • Implementation of the 'strip' optimization.
  3. Solve the Problem

    • Step-by-step solution using the provided dataset.
  4. Time Complexity Analysis

    • Comparison between $O(n \log n)$ and $O(n^2)$ time complexities.
  5. Algorithm Implementation

    • Code construction in Python.
    • Execution on datasets containing 1,000, 10,000, and 100,000 points.
  6. Performance Comparison

    • Evaluation of the Divide and Conquer approach versus the brute-force method.
  7. Real-World Applications

    • Five practical applications of the Closest Pair of Points algorithm.
  8. Critical Evaluation

    • Strengths, limitations, and potential trade-offs of the algorithm.

Features

  • Efficient Algorithm Implementation: Leverages the Divide and Conquer strategy for optimal performance.
  • Scalability: Effectively handles large datasets; tested with up to 100,000 points.
  • Modular and Clean Code: Written in Python with emphasis on readability and maintainability.
  • Comprehensive Documentation: Provides detailed explanations and analyses.
  • Performance Analysis Tools: Includes scripts to compare and visualize algorithm performance.

Dataset

The algorithm operates on a set of 2D points with associated labels. Below is the sample dataset:

points = [
    ("green", 2, 3),
    ("green", 5, 1),
    ("green", 6, 2),
    ("green", 7, 7),
    ("green", 20, 24),
    ("black", 3, 5),
    ("black", 13, 14),
    ("black", 27, 25),
    ("purple", 9, 6),
    ("purple", 12, 10),
    ("purple", 17, 21),
    ("purple", 18, 15),
    ("blue", 8, 15),
    ("blue", 19, 20),
    ("blue", 31, 33),
    ("blue", 40, 50),
    ("red", 12, 30),
    ("red", 22, 29),
    ("red", 25, 18),
    ("red", 35, 40),
]

A visual representation of this dataset helps understand the spatial distribution of points across the plane.

Dataset Visualization

Algorithm Implementation

The project is implemented in Python, utilizing its robust libraries and simplicity for algorithm development.

Divide and Conquer Algorithm Steps

  1. Divide

    • Sorting: Sort the points based on their x-coordinates.
    • Partitioning: Divide the sorted set into two equal halves.
  2. Conquer

    • Recursive Processing: Recursively apply the algorithm to both halves to find the minimum distances $d_L$ and $d_R$.
    • Base Case: When subsets are small (typically 2 or 3 points), use the brute-force method.
  3. Combine

    • Find Minimal Distance: Determine the minimal distance $d = \min(d_L, d_R)$.
    • Strip Creation: Build a strip of points within distance $d$ from the central dividing line.
    • Strip Optimization: Sort the strip points based on y-coordinates and check for closer pairs.

Strip Optimization

  • Purpose: Efficiently find the closest pair across the partition.
  • Implementation:
    • Create Strip: Collect points that lie within distance $d$ of the dividing line.
    • Sort Strip: Sort these points based on their y-coordinates.
    • Compare Points: For each point in the strip, compare it with the next seven points to find the closest pair.

Step-by-Step Solution

A detailed step-by-step solution for the provided dataset is available in the /doc. It includes:

  • Sorting Points: Demonstrates the initial sorting by x and y coordinates.
  • Dividing Dataset: Explains how the dataset is recursively partitioned.
  • Calculating Closest Pair: Shows how the closest pairs are found within subsets.
  • Combining Results: Describes how the minimal distances are compared across the partitions.

Time Complexity Analysis

Divide and Conquer Algorithm ($O(n \log n)$)

  • Initial Sorting: Takes $O(n \log n)$ time.
  • Recursive Division: Creates $\log n$ levels of recursion.
  • Combine Step: At each level, the strip processing takes $O(n)$ time.
  • Overall Complexity: The combination of sorting and recursive steps results in $O(n \log n)$ time complexity.

Brute-Force Method ($O(n^2)$)

  • Pairwise Comparison: Every pair of points is compared.
  • Total Comparisons: $\frac{n(n - 1)}{2}$ comparisons, leading to quadratic time growth.

Comparative Analysis

The Divide and Conquer method substantially outperforms the brute-force approach for large $n$, as it reduces the number of necessary comparisons and scales logarithmically.

Usage

Installation

Ensure that Python 3.x is installed on your system.

  1. Clone the Repository

    git clone https://github.com/IRedDragonICY/cb24153-closestpairofpoints.git
  2. Navigate to the Project Directory

    cd cb24153-closestpairofpoints
  3. Create a Virtual Environment (Optional)

    python3 -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
  4. Install Dependencies

    pip install -r requirements.txt

Running the Algorithm

To run the algorithm, you need to execute three Python scripts: case-bigdataset.py, case-plot.py, and case-plot-step.py. Follow the steps below to ensure everything runs smoothly:

  1. Ensure Dependencies are Installed

    pip install -r requirements.txt
  2. Run the Scripts

    python src/case-bigdataset.py
    python src/case-plot.py
    python src/case-plot-step.py

    Each script has a specific role:

    • case-bigdataset.py: Evaluates the performance of the brute force and divide & conquer methods on larger datasets.
    • case-plot.py: Plots the performance results of the algorithms.
    • case-plot-step.py: Provides a step-by-step visualization of the algorithm's execution.

By following these steps, you will be able to run the algorithm and visualize the results effectively.

Performance Evaluation

Result Summary

Number of Points Brute Force Distance Brute Force Time (s) Divide & Conquer Distance Divide & Conquer Time (s)
1,000 952.135495 0.178031 952.135495 0.003254
10,000 30.413813 14.760878 30.413813 0.031042
100,000 11.401754 1465.536086 11.401754 0.475594

Observations

  • Efficiency Gains: The Divide and Conquer algorithm dramatically reduces computation time.
  • Scalability: Maintains reasonable execution times even as the dataset size increases significantly.
  • Practically Feasible: The brute-force method becomes impractical for datasets larger than 10,000 points.

Graphical Representation

Performance Comparison

The graph illustrates the execution time vs. number of points for both algorithms.

Real-World Applications

  1. GPS Navigation Systems

    • Nearest Neighbor Search: Finding the closest service location.
    • Route Optimization: Calculating the shortest path by evaluating proximity between waypoints.
  2. Machine Learning Clustering

    • Data Analysis: Grouping similar data points for pattern recognition.
    • Anomaly Detection: Identifying outliers based on distance metrics.
  3. Image Processing

    • Feature Matching: Aligning images by matching key points.
    • Object Recognition: Detecting and classifying objects within images.
  4. Wireless Network Optimization

    • Network Design: Placing transmitters for optimal coverage.
    • Signal Strength Mapping: Analyzing proximity to access points.
  5. Robotics

    • Obstacle Avoidance: Real-time detection of nearby obstacles.
    • Path Planning: Determining efficient paths to targets.

Critical Evaluation

Strengths

  • High Efficiency: Significantly reduces computation time for large datasets.
  • Wide Applicability: Useful in various domains requiring proximity calculations.
  • Foundational Knowledge: Enhances understanding of computational geometry techniques.

Limitations

  • Algorithm Complexity: More complex to implement compared to the brute-force method.
  • Dimensionality: Efficiency gains diminish in higher-dimensional spaces.

Potential Trade-Offs

  • Development Time: More time needed for coding and debugging.
  • Memory Usage: Increased memory usage due to recursion and additional data structures.

Conclusion

The Divide and Conquer algorithm offers a powerful and efficient solution to the Closest Pair of Points problem, especially for large datasets. Its implementation in this project demonstrates significant performance improvements over the brute-force method, making it highly suitable for real-world applications that demand quick and reliable proximity calculations.

License

This project is licensed under the MIT License. See the LICENSE file for detailed information.


For comprehensive answers to the case study questions, theoretical explanations, and additional resources, please refer to the /doc folder.


Empowering efficient computational geometry solutions for complex proximity problems.

About

An advanced and comprehensive implementation of the Closest Pair of Points problem using the Divide and Conquer algorithm in computational geometry. This project provides an in-depth exploration of efficient algorithms for solving proximity problems in a two-dimensional plane, focusing on optimizing performance for large-scale datasets.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages