Note: Example problem and solution based on Dr. Mohm Azwan Bin Mohammad Hamza's assignment "Algorithm & Complexity"
An advanced and comprehensive implementation of the Closest Pair of Points problem using the Divide and Conquer algorithm in computational geometry. This project provides an in-depth exploration of efficient algorithms for solving proximity problems in a two-dimensional plane, focusing on optimizing performance for large-scale datasets.
- Introduction
- Problem Statement
- Case Study Questions
- Features
- Dataset
- Algorithm Implementation
- Step-by-Step Solution
- Time Complexity Analysis
- Usage
- Performance Evaluation
- Real-World Applications
- Critical Evaluation
- Conclusion
- License
Computational geometry is a foundational field in computer science that deals with algorithms and data structures for solving geometric problems. One of the classic challenges in this domain is determining the Closest Pair of Points in a two-dimensional plane. Efficiently solving this problem is crucial for applications such as GPS navigation systems, machine learning clustering algorithms, image processing, wireless network optimization, and robotics.
This project implements the Divide and Conquer algorithm to address the Closest Pair of Points problem effectively, reducing the time complexity from
The Closest Pair of Points problem involves finding the pair of points with the minimum Euclidean distance between them within a set of points on a 2D plane.
For any two points
This formula measures the "straight line" distance between two points in Euclidean space.
The brute-force method compares every possible pair of points to find the minimum distance, resulting in a time complexity of
Detailed answers to the following questions are available in the /doc
folder.
-
Problem Understanding
- Definition of the problem.
- Euclidean distance calculation.
- Inefficiency of the brute-force approach.
-
Divide and Conquer Approach
- Process explanation.
- The three main steps: Divide, Conquer, Combine.
- Implementation of the 'strip' optimization.
-
Solve the Problem
- Step-by-step solution using the provided dataset.
-
Time Complexity Analysis
- Comparison between
$O(n \log n)$ and$O(n^2)$ time complexities.
- Comparison between
-
Algorithm Implementation
- Code construction in Python.
- Execution on datasets containing 1,000, 10,000, and 100,000 points.
-
Performance Comparison
- Evaluation of the Divide and Conquer approach versus the brute-force method.
-
Real-World Applications
- Five practical applications of the Closest Pair of Points algorithm.
-
Critical Evaluation
- Strengths, limitations, and potential trade-offs of the algorithm.
- Efficient Algorithm Implementation: Leverages the Divide and Conquer strategy for optimal performance.
- Scalability: Effectively handles large datasets; tested with up to 100,000 points.
- Modular and Clean Code: Written in Python with emphasis on readability and maintainability.
- Comprehensive Documentation: Provides detailed explanations and analyses.
- Performance Analysis Tools: Includes scripts to compare and visualize algorithm performance.
The algorithm operates on a set of 2D points with associated labels. Below is the sample dataset:
points = [
("green", 2, 3),
("green", 5, 1),
("green", 6, 2),
("green", 7, 7),
("green", 20, 24),
("black", 3, 5),
("black", 13, 14),
("black", 27, 25),
("purple", 9, 6),
("purple", 12, 10),
("purple", 17, 21),
("purple", 18, 15),
("blue", 8, 15),
("blue", 19, 20),
("blue", 31, 33),
("blue", 40, 50),
("red", 12, 30),
("red", 22, 29),
("red", 25, 18),
("red", 35, 40),
]
A visual representation of this dataset helps understand the spatial distribution of points across the plane.
The project is implemented in Python, utilizing its robust libraries and simplicity for algorithm development.
-
Divide
- Sorting: Sort the points based on their x-coordinates.
- Partitioning: Divide the sorted set into two equal halves.
-
Conquer
-
Recursive Processing: Recursively apply the algorithm to both halves to find the minimum distances
$d_L$ and$d_R$ . - Base Case: When subsets are small (typically 2 or 3 points), use the brute-force method.
-
Recursive Processing: Recursively apply the algorithm to both halves to find the minimum distances
-
Combine
-
Find Minimal Distance: Determine the minimal distance
$d = \min(d_L, d_R)$ . -
Strip Creation: Build a strip of points within distance
$d$ from the central dividing line. - Strip Optimization: Sort the strip points based on y-coordinates and check for closer pairs.
-
Find Minimal Distance: Determine the minimal distance
- Purpose: Efficiently find the closest pair across the partition.
-
Implementation:
-
Create Strip: Collect points that lie within distance
$d$ of the dividing line. - Sort Strip: Sort these points based on their y-coordinates.
- Compare Points: For each point in the strip, compare it with the next seven points to find the closest pair.
-
Create Strip: Collect points that lie within distance
A detailed step-by-step solution for the provided dataset is available in the /doc
. It includes:
- Sorting Points: Demonstrates the initial sorting by x and y coordinates.
- Dividing Dataset: Explains how the dataset is recursively partitioned.
- Calculating Closest Pair: Shows how the closest pairs are found within subsets.
- Combining Results: Describes how the minimal distances are compared across the partitions.
-
Initial Sorting: Takes
$O(n \log n)$ time. -
Recursive Division: Creates
$\log n$ levels of recursion. -
Combine Step: At each level, the strip processing takes
$O(n)$ time. -
Overall Complexity: The combination of sorting and recursive steps results in
$O(n \log n)$ time complexity.
- Pairwise Comparison: Every pair of points is compared.
-
Total Comparisons:
$\frac{n(n - 1)}{2}$ comparisons, leading to quadratic time growth.
The Divide and Conquer method substantially outperforms the brute-force approach for large
Ensure that Python 3.x is installed on your system.
-
Clone the Repository
git clone https://github.com/IRedDragonICY/cb24153-closestpairofpoints.git
-
Navigate to the Project Directory
cd cb24153-closestpairofpoints
-
Create a Virtual Environment (Optional)
python3 -m venv venv source venv/bin/activate # On Windows: venv\Scripts\activate
-
Install Dependencies
pip install -r requirements.txt
To run the algorithm, you need to execute three Python scripts: case-bigdataset.py
, case-plot.py
, and case-plot-step.py
. Follow the steps below to ensure everything runs smoothly:
-
Ensure Dependencies are Installed
pip install -r requirements.txt
-
Run the Scripts
python src/case-bigdataset.py python src/case-plot.py python src/case-plot-step.py
Each script has a specific role:
case-bigdataset.py
: Evaluates the performance of the brute force and divide & conquer methods on larger datasets.case-plot.py
: Plots the performance results of the algorithms.case-plot-step.py
: Provides a step-by-step visualization of the algorithm's execution.
By following these steps, you will be able to run the algorithm and visualize the results effectively.
Number of Points | Brute Force Distance | Brute Force Time (s) | Divide & Conquer Distance | Divide & Conquer Time (s) |
---|---|---|---|---|
1,000 | 952.135495 | 0.178031 | 952.135495 | 0.003254 |
10,000 | 30.413813 | 14.760878 | 30.413813 | 0.031042 |
100,000 | 11.401754 | 1465.536086 | 11.401754 | 0.475594 |
- Efficiency Gains: The Divide and Conquer algorithm dramatically reduces computation time.
- Scalability: Maintains reasonable execution times even as the dataset size increases significantly.
- Practically Feasible: The brute-force method becomes impractical for datasets larger than 10,000 points.
The graph illustrates the execution time vs. number of points for both algorithms.
-
GPS Navigation Systems
- Nearest Neighbor Search: Finding the closest service location.
- Route Optimization: Calculating the shortest path by evaluating proximity between waypoints.
-
Machine Learning Clustering
- Data Analysis: Grouping similar data points for pattern recognition.
- Anomaly Detection: Identifying outliers based on distance metrics.
-
Image Processing
- Feature Matching: Aligning images by matching key points.
- Object Recognition: Detecting and classifying objects within images.
-
Wireless Network Optimization
- Network Design: Placing transmitters for optimal coverage.
- Signal Strength Mapping: Analyzing proximity to access points.
-
Robotics
- Obstacle Avoidance: Real-time detection of nearby obstacles.
- Path Planning: Determining efficient paths to targets.
- High Efficiency: Significantly reduces computation time for large datasets.
- Wide Applicability: Useful in various domains requiring proximity calculations.
- Foundational Knowledge: Enhances understanding of computational geometry techniques.
- Algorithm Complexity: More complex to implement compared to the brute-force method.
- Dimensionality: Efficiency gains diminish in higher-dimensional spaces.
- Development Time: More time needed for coding and debugging.
- Memory Usage: Increased memory usage due to recursion and additional data structures.
The Divide and Conquer algorithm offers a powerful and efficient solution to the Closest Pair of Points problem, especially for large datasets. Its implementation in this project demonstrates significant performance improvements over the brute-force method, making it highly suitable for real-world applications that demand quick and reliable proximity calculations.
This project is licensed under the MIT License. See the LICENSE file for detailed information.
For comprehensive answers to the case study questions, theoretical explanations, and additional resources, please refer to the /doc
folder.
Empowering efficient computational geometry solutions for complex proximity problems.