A high-performance computer vision suite focusing on classical morphological analysis and deterministic tracking algorithms.
Engineered to solve complex occlusion and segmentation challenges without the computational overhead of deep neural networks.
This repository hosts a robust computer vision engine developed to demonstrate the efficacy of Classical Image Processing in constrained environments. Unlike "Black Box" DL models, this solution offers full interpretability and low-latency execution.
The system features two distinct algorithmic pipelines:
- Static Scene Analysis (Segmentation Engine): An adaptive multi-stage pipeline for counting biological entities in high-noise environments, featuring Spectral Signature Exclusion and Euclidean Clustering.
- Dynamic Scene Analysis (Temporal Tracking): A custom Centroid Tracking algorithm with ID persistence and virtual event triggering, designed for real-time video sequences.
The primary challenge was to isolate specific targets (cats) from a cluttered background containing distractors (birds, dogs) and high-variance textures (grass, gravel).
The system implements an Adaptive Decision Loop. It attempts a standard extraction first; if the contour density indicates improbable noise levels (more than 12 contours), it hot-swaps to a Refined Method with aggressive Gaussian blurring and selective dilation.
```mermaid
graph TD;
    Input[Raw Image] --> HSV[Color Space Conversion];
    HSV --> Masking[Background Suppression];
    Masking --> Decision{Contour Density Check};
    Decision --"Low Noise"--> Morph1[Standard Morphological Ops];
    Decision --"High Noise"--> Morph2[Gaussian Blur + Aggressive Dilation];
    Morph1 --> Cluster[Euclidean Distance Clustering];
    Morph2 --> Cluster;
    Cluster --> Spectral[Spectral Filtering];
    Spectral --> Output[Final Count];
```
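A minimal sketch of the decision loop, assuming a binary foreground mask as input; the 12-contour gate comes from the description above, while the kernel and blur sizes are illustrative placeholders.

```python
import cv2
import numpy as np

NOISE_THRESHOLD = 12  # contour count above which the scene is treated as high-noise

def adaptive_extraction(mask: np.ndarray) -> np.ndarray:
    """Try the standard morphology first; hot-swap to the refined path if the
    mask yields an improbable number of contours."""
    kernel = np.ones((5, 5), np.uint8)
    cleaned = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)  # standard ops

    contours, _ = cv2.findContours(cleaned, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    if len(contours) > NOISE_THRESHOLD:
        # High-noise branch: aggressive blur suppresses fine texture (grass,
        # gravel); dilation re-solidifies the surviving targets.
        blurred = cv2.GaussianBlur(mask, (21, 21), 0)
        _, rebinarized = cv2.threshold(blurred, 127, 255, cv2.THRESH_BINARY)
        cleaned = cv2.dilate(rebinarized, kernel, iterations=2)
    return cleaned
```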
To reconstruct fragmented objects (e.g., separating a cat's tail from its body), the system applies a heavy 15x15 Kernel Dilation, effectively "bridging" disjointed binary blobs. This is followed by Erosion to restore approximate geometric fidelity.
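A minimal sketch of the bridge-and-restore step on a binary mask; the 15x15 kernel matches the description above, while the iteration counts are assumptions.

```python
import cv2
import numpy as np

def bridge_fragments(binary_mask: np.ndarray) -> np.ndarray:
    """Fuse disjoint blobs (e.g., a tail separated from its body), then shrink
    the result back toward the original silhouette."""
    kernel = np.ones((15, 15), np.uint8)
    bridged = cv2.dilate(binary_mask, kernel, iterations=1)  # bridge the gaps
    restored = cv2.erode(bridged, kernel, iterations=1)      # restore approximate geometry
    return restored
```

Dilation followed by erosion with the same structuring element is a morphological closing, so `cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)` is an equivalent one-liner.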
Standard contour detection often fragments a single object into multiple artifacts. I implemented a custom Pairwise Euclidean Aggregation algorithm:
- Iterates through all candidate contours ($O(N^2)$ pairwise complexity).
- Calculates the L2 Norm (Euclidean distance) between their constituent points.
- Logic: If $\mathrm{Distance}(C_1, C_2) < \mathrm{Threshold}$, the contours are fused into a single geometric entity using `np.vstack` (see the sketch below).
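A minimal sketch of the aggregation, assuming contours in the format returned by `cv2.findContours`; the 40 px threshold is a placeholder.

```python
import numpy as np

def fuse_contours(contours, threshold: float = 40.0):
    """Pairwise Euclidean aggregation over candidate contours (O(N^2) pairs)."""
    merged = [c.reshape(-1, 2).astype(np.float32) for c in contours]
    fused = True
    while fused:  # restart after every fusion so merges can chain transitively
        fused = False
        for i in range(len(merged)):
            for j in range(i + 1, len(merged)):
                # L2 norm between every point of contour i and contour j
                dists = np.linalg.norm(
                    merged[i][:, None, :] - merged[j][None, :, :], axis=2)
                if dists.min() < threshold:
                    merged[i] = np.vstack((merged[i], merged[j]))  # fuse entities
                    del merged[j]
                    fused = True
                    break
            if fused:
                break
    return merged
```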
To distinguish the target class from distractors, the system performs a Histogram Analysis within the bounding box of each candidate:
- Avian Exclusion: Contours with $>30\%$ density in the Purple/Pink HSV range are rejected.
- Canine Exclusion: Contours with dominant Blue HSV signatures are discarded (see the sketch below).
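A minimal sketch of the exclusion checks, with placeholder HSV bands and an assumed 50% cut-off for a "dominant" blue signature; only the 30% avian threshold comes from the text above.

```python
import cv2
import numpy as np

# Placeholder HSV bands; the real ranges would be tuned to the dataset.
PINK_PURPLE = (np.array([130, 60, 60]), np.array([170, 255, 255]))
BLUE = (np.array([100, 60, 60]), np.array([130, 255, 255]))

def is_target(hsv_image: np.ndarray, contour) -> bool:
    """Reject candidates whose bounding box is dominated by distractor hues."""
    x, y, w, h = cv2.boundingRect(contour)
    roi = hsv_image[y:y + h, x:x + w]
    total = w * h

    pink_ratio = cv2.inRange(roi, *PINK_PURPLE).sum() / 255 / total
    blue_ratio = cv2.inRange(roi, *BLUE).sum() / 255 / total

    if pink_ratio > 0.30:  # Avian Exclusion
        return False
    if blue_ratio > 0.50:  # Canine Exclusion (assumed "dominant" cut-off)
        return False
    return True
```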
This module addresses the problem of Object Permanence—maintaining identity across frames in a video stream (The "Goomba Counter").
Instead of using heavy Kalman Filters, a lightweight Deterministic Centroid Tracker was engineered for real-time execution:
```mermaid
sequenceDiagram
    participant Frame as Video Frame
    participant Det as Detector
    participant Mem as Object Memory
    Frame->>Det: ROI Crop & HSV Threshold
    Det->>Det: Extract Centroids (Cx, Cy)
    Det->>Mem: Query Active IDs
    Mem->>Mem: Calculate Euclidean Distance Matrix
    alt Distance < 25px
        Mem->>Mem: Update Object Position
        Mem->>Mem: Reset TTL (Time-To-Live)
    else Distance > Threshold
        Mem->>Mem: Register New Object ID
    end
    Mem->>Mem: Increment TTL for Missing Objects
    Mem-->>Det: Return Locked IDs
```
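A minimal sketch of the matching step from the diagram, assuming detections as `(cx, cy)` tuples and a hypothetical `tracked` registry keyed by integer ID; the 25 px gate comes from the diagram above.

```python
import numpy as np

MATCH_RADIUS = 25  # px; the distance gate from the diagram

def match_centroids(tracked: dict, detections: list) -> set:
    """Greedy nearest-neighbour matching between Object Memory and detections.
    Returns the set of IDs matched this frame."""
    matched = set()
    unmatched = list(detections)
    for obj_id, state in tracked.items():
        if not unmatched:
            break
        # Euclidean distance from the object's last position to each detection
        dists = [np.hypot(cx - state["pos"][0], cy - state["pos"][1])
                 for cx, cy in unmatched]
        best = int(np.argmin(dists))
        if dists[best] < MATCH_RADIUS:
            state["pos"] = unmatched.pop(best)  # update object position
            state["ttl"] = 0                    # reset Time-To-Live
            matched.add(obj_id)
    next_id = max(tracked, default=-1) + 1
    for pos in unmatched:                       # leftovers are new objects
        tracked[next_id] = {"pos": pos, "ttl": 0, "crossed": False}
        next_id += 1
    return matched
```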
Processing full HD frames is computationally expensive. The system programmatically crops the active gameplay area (rows 470:810, columns 90:550), reducing the pixel processing load by ~70%.
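In NumPy terms the crop is a single slice (rows first, then columns) that produces a view rather than a copy:

```python
roi = frame[470:810, 90:550]  # active gameplay area; all later stages run on this view
```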
The system maintains a local registry (`tracked_objects` dictionary) storing the state of every entity: `ID: {Position, Crossed_Flag, Age}`.
- Persistence: A Time-To-Live (TTL) counter handles temporary occlusions. If an object vanishes for fewer than 5 frames (e.g., flickering), its ID is reserved; if $Age > Tolerance$, it is deregistered (sketched below).
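A minimal sketch of the occlusion-handling pass, reusing the hypothetical `tracked` registry and `matched` set from the matching sketch above; `TOLERANCE` mirrors the 5-frame window.

```python
TOLERANCE = 5  # frames an ID may remain unseen before deregistration

def age_and_prune(tracked: dict, matched: set) -> None:
    """Increment the TTL of every object missed this frame; drop expired IDs."""
    for obj_id in list(tracked):  # list() so we can delete while iterating
        if obj_id not in matched:
            tracked[obj_id]["ttl"] += 1
            if tracked[obj_id]["ttl"] > TOLERANCE:
                del tracked[obj_id]  # Age > Tolerance: deregister
```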
Counting is not done by simple detection, but by Vector Analysis.
- A virtual vertical line is defined at $x = 10\%$ of the ROI width.
- Debouncing: A boolean flag (`crossed`) is attached to each object ID. The counter increments only on the first traversal event (`False -> True`), preventing double-counting due to jitter (see the sketch below).
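A minimal sketch of the debounced count, assuming right-to-left motion across the line and the registry layout from the sketches above.

```python
def update_count(tracked: dict, count: int, roi_width: int) -> int:
    """Latch each object's `crossed` flag on its first traversal of the line."""
    line_x = int(0.10 * roi_width)  # virtual vertical line at 10% of ROI width
    for state in tracked.values():
        if not state["crossed"] and state["pos"][0] < line_x:
            state["crossed"] = True  # a latched flag is never re-counted
            count += 1
    return count
```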
The system's precision is quantified using Mean Absolute Error (MAE) against a manually labeled ground truth dataset.
- Result: The pipeline demonstrates high robustness with minimal deviation from ground truth, validating the heuristic parameter tuning.
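For reference, $\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} \lvert y_i - \hat{y}_i \rvert$; a minimal computation with hypothetical counts:

```python
import numpy as np

predicted = np.array([4, 2, 5, 3])     # hypothetical pipeline outputs
ground_truth = np.array([4, 3, 5, 3])  # hypothetical manual labels
mae = np.abs(predicted - ground_truth).mean()  # 0.25 for these values
```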
The project utilizes the standard Python scientific stack for matrix operations and computer vision.
```bash
pip install opencv-python numpy pandas
```

The architecture is modularized into two distinct directories. To execute a specific pipeline, navigate to the respective module directory and trigger the entry point script.
1. Static Image Counter (Module I): navigates to the static analysis module and initiates the segmentation engine on the local dataset.

```bash
cd cat_counting
python counting.py ./data/
```

2. Video Tracking System (Module II): navigates to the dynamic tracking module and executes the centroid tracking algorithm on video sequences.

```bash
cd video_counting
python gooba_counting.py ./data/
```

The codebase is organized into isolated modules, ensuring separation of concerns between static segmentation logic and temporal tracking algorithms.
```
.
├── cat_counting/          # Module I: Static Scene Analysis
│   ├── data/              # Test artifacts (Raw images & Ground truth)
│   └── counting.py        # Core segmentation pipeline entry point
│
├── video_counting/        # Module II: Dynamic Event Tracking
│   ├── data/              # Video sequences & CSV validation metrics
│   └── gooba_counting.py  # Centroid tracking engine & Event logic
│
└── README.md              # System documentation
```