This is a collection of foundational projects for anyone diving into computer vision.
Explore some of computer vision's core concepts through hands-on challenge projects.
Challenges are organized into levels:
- Level 0 - Zero/Beginner: Getting Started with the Basics
- Level 1 - Apprentice/Intermediate: Hands-on Computer Vision with Deep Learning
- Level 2 - Hero: Large Vision Models (LVMs) for Image Generation, Inpainting & More
- Level 3 - Advanced: Video Models Benchmarking (ongoing)
- Level 4 - Expert: Finetuning of VLMs (Vision Language Models) & LVMs (ongoing)
- Level 5 - Master: Multimodality (ongoing)
> **Important:** In L1 and L2, we primarily leverage pre-trained models to keep the challenges accessible to everyone. This also lets us explore a wider range of vision recognition tasks with different types of models while focusing on each model's performance and results.
```mermaid
graph LR
A[Image Acquisition] ==> B[Image Processing]
B ==> C[Feature Extraction]
C ==> D[Output, Interpretation & Analysis]
style A fill:#EEE,stroke:#333,stroke-width:4px
style B fill:#F88,stroke:#333,stroke-width:4px
style C fill:#4F4,stroke:#333,stroke-width:4px
style D fill:#33F,stroke:#333,stroke-width:4px
```
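The stages above map onto a handful of library calls. Below is a minimal sketch of the pipeline using OpenCV, assuming `opencv-python` is installed; `image.jpg` is a hypothetical input path:

```python
import cv2

# 1. Image Acquisition: read an image from disk (BGR channel order by default)
image = cv2.imread("image.jpg")  # hypothetical path
assert image is not None, "image.jpg not found"

# 2. Image Processing: convert to grayscale and reduce noise
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
blurred = cv2.GaussianBlur(gray, (5, 5), 0)

# 3. Feature Extraction: detect edges as simple features
edges = cv2.Canny(blurred, threshold1=50, threshold2=150)

# 4. Output, Interpretation & Analysis: report the fraction of edge pixels
print(f"Edge density: {(edges > 0).mean():.2%}")
```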
To install the dependency packages, use either `conda` or `pip`:
Using conda:

- Create a new conda environment:

```bash
conda create --name cv-challenge
```

- Activate the newly created environment:

```bash
source activate cv-challenge  # For bash/zsh
conda activate cv-challenge   # For conda prompt/powershell
```

- Install dependencies from the `requirements.txt` file:

```bash
conda install --channel conda-forge --file requirements.txt
```
Using pip:

- Install dependencies from the `requirements.txt` file:

```bash
pip install -r requirements.txt
```
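After installing, a quick import check confirms the environment is usable. The exact package set depends on `requirements.txt`; OpenCV and NumPy are assumed here:

```python
# Sanity-check the environment.
# Assumes opencv-python and numpy are listed in requirements.txt.
import cv2
import numpy as np

print("OpenCV version:", cv2.__version__)
print("NumPy version:", np.__version__)
```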
Level 0 - Zero/Beginner:

| # | Project | Description | Notebooks |
|---|---------|-------------|-----------|
| [1] | Getting Started with Images | Load an image, display it, and apply basic transformations. | |
| [2] | Basic Image Manipulation | Modify pixels; resize, flip, crop, and annotate images. | |
| [3] | Image Filtering & Restoration | Enhance or manipulate image features using filtering techniques. | |
| [4] | Image Enhancement | Enhance images using arithmetic & bitwise operations. | |
| [5] | Image Segmentation (Traditional) | Segment images into regions or pixels that belong to different classes or categories. | |
| [6] | Feature Extraction & Alignment | Extract features from images using descriptors suited to the nature of the features. | |
| [7] | Optical Character Recognition (OCR) | Recognize text in images or documents using libraries such as Tesseract, Pytesseract, or EasyOCR. | |
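As a taste of Project 7, here is a hedged sketch of OCR with Pytesseract. Pytesseract wraps the Tesseract binary, which must be installed separately; `document.png` is a hypothetical input:

```python
import cv2
import pytesseract

# Load a scanned document and convert to grayscale
img = cv2.imread("document.png")  # hypothetical path
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Otsu binarization often improves recognition on clean scans
_, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# Run Tesseract on the preprocessed image
print(pytesseract.image_to_string(binary))
```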
Level 1 - Apprentice/Intermediate:

| # | Project | Description | Notebooks |
|---|---------|-------------|-----------|
| [1] | MNIST Handwritten Digit Recognition | Train a simple neural network to classify handwritten digits from the MNIST dataset. | |
| [2] | CIFAR-10 Image Classification | Use convolutional neural networks (CNNs) to classify images of different types of objects from the CIFAR-10 dataset. | |
| [3] | Object Detection with YOLOv5 | Apply YOLOv5, a real-time object detection algorithm, to detect objects in images and videos. | |
| [4] | Semantic Segmentation with DeepLabv3+ | Use DeepLabv3+, a semantic segmentation model, to segment images into different semantic categories. | |
| [5] | Facial Recognition with OpenFace | Explore facial recognition with the OpenFace library to identify individuals in images. | |
| [6] | Object Tracking | Follow the movement of objects in a video sequence. | |
| [7] | Human Pose Estimation | Estimate the pose of a person in an image or video using OpenCV and a pre-trained model. | |
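For a feel of Project 3, YOLOv5 exposes a `torch.hub` entry point. The sketch below runs pretrained inference on the Ultralytics demo image; it assumes `torch` and `pandas` are installed and downloads the `yolov5s` checkpoint on first run:

```python
import torch

# Load the small pretrained YOLOv5 model from the Ultralytics hub
model = torch.hub.load("ultralytics/yolov5", "yolov5s", pretrained=True)

# Run inference on the standard Ultralytics demo image
results = model("https://ultralytics.com/images/zidane.jpg")

results.print()  # summary of detections
boxes = results.pandas().xyxy[0]  # bounding boxes as a pandas DataFrame
print(boxes[["name", "confidence"]])
```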
Level 2 - Hero:

| # | Project | Description | Notebooks |
|---|---------|-------------|-----------|
| [1] | Creative Image Generation with GANs | Generate novel images in different styles using GANs. | |
| [2] | Text-to-Image Synthesis with LLMs and Diffusion Models | Create realistic and creative images from text descriptions using LLMs and diffusion models. | |
| [3] | AI-Powered Image Restoration and Enhancement | Restore and enhance images using AI methods. | |
| [4] | Style Transfer with GANs and Image Processing | Transfer the artistic style of one image to another. | |
| [5] | AI-Driven Image Captioning and Storytelling | Generate comprehensive and creative captions and stories from images using LLMs. | |
| [6] | AI-Assisted Image Editing and Manipulation | Automate image editing and manipulation tasks using AI. | |
| [7] | AI Image Recognition Benchmarks with SOTA Vision Models | Benchmark SOTA vision models on a variety of image recognition tasks, including image classification, object detection, ... | |
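Project 2-style text-to-image generation can be sketched with Hugging Face `diffusers`. The model ID and prompt below are illustrative, and a CUDA GPU is assumed:

```python
import torch
from diffusers import StableDiffusionPipeline

# Load an illustrative pretrained text-to-image pipeline in half precision
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")  # a GPU is assumed for reasonable speed

# Generate one image from a text prompt and save it
image = pipe("an astronaut riding a horse on the moon").images[0]
image.save("generated.png")
```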
Level 3 - Advanced:

| # | Project | Description | Notebooks |
|---|---------|-------------|-----------|
| [1] | Video Generation & Captioning | Create realistic video content from text, and generate descriptive text or subtitles for video content, using AI models. | |
| [2] | Facial Emotion Recognition | Detect and classify emotions from facial expressions in video using AI. | |
| [3] | Motion Analysis | Analyze the motion of objects in a video sequence using techniques such as tracking, optical flow, and video detection. | |
| [4] | Video Segmentation | Divide video frames into meaningful segments or regions for analysis and processing. | |
| [5] | Video Style Transfer | Apply artistic styles from one video or image to another video, transforming its visual appearance. | |
| [6] | Video Restoration & Enhancement | Restore and enhance videos using AI methods. | |
| [7] | Video Models Benchmarking | Benchmark SOTA video models on a variety of video recognition tasks, including video classification, object detection, etc. | |
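As a minimal example of Project 3-style motion analysis, dense optical flow between consecutive frames can be computed with OpenCV's Farneback method; `video.mp4` is a placeholder path:

```python
import cv2

cap = cv2.VideoCapture("video.mp4")  # placeholder path
ok, prev = cap.read()
if not ok:
    raise SystemExit("Could not read the first frame")
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Dense flow field: one (dx, dy) vector per pixel
    flow = cv2.calcOpticalFlowFarneback(
        prev_gray, gray, None, 0.5, 3, 15, 3, 5, 1.2, 0
    )
    magnitude, _ = cv2.cartToPolar(flow[..., 0], flow[..., 1])
    print(f"Mean motion magnitude: {magnitude.mean():.3f}")
    prev_gray = gray

cap.release()
```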
Most projects are written as Jupyter notebooks; you can run them directly using `jupyter notebook`/`jupyter lab` or Colab.

For projects with a `main.py` file, run the command below:

```bash
python main.py
```
Roadmap:

```mermaid
flowchart BT
A(Level 0: Zero) --> B(Level 1: Intermediate)
A --> C(Level 2: Hero)
A --> D(Level 3: Advanced)
A --> E(Level 4: Expert)
A --> F(Level 5: Master)
style A fill:#fff,stroke:#333,stroke-width:2px
style B fill:#88f,stroke:#333,stroke-width:2px
style C fill:#8f8,stroke:#333,stroke-width:2px
style D fill:#bbb,stroke:#f66,stroke-width:2px,color:#fff,stroke-dasharray: 5 5
style E fill:#bbb,stroke:#f66,stroke-width:2px,color:#fff,stroke-dasharray: 5 5
style F fill:#bbb,stroke:#f66,stroke-width:2px,color:#fff,stroke-dasharray: 5 5
```
New levels:
- L3 - Advanced: Video Models Benchmarking
- L4 - Expert: Finetuning of VLMs (Vision Language Models) & LVMs
- L5 - Master: Multimodality
Upcoming Features:
| Feature | Description | Status |
|---------|-------------|--------|
| Code Refactoring | Enhance code readability by cleaning, documenting, and integrating Gradio demos. | To-Do |
| New Learning Levels | Introduce advanced levels: L3 - Video Models Benchmarking, L4 - Finetuning of VLMs (Vision Language Models) & LVMs, and L5 - Multimodality. | To-Do |
| Wiki Update | Document the new learning levels in the project Wiki. | To-Do |
| Multilingual Support | Translate the README.md file into multiple languages (French, Spanish, etc.). | To-Do |
| Edge Device Deployment | Explore code translation for deployment on edge devices using C++ or Rust. | To-Do |
| Performance Enhancements | Investigate options to improve performance, including adding new datasets and supporting additional computer vision tasks. | To-Do |
| Machine Learning Framework Integration | Integrate the project with popular machine learning frameworks. | To-Do |
We warmly welcome your contributions! Whether you're a seasoned developer or just starting out in Computer Vision, you can help us improve the project and make it more valuable to everyone.
How to contribute:
- Fork this repository and clone it to your local machine.
- Create a new branch with a descriptive name for your contribution.
- Add your code and files to the branch and commit your changes.
- Push your branch to your forked repository and create a pull request to the main repository.
- Wait for your pull request to be reviewed and merged.
Another way to get involved is by sponsoring the project.
Your support will help:
- Provide computational resources (this is a GPU-poor project!) to explore new frontiers in computer vision by training larger and more complex models
- Keep the project up to date with the latest computer vision advancements
- Create more detailed tutorials for users at all skill levels
This project is licensed under the MIT License.