This repository is dedicated to exploring and understanding the development of models from scratch for various computer vision tasks. Diverging from my usual work with transfer learning and fine-tuning pre-trained models, this project focuses on building foundational knowledge and skills in constructing and training models from the ground up.
The project covers three primary vision tasks, implemented using both Convolutional Neural Networks (CNNs) and Transformer-based approaches:
- Image Classification
- Image Segmentation
- Object Detection
Each category is approached with models built and trained from scratch, facilitating a deep dive into the mechanics and capabilities of both CNNs and Transformers within the field of vision.
Under-Construction , The models and their corresponding tasks are being continuously updated and improved. Here's what's currently available and what to expect in future updates.
- CNN-Based Model for Image Classification: model that classify images into predefined categories.
- CNN-Based Model for Semantic Segmentation: Exploratory model using CNN architecture to perform semantic image segmentation.
- Object Detection Model: CNN-based model for precise object detection is currently being developed.
- Transformer-Based Models: Implementation of transformer-based models for the mentioned three tasks.
While the project is primarily an individual exploration, contributions, suggestions, and discussions are welcome. Feel free to open an issue or submit pull requests.
This project is open-sourced under the MIT license. See the LICENSE file for more details.