Hasanmog/CNN-VS-ViT

Vision Model Exploration: CNNs VS Transformers

Introduction

This repository explores how to build and train models from scratch for several computer vision tasks. Unlike my usual work with transfer learning and fine-tuning pre-trained models, this project focuses on building foundational knowledge and skills in constructing and training models from the ground up.

Project Scope

The project covers three primary vision tasks, implemented using both Convolutional Neural Networks (CNNs) and Transformer-based approaches:

  • Image Classification
  • Image Segmentation
  • Object Detection

Each category is approached with models built and trained from scratch, facilitating a deep dive into the mechanics and capabilities of both CNNs and Transformers within the field of vision.

Current Status

Under construction: the models and their corresponding tasks are being continuously updated and improved. Here is what is currently available and what to expect in future updates.

Available Models

  • CNN-Based Model for Image Classification: A model that classifies images into predefined categories.
  • CNN-Based Model for Semantic Segmentation: Exploratory model using CNN architecture to perform semantic image segmentation.
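
For readers new to CNNs, the operation that both models above rely on can be sketched in a few lines. This is a minimal pure-Python illustration, not code from this repository: the names `conv2d_valid`, `image`, and `edge_kernel` are made up for the example, and a real model would use a deep learning framework with learned kernels rather than a hand-written one.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` with stride 1 and no padding.

    Both arguments are lists of lists of numbers. As in most deep learning
    libraries, the kernel is not flipped (this is cross-correlation).
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Weighted sum of the image window under the kernel.
            acc = 0.0
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        output.append(row)
    return output


# Hypothetical 4x4 image: dark left half, bright right half.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# Hand-picked kernel that responds to dark-to-bright vertical edges.
edge_kernel = [
    [-1, 1],
    [-1, 1],
]

feature_map = conv2d_valid(image, edge_kernel)
for row in feature_map:
    print(row)
```

The feature map is non-zero only in the column where the window straddles the edge. A classification model stacks many such layers and ends in a single prediction per image, while a segmentation model keeps a spatial output and predicts a class per pixel.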

In Progress

  • Object Detection Model: A CNN-based object detection model is currently in development.

Future

  • Transformer-Based Models: Implementations of Transformer-based models for all three tasks listed above.
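
As a preview of what the Transformer-based models will build on, here is a minimal pure-Python sketch of scaled dot-product self-attention. It is an illustration under simplifying assumptions, not this repository's implementation: the function names and the toy `tokens` are hypothetical, and the learned query, key, and value projections of a real Transformer are replaced by the identity so that the mixing step is easy to follow.

```python
import math


def softmax(scores):
    """Turn a list of scores into weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Scaled dot-product self-attention over a list of token vectors.

    Simplification: queries, keys, and values are the tokens themselves
    (identity projections); a real model learns separate projections.
    """
    d = len(tokens[0])
    output = []
    for q in tokens:
        # Similarity of this token to every token, scaled by sqrt(d).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
            for k in tokens
        ]
        weights = softmax(scores)
        # Each output is a weighted average of all token vectors.
        mixed = [
            sum(w * v[c] for w, v in zip(weights, tokens))
            for c in range(d)
        ]
        output.append(mixed)
    return output


# Hypothetical toy input: three 2-dimensional tokens. In a vision
# Transformer these would be embeddings of image patches.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attended = self_attention(tokens)
for vec in attended:
    print(vec)
```

Unlike a convolution, which only looks at a small local window, every output token here is computed from all input tokens at once.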

Contribution

While the project is primarily an individual exploration, contributions, suggestions, and discussions are welcome. Feel free to open an issue or submit a pull request.

License

This project is released under the MIT License. See the LICENSE file for more details.