Welcome to the DeepLearningImplementation repository! This project provides clean, readable implementations of seminal deep learning architectures for computer vision. Whether you're a researcher, student, or practitioner, you'll find comprehensive implementations, training scripts, and documentation for some of the most influential models in the field.
We prioritize clarity and understanding over optimization. Our implementations focus on:
- Simplicity: Clean, straightforward code that's easy to follow
- Readability: Clear variable names, thorough comments, and structured organization
- Learning-Oriented: Focus on fundamental mechanisms for deeper understanding
- Minimal Dependencies: Built primarily with PyTorch for simplified setup
- [✅] AlexNet (2012)
- [✅] ZFNet (2013)
- [✅] GoogLeNet (2014)
- [✅] VGG16 (2015)
- [✅] ResNet (2015)
- [✅] Rethinked Inception (2015)
- [✅] DenseNet (2016)
- [✅] Xception (2016)
- [✅] SqueezeNet (2016)
- [✅] ResNeXt (2016)
- [✅] SENet (2017)
- [✅] MobileNet (2017)
- [✅] ShuffleNet (2017)
- [✅] Residual Attention Network (2017)
- [✅] MobileNetV2 (2018)
- [✅] EfficientNet (2019)
- [✅] VisionTransformer (2020)
- DeepViT (2021)
- Tokens-to-Token ViT (2021)
- CCT (2021)
- LeViT (2021)
- SwinTransformer (2021)
- MobileViT (2021)
- Vision Transformer for Small-Size Datasets (2021)
- SepViT (2022)
- MaxViT (2022)
- Patch Merger (2022)
- ConvNeXt (2022)
- ConvNeXt V2 (2023)
- RepViT (2023)
- VisionLSTM (2024)
- FCN (2014)
- SegNet (2015)
- [✅] UNet (2015)
- PSPNet (2016)
- DeepLab (2016)
- ENet (2016)
- Mask R-CNN (2017)
- DeepLabV3 (2017)
- ICNet (2018)
- [✅] Attention Unet (2018)
- HRNet (2019)
- OCRNet (2019)
- [✅] U-Net++ (2019)
- SegFormer (2021)
- Mask2Former (2022)
- RCNN (2014)
- Fast-RCNN (2015)
- Faster-RCNN (2015)
- YOLO (2015)
- SSD (2016)
- YOLO9000 (2016)
- RetinaNet (2017)
- YOLOv3 (2018)
- YOLOv4 (2020)
- [✅] GAN (2014)
- DCGAN (2015)
- InfoGAN (2016)
- Pix2Pix (2016)
- WGAN (2017)
- CycleGAN (2017)
- BigGAN (2018)
- StyleGAN (2018)
- StyleGAN2 (2019)
- 3D-R2N2 (2016)
- 3D-RecGAN (2017)
- 3D-GAN (2017)
- 3D-RecGAN++ (2018)
- AtlasNet (2018)
- Occupancy Networks (2018)
- DeepSDF (2019)
- NeRF (2020)
- [✅] SENet (2017)
- [✅] Residual Attention Network (2017)
- [✅] Attention Unet (2018)
- [✅] CBAM (2018)
- Python 3.8+
- PyTorch 1.8+
- CUDA-capable GPU (recommended)
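
The requirements listed above (Python 3.8+, PyTorch 1.8+, and an optional CUDA-capable GPU) can be checked before installing anything else. The following is a minimal sketch, not part of the repository: the function names `parse_version` and `check_environment` are hypothetical, and the script only assumes that PyTorch, if installed, is importable as `torch`.

```python
import re
import sys

# Minimum versions taken from the requirements list above.
MIN_PYTHON = (3, 8)
MIN_TORCH = (1, 8)


def parse_version(text):
    """Return (major, minor) from a version string such as '2.1.0+cu118'."""
    match = re.match(r"(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return int(match.group(1)), int(match.group(2))


def check_environment():
    """Return a list of human-readable notes; an empty list means all checks passed."""
    notes = []
    if sys.version_info[:2] < MIN_PYTHON:
        notes.append("Python 3.8 or newer is required")
    try:
        import torch  # imported lazily so the check also runs without PyTorch
    except ImportError:
        notes.append("PyTorch is not installed")
        return notes
    if parse_version(torch.__version__) < MIN_TORCH:
        notes.append("PyTorch 1.8 or newer is required")
    if not torch.cuda.is_available():
        # A GPU is only recommended, so this is informational.
        notes.append("no CUDA-capable GPU detected; code will run on the CPU")
    return notes


if __name__ == "__main__":
    for note in check_environment():
        print("note:", note)
```

Running the script prints one line per unmet requirement and nothing when the environment matches the list.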
- Clone the repository:
    git clone https://github.com/yourusername/DeepLearningImplementation.git
    cd DeepLearningImplementation
- Create a virtual environment (recommended):
    python -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
- Install dependencies for a specific architecture:
    cd Architectures/DesiredModel
    pip install -r requirements.txt

DeepLearningImplementation/
├── Architectures/            # CNN architectures
│   ├── AlexNet/
│   │   ├── README.md
│   │   ├── alexnet.py
│   │   └── requirements.txt
│   └── ...
├── SemanticSegmentation/
├── ObjectDetection/
├── GANs/
├── LICENSE
└── README.md
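
Because each model keeps its own `requirements.txt`, it can help to see which model folders are present before choosing one to install. The sketch below is an illustration only: the helper `list_model_dirs` is hypothetical, and it assumes every implemented model follows the `Category/Model/requirements.txt` layout shown for AlexNet above, which the tree confirms only for that one folder.

```python
from pathlib import Path


def list_model_dirs(repo_root):
    """Map 'Category/Model' to its directory for every model with a requirements.txt.

    Assumption: all models mirror the AlexNet layout (Category/Model/requirements.txt).
    """
    root = Path(repo_root)
    found = {}
    for req in sorted(root.glob("*/*/requirements.txt")):
        model_dir = req.parent
        found[f"{model_dir.parent.name}/{model_dir.name}"] = model_dir
    return found


if __name__ == "__main__":
    # Run from the repository root to print one install command per model.
    for name, path in list_model_dirs(".").items():
        print(f"{name}: pip install -r {path / 'requirements.txt'}")
```

Folders without a `requirements.txt` are skipped, so the output lists only models that can be installed as described in the steps above.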
- Writing clear, understandable code for each model
- Providing basic documentation
- Laying the foundation for further development
- Training models on relevant datasets
- Computing performance metrics
- Comparing model strengths and weaknesses
- Refining code implementations
- Enhancing documentation
- Adding detailed explanations and best practices
Contributions are welcome! Please feel free to submit issues or pull requests to help improve the implementations and documentation.
This project is licensed under the MIT License - see the LICENSE file for details.
For any questions, please open an issue or contact the repository maintainer.
Made with ❤️ for the deep learning community