This repository is a collection of implementations of influential machine learning and deep learning papers. Each model is translated from theory to code, offering practical, executable versions of pioneering architectures. This project serves as both a learning exercise and a portfolio piece to demonstrate understanding and application of advanced models across various tasks, from generative modeling to object detection and classification.
Each model has its own directory containing the following:
- `model.py`: Model architecture.
- `train.py`: Training loop and setup.
- `inference.py`: Inference script for generating predictions.
- `utils.py`: Utility functions specific to each model.
- `README.md`: Detailed instructions and context for the implementation, including setup, usage, and results.
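For example, a model directory is laid out roughly as follows (the tree is illustrative; exact contents vary per model):

```
DCGAN/
├── model.py        # Model architecture
├── train.py        # Training loop and setup
├── inference.py    # Prediction / generation script
├── utils.py        # Model-specific helpers
└── README.md       # Setup, usage, and results
```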
| Model | Task | Dataset | Paper Reference | Highlights |
|---|---|---|---|---|
| CycleGAN | Image-to-Image Translation | Cezanne2Photo; Car2CarDamage | CycleGAN | Unpaired image translation |
| DCGAN | Image Generation | MNIST | DCGAN | Deep convolutional GAN for realistic image generation |
| ESRGAN | Image Super-Resolution | Pre-trained | ESRGAN | Enhanced super-resolution GAN with high-quality outputs |
| PointNet | 3D Object Recognition | ModelNet40 | PointNet | Direct processing of 3D point clouds |
| ProGAN | Image Generation | CelebA | ProGAN | Progressively growing GAN for high-resolution images |
| VAE | Image Generation | MNIST | VAE | Variational autoencoder for learning complex distributions |
| VQGAN | Image Generation | Oxford Flowers | VQGAN | Combines GANs with vector quantization for high-quality images |
| ViT | Image Classification | FoodVision Mini | ViT | Vision Transformer for image recognition |
| WGAN / WGAN-GP | Stable GAN Training | MNIST | WGAN | Improved GAN training stability |
| YOLOv3 | Object Detection | Pascal VOC | YOLOv3 | Real-time object detection |
| pix2pix | Image-to-Image Translation | Sat2Map | pix2pix | Paired image translation |
| vgg_lpips | Image Similarity | Custom | LPIPS | Learned Perceptual Image Patch Similarity for comparing images |
Each model requires a specific environment configuration. Please refer to the `requirements.txt` or `environment.yml` file within each model's directory for dependency information.
To set up a conda environment with the dependencies, use:

```bash
conda env create -f environment.yml
conda activate paper-to-code
```

Alternatively, install dependencies via pip:

```bash
pip install -r requirements.txt
```
Each model directory includes instructions for training and inference:

- Training: To train a model, navigate to the corresponding directory and run:

  ```bash
  python train.py
  ```

- Inference: For inference on new data, use the following (a minimal sketch of such a script follows this list):

  ```bash
  python inference.py --input_path <path_to_input> --output_path <path_to_output>
  ```

- Visualization: Each model includes visualization tools to view outputs, such as generated images or bounding boxes.
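As a rough illustration of what the inference entry points look like, the sketch below follows the CLI above for an image-to-image model. The `Generator` class, checkpoint path, and preprocessing choices are placeholder assumptions, not the exact code of any script in this repo; each model's README documents the real arguments.

```python
# Hypothetical sketch of an inference.py entry point; Generator and the
# checkpoint path are placeholders, not this repo's exact API.
import argparse

import torch
from PIL import Image
from torchvision import transforms
from torchvision.utils import save_image

from model import Generator  # placeholder for the architecture in model.py


def main() -> None:
    parser = argparse.ArgumentParser(description="Run a trained model on one input image.")
    parser.add_argument("--input_path", required=True, help="Path to the input image.")
    parser.add_argument("--output_path", required=True, help="Where to write the result.")
    parser.add_argument("--checkpoint", default="checkpoints/model.pth")
    args = parser.parse_args()

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = Generator().to(device)
    model.load_state_dict(torch.load(args.checkpoint, map_location=device))
    model.eval()

    # Preprocess: resize and map pixel values to [-1, 1], a common GAN convention.
    preprocess = transforms.Compose([
        transforms.Resize((256, 256)),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.5] * 3, std=[0.5] * 3),
    ])
    image = preprocess(Image.open(args.input_path).convert("RGB")).unsqueeze(0).to(device)

    with torch.no_grad():
        output = model(image)

    save_image(output, args.output_path, normalize=True)


if __name__ == "__main__":
    main()
```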
- Documentation & Notebooks: Adding interactive Jupyter notebooks to walk through each model’s training and inference process.
- Additional Models: Expanding the repository to include:
  - UNet for medical image segmentation.
  - ResNet or EfficientNet as baseline image classifiers.
  - DETR for object detection.
  - StyleGAN for more sophisticated image generation.
- Comparison Table: Summarizing performance metrics, training times, and notable findings in a central location to showcase the breadth of experimentation and model capabilities.
This repository is open to contributions. If you have suggestions or improvements, please submit a pull request.
For questions or feedback, feel free to reach out via email or on LinkedIn.