Modern Computer Vision with PyTorch, Second Edition

This is the code repository for Modern Computer Vision with PyTorch, Second Edition, published by Packt.

A practical roadmap from deep learning fundamentals to advanced applications and Generative AI

V Kishore Ayyadevara, Yeshwanth Reddy

About the book

Whether you are a beginner or are looking to progress in your computer vision career, this book guides you through the fundamentals of neural networks (NNs) and PyTorch and how to implement state-of-the-art architectures for real-world tasks.

The second edition of Modern Computer Vision with PyTorch is fully updated to explain and provide practical examples of the latest multimodal models, CLIP, and Stable Diffusion.

You’ll discover best practices for working with images, tweaking hyperparameters, and moving models into production. As you progress, you'll implement various use cases for facial keypoint recognition, multi-object detection, segmentation, and human pose detection. This book provides a solid foundation in image generation as you explore different GAN architectures. You’ll leverage transformer-based architectures like ViT, TrOCR, BLIP2, and LayoutLM to perform various real-world tasks and build a diffusion model from scratch. Additionally, you’ll utilize foundation models' capabilities to perform zero-shot object detection and image segmentation. Finally, you’ll learn best practices for deploying a model to production.

By the end of this deep learning book, you'll confidently leverage modern NN architectures to solve real-world computer vision problems.

Key Learnings

Get to grips with various transformer-based architectures for computer vision, CLIP, Segment-Anything, and Stable Diffusion, and test their applications, such as in-painting and pose transfer
Combine CV with NLP to perform OCR, key-value extraction from document images, visual question-answering, and generative AI tasks
Implement multi-object detection and segmentation
Leverage foundation models to perform object detection and segmentation without any training data points
Learn best practices for moving a model to production

Chapters

Chapters and notebooks (the table lists Colab, Kaggle, Gradient, and Studio Lab as launch options for the notebooks)
Chapter 1: Artificial Neural Network Fundamentals
Back_propagation.ipynb
Chain_rule.ipynb
Feed_forward_propagation.ipynb
Gradient_descent.ipynb
Learning_rate.ipynb
Chapter 2: PyTorch Fundamentals
Auto_gradient_of_tensors.ipynb
Building_a_neural_network_using_PyTorch_on_a_toy_dataset.ipynb
Fetching_values_of_intermediate_layers.ipynb
Implementing_custom_loss_function.ipynb
Initializing_a_tensor.ipynb
Numpy_Vs_Torch_object_computation_speed_comparison.ipynb
Operations_on_tensors.ipynb
Sequential_method_to_build_a_neural_network.ipynb
Specifying_batch_size_while_training_a_model.ipynb
save_and_load_pytorch_model.ipynb
Chapter 3: Building a Deep Neural Network with PyTorch
Batch_normalization.ipynb
Impact_of_building_a_deeper_neural_network.ipynb
Impact_of_dropout.ipynb
Impact_of_regularization.ipynb
Inspecting_color_images.ipynb
Inspecting_grayscale_images.ipynb
Learning_rate_annealing.ipynb
Preparing_our_data.ipynb
Scaling_the_dataset.ipynb
Steps_to_build_a_neural_network_on_FashionMNIST.ipynb
Varying_batch_size.ipynb
Varying_learning_rate_on_non_scaled_data.ipynb
Varying_learning_rate_on_scaled_data.ipynb
Varying_loss_optimizer.ipynb
Chapter 4: Introducing Convolutional Neural Networks
CNN_on_FashionMNIST.ipynb
CNN_working_details.ipynb
Cats_Vs_Dogs.ipynb
Data_augmentation_with_CNN.ipynb
Image_augmentation.ipynb
Issues_with_image_translation.ipynb
Time_comparison_of_augmentation_scenario.ipynb
Visualizing_the_filters'_learning.ipynb
Chapter 5: Transfer Learning for Image Classification
2D_and_3D_facial_keypoints.ipynb
Facial_keypoints_detection.ipynb
Implementing_ResNet18_for_image_classification.ipynb
Implementing_VGG16_for_image_classification.ipynb
Resnet_block_architecture.ipynb
VGG_architecture.ipynb
age_gender_prediction.ipynb
age_gender_torch_snippets.ipynb
Chapter 6: Practical Aspects of Image Classification
Class_activation_maps.ipynb
Road_sign_detection.ipynb
Chapter 7: Basics of Object Detection
Calculating_intersection_over_union.ipynb
Training_Fast_R_CNN.ipynb
Training_RCNN.ipynb
Understanding_selectivesearch.ipynb
Chapter 8: Advanced Object Detection
Training_YOLO_v8.ipynb
Training_Faster_RCNN.ipynb
Training_SSD.ipynb
Training_YOLO.ipynb
Chapter 9: Image Segmentation
Instance_Segmentation.ipynb
Semantic_Segmentation_with_U_Net.ipynb
predicting_multiple_instances_of_multiple_classes.ipynb
Chapter 10: Applications of Object Detection and Segmentation
Human_pose_detection.ipynb
Image_colorization.ipynb
Multi_object_segmentation.ipynb
action_recognition.ipynb
crowd_counting.ipynb
Chapter 11: Autoencoders and Image Manipulation
Generating_deep_fakes.ipynb
VAE.ipynb
adversarial_attack.ipynb
conv_auto_encoder.ipynb
neural_style_transfer.ipynb
simple_auto_encoder_with_different_latent_size.ipynb
Chapter 12: Image Generation Using GANs
Face_generation_using_Conditional_GAN.ipynb
Face_generation_using_DCGAN.ipynb
Handwritten_digit_generation_using_GAN.ipynb
Chapter 13: Advanced GANs to Manipulate Images
Customizing_StyleGAN2.ipynb
CycleGAN.ipynb
Image_super_resolution_using_SRGAN.ipynb
pix2pix.ipynb
Chapter 14: Combining Computer Vision and Reinforcement Learning
Building_Q_table.ipynb
Deep_Q_Learning_Cart_Pole_balancing.ipynb
Pong_Deep_Q_Learning_with_Fixed_targets.ipynb
Understanding_the_Gym_environment.ipynb
train-self-driving-agent.ipynb
Chapter 15: Combining Computer Vision and NLP Techniques
LayoutLMv3_passports.ipynb
Handwriting_transcription.ipynb
Image_captioning.ipynb
Object_detection_with_DETR.ipynb
TrOCR_fine_tuning.ipynb
ViT_Image_classification.ipynb
Visual_Question_answering.ipynb
self-attention.ipynb
transformers-from-scratch.ipynb
Chapter 16: Foundation Models in Computer Vision
CLIP_from_scratch.ipynb
Conditional_Diffuser_training.ipynb
Diffusion_Pytorch.ipynb
FastSAM.ipynb
ImageBind.ipynb
OpenAI_clip.ipynb
SAM.ipynb
SAMTrack.ipynb
Stable_Diffusion_pipeline.ipynb
Unet_Components_from_scratch.ipynb
Chapter 17: Applications of Stable Diffusion
ControlNet-Inference.ipynb
Depth-to-Image.ipynb
Image-Inpainting.ipynb
SDXL-Turbo.ipynb
Text-Image-to-Video.ipynb
Chapter 18: Moving a Model to Production
convert_to_onnx.ipynb
measuring_drift.ipynb
quantization.ipynb
vector_stores.ipynb
Chapter 19: Appendix
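
Since the chapter table lists Colab as a launch target, a small helper can build a Colab link for any notebook named above. This is only a sketch under stated assumptions: the function name `colab_url` is hypothetical, the branch name `main` is a guess, and each notebook is assumed to sit directly inside its `ChapterNN` folder of the PacktPublishing/Modern-Computer-Vision-with-PyTorch-2E repository. The URL prefix is Colab's general scheme for opening notebooks hosted on GitHub.

```python
from urllib.parse import quote

# Repository slug as shown on the repository page.
REPO = "PacktPublishing/Modern-Computer-Vision-with-PyTorch-2E"


def colab_url(chapter, notebook, branch="main"):
    """Build a Colab link for a notebook (hypothetical helper).

    Assumptions: the branch is called "main" and the notebook lives
    directly inside a folder named ChapterNN (for example Chapter01).
    """
    folder = f"Chapter{chapter:02d}"
    # Percent-encode characters such as the apostrophe in
    # "Visualizing_the_filters'_learning.ipynb"; "/" is left as-is.
    path = quote(f"{folder}/{notebook}")
    return f"https://colab.research.google.com/github/{REPO}/blob/{branch}/{path}"


if __name__ == "__main__":
    print(colab_url(1, "Back_propagation.ipynb"))
```

If a generated link does not resolve, the branch name or folder layout assumed here is probably different in the actual repository.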

Requirements for this book

Chapter	Hardware and software required	OS required
1-18	Minimum 8 GB RAM, Intel i5 processor or better	Windows, Mac OS X, and Linux (Any)
	NVIDIA graphics card with 8+ GB of memory (GTX 1070 or better)
	Minimum 50 Mbps internet speed
	Python 3.6 and above
	PyTorch 1.7
	Google Colab (can run in any browser)
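
To compare a local machine against the requirements listed above, a short check along the following lines may help. It is a minimal sketch, not part of the book's code: the function name `check_environment` and the report keys are hypothetical, and PyTorch is imported only if it is installed so the script also runs on a machine without it.

```python
import importlib.util
import platform
import sys


def check_environment(min_python=(3, 6)):
    """Return a small report on Python, PyTorch, and GPU availability."""
    report = {
        "python_version": platform.python_version(),
        "python_ok": sys.version_info[:2] >= min_python,
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "torch_version": None,
        "cuda_available": False,
        "gpu_memory_gb": None,
    }
    if report["torch_installed"]:
        import torch  # imported lazily so the check works without PyTorch

        report["torch_version"] = torch.__version__
        report["cuda_available"] = torch.cuda.is_available()
        if report["cuda_available"]:
            # Total memory of the first visible GPU, converted to GiB.
            props = torch.cuda.get_device_properties(0)
            report["gpu_memory_gb"] = round(props.total_memory / 1024**3, 1)
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

On a machine without a suitable GPU, `cuda_available` comes back as `False`; in that case the Google Colab option listed above is the likely fallback.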

Get to know the authors

V Kishore Ayyadevara is an entrepreneur and a hands-on leader working at the intersection of technology, data, and AI to identify and solve business problems. With over a decade of experience in leadership roles, Kishore has established and grown successful applied data science teams at American Express and Amazon, as well as a top health insurance company. In his current role, he is building a start-up focused on making AI more accessible to healthcare organizations. Outside of work, Kishore has shared his knowledge through five books on ML/AI and talks at multiple AI conferences, and he is an inventor with 12 patents.

Yeshwanth Reddy is a highly accomplished data science manager with 9+ years of experience in deep learning and document analysis. He has made significant contributions to the field, including building software for end-to-end document digitization that resulted in substantial cost savings. Yeshwanth's expertise extends to developing modules for OCR, word detection, and synthetic document generation. His work has been recognized through multiple patents, and he has also created a few Python libraries. With a passion for unsupervised and self-supervised learning, Yeshwanth is dedicated to reducing reliance on manual annotation and driving innovative solutions in the field of data science.

