🧠 AI Projects: NLP & Computer Vision

This repository is a curated collection of AI projects spanning Computer Vision, Natural Language Processing (NLP), and Multi-Modal AI. Each project is implemented in a self-contained Jupyter Notebook (or script) with explanations, code, and results.


📊 Project Summary (Skills Matrix)

| Project | Domain | Key Skills / Techniques | Tools & Frameworks |
|---|---|---|---|
| Face Mask Detection with VGG16 | CV | Transfer Learning, Binary Classification, Data Augmentation | TensorFlow, Keras, VGG16 |
| Facial Emotion Recognition with VGG16 | CV | Multi-class Classification, Emotion Recognition, Transfer Learning | Keras, VGG16 |
| Fashion MNIST Classification with CNNs | CV | CNN Architectures, Model Comparison, Feature Visualization | TensorFlow, Keras |
| Green Screening with OpenCV | CV | Chroma Keying, Real-time Video Processing | OpenCV, NumPy |
| Image Deblurring with VGG16 + DCGAN | CV | GANs, Perceptual Loss, Image Restoration | DCGAN, VGG16, TensorFlow |
| Image Captioning with Flickr30k | CV + NLP | Encoder-Decoder, Seq2Seq, BLEU Evaluation | VGG16, LSTM, Keras |
| Tweets Sentiment Analysis (3 Neural Nets) | NLP | Sentiment Analysis, DNN/CNN/RNN Comparison, Embeddings | TensorFlow, Keras, GloVe |
| GenZ Tweets Data Pipeline | NLP | Text Preprocessing, Regex, Lemmatization, Emoji Normalization | NLTK, spaCy, Python |
| Next Word Prediction with Bi-LSTM | NLP | Language Modeling, Sequence Prediction, Perplexity | TensorFlow, Keras, Bi-LSTM |
| Prompt-to-Synopsis Generator | NLP | Fine-Tuning Transformers, Creative Text Generation | HuggingFace, GPT-2 |
| AI Long-Form Story Generator | NLP | Long-Context Modeling, Story Generation | HuggingFace, Transformers |
| AI Imagining Stories from Images | Multi-Modal | Image-to-Text, Vision+Language, Storytelling | HuggingFace, Transformers |

📂 Projects

🖼️ Computer Vision

1. Face Mask Detection with VGG16

Situation: During COVID-19, monitoring mask compliance became critical in public spaces.
Task: Build a system to automatically detect masks from images of people.
Action:
- Fine-tuned an ImageNet-pretrained VGG16 (transfer learning).
- Applied data augmentation (rotation, flipping, zoom) for robustness.
- Trained a binary classifier on a labeled mask/no-mask dataset.
Result: Achieved 97% accuracy on validation data, indicating feasibility for surveillance and healthcare use cases.
Tags: TensorFlow · Keras · Transfer Learning · CNN · Image Classification · Model Deployment
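The augmentation step can be sketched in plain Python. This is a minimal illustration, assuming grayscale images stored as nested lists; it covers flipping and a center zoom only (rotation is omitted), and the 50% flip probability and 20% maximum zoom are arbitrary choices, not the project's settings. In Keras the same ideas are available as the `tf.keras.layers.RandomFlip`, `RandomRotation`, and `RandomZoom` layers.

```python
import random

Image = list[list[int]]  # grayscale image as rows of pixel values


def horizontal_flip(img: Image) -> Image:
    """Mirror each row left-to-right."""
    return [row[::-1] for row in img]


def center_zoom(img: Image, factor: float) -> Image:
    """Zoom in by cropping the central 1/factor region, then resizing it back
    to the original size with nearest-neighbour sampling (factor >= 1)."""
    h, w = len(img), len(img[0])
    crop_h, crop_w = max(1, round(h / factor)), max(1, round(w / factor))
    top, left = (h - crop_h) // 2, (w - crop_w) // 2
    return [
        [img[top + (r * crop_h) // h][left + (c * crop_w) // w] for c in range(w)]
        for r in range(h)
    ]


def augment(img: Image, rng: random.Random) -> Image:
    """Hypothetical policy: flip with p=0.5, then zoom in by up to 20%."""
    if rng.random() < 0.5:
        img = horizontal_flip(img)
    return center_zoom(img, rng.uniform(1.0, 1.2))
```

Because every augmented image keeps its original size and label, the classifier sees more varied mask/no-mask examples without any new annotation.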

2. Facial Emotion Recognition with VGG16

Situation: Emotion recognition is important in mental health monitoring, human-computer interaction, and customer analytics.
Task: Develop a model to classify facial images into multiple emotions.
Action:
- Preprocessed the FER-2013 dataset with grayscale normalization and augmentation.
- Fine-tuned VGG16 with additional dense layers for 7-class classification.
- Used categorical cross-entropy loss and early stopping.
Result: Reached 72% accuracy, surpassing traditional ML baselines (e.g., ~45% for SVMs).
Tags: Keras · VGG16 · Image Classification · Emotion Recognition · Transfer Learning · FER-2013
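Early stopping can be sketched as a patience rule over per-epoch validation losses. This is a simplified stand-in for Keras's `tf.keras.callbacks.EarlyStopping` callback (with `restore_best_weights=True`, the weights from `best_epoch` would be kept); the function name and the `patience`/`min_delta` defaults are illustrative assumptions, not the project's configuration.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return (stop_epoch, best_epoch) under a patience-based stopping rule.

    Training stops once the validation loss has failed to improve by more than
    `min_delta` for `patience` consecutive epochs. `best_epoch` is the epoch
    whose weights would be restored.
    """
    best_loss = float("inf")
    best_epoch = 0
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    # Patience never ran out: training uses every epoch provided
    return len(val_losses) - 1, best_epoch
```

For example, losses of `[1.0, 0.8, 0.7, 0.72, 0.71, 0.75]` with `patience=3` stop at epoch 5 and restore epoch 2.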

3. Fashion MNIST Classification with CNNs

Situation: Fashion MNIST is a standard benchmark for evaluating deep learning models on image classification.
Task: Classify clothing images into 10 categories.
Action:
- Built multiple CNN architectures (2–4 conv layers, pooling, dropout).
- Compared models using accuracy and loss curves.
- Visualized feature maps for interpretability.
Result: Achieved 92% accuracy on the test set with the deeper CNN.
Tags: TensorFlow · CNN · Fashion-MNIST · Model Comparison · Deep Learning
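How many conv layers fit on 28x28 Fashion MNIST images depends on how quickly the feature maps shrink. The standard output-size formula, floor((n + 2p - k) / s) + 1, makes this concrete; the layer stack below is a hypothetical example, not one of the project's architectures.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial size after a conv or pooling layer: floor((n + 2p - k) / s) + 1."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_shapes(size, layers):
    """Apply (name, kernel, stride, padding) layers in order to a square input size."""
    shapes = [("input", size)]
    for name, kernel, stride, padding in layers:
        size = conv_out(size, kernel, stride, padding)
        shapes.append((name, size))
    return shapes


# Hypothetical 2-block stack: unpadded 3x3 convs, each followed by 2x2 max pooling
stack = [
    ("conv3x3", 3, 1, 0), ("pool2x2", 2, 2, 0),
    ("conv3x3", 3, 1, 0), ("pool2x2", 2, 2, 0),
]
for name, s in trace_shapes(28, stack):
    print(f"{name:8s} -> {s}x{s}")
```

With unpadded 3x3 convolutions and 2x2 pooling, two blocks already reduce 28x28 to 5x5, so deeper variants typically use `padding="same"` (p = 1 for a 3x3 kernel) or fewer pooling layers.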

4. Green Screening with OpenCV

Situation: Film, broadcasting, and AR applications rely on chroma keying (green screen).
Task: Replace green backgrounds in images and videos with arbitrary scenes.
Action:
- Used OpenCV to detect pixels within a green color range and build a mask from them.
- Replaced the masked pixels dynamically with background images or video frames.
- Implemented for both static images and live video streams.
Result: Delivered real-time background replacement with smooth transitions.
Tags: OpenCV · Computer Vision · Chroma Keying · Real-Time Video Processing
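The masking and replacement steps can be illustrated per pixel. This sketch assumes RGB tuples stored in nested lists and uses the standard-library `colorsys` module for the HSV conversion; the hue, saturation, and value thresholds are hypothetical and would need tuning for each lighting setup. In OpenCV the vectorized equivalent is typically `cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)` followed by `cv2.inRange`, where hue spans 0 to 179 rather than 0 to 1.

```python
import colorsys

Pixel = tuple[int, int, int]  # (R, G, B), each in 0..255


def is_green(px: Pixel, hue_range=(0.22, 0.45), min_sat=0.4, min_val=0.3) -> bool:
    """True if the pixel falls inside a (hypothetical) green-screen range in HSV space."""
    h, s, v = colorsys.rgb_to_hsv(*(c / 255 for c in px))
    return hue_range[0] <= h <= hue_range[1] and s >= min_sat and v >= min_val


def chroma_key(fg, bg):
    """Replace each green foreground pixel with the background pixel at the same position."""
    return [
        [b if is_green(f) else f for f, b in zip(f_row, b_row)]
        for f_row, b_row in zip(fg, bg)
    ]
```

Working in HSV rather than RGB makes the key more tolerant of shadows on the screen, since brightness changes mostly affect V while the hue stays near green. For video, the same function would simply run on every frame.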

5. Image Deblurring with VGG16 + DCGAN

Situation: Blur degrades images in critical fields such as surveillance and medical imaging.
Task: Restore sharpness in blurred images.
Action:
- Built a DCGAN generator-discriminator architecture.
- Used VGG16 perceptual loss to guide training.
- Trained on custom mixed-blur dataset (motion blur, Gaussian blur).
Result: Restored sharper images, with an SSIM improvement of +18% over a baseline interpolation method.
Tags: GAN · DCGAN · VGG16 · Image Restoration · Perceptual Loss
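SSIM, the metric behind the reported gain, compares the luminance, contrast, and structure of two images. The sketch below computes a single-window (global) SSIM on flattened grayscale pixel lists. It is a simplification: standard SSIM averages the same statistic over local Gaussian windows, and this is not necessarily how the project measured its +18%. The constants C1 = (0.01 L)^2 and C2 = (0.03 L)^2 follow the usual defaults, with L the pixel value range.

```python
from statistics import fmean


def global_ssim(x, y, data_range=255.0):
    """Single-window SSIM between two equal-length flattened grayscale images."""
    if len(x) != len(y) or not x:
        raise ValueError("images must be non-empty and the same size")
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mu_x, mu_y = fmean(x), fmean(y)
    var_x = fmean((a - mu_x) ** 2 for a in x)
    var_y = fmean((b - mu_y) ** 2 for b in y)
    cov = fmean((a - mu_x) * (b - mu_y) for a, b in zip(x, y))
    # Luminance term times combined contrast/structure term
    return ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    )
```

An image compared with itself scores 1.0, while a heavily blurred version loses almost all variance and covariance, which drives the contrast/structure term toward zero.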

6. Image Captioning with Flickr30k

Situation: Image captioning aids accessibility for the visually impaired and powers multimedia search.
Task: Generate natural language descriptions of images.
Action:
- Extracted image features with a VGG16 encoder.
- Trained an LSTM decoder with teacher forcing on Flickr30k captions.
- Evaluated generated captions with BLEU scores.
Result: Generated fluent captions like “A boy playing with a dog in the grass” with BLEU-4 ≈ 0.41.
Tags: VGG16 · LSTM · Seq2Seq · Encoder-Decoder · Image Captioning · Flickr30k
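BLEU scores the n-gram overlap between a generated caption and the reference captions. Below is a minimal, unsmoothed sentence-level BLEU-4 sketch in plain Python. The reported BLEU-4 may instead be a corpus-level score (for example from NLTK's `corpus_bleu`), which pools n-gram counts across all test images rather than scoring sentences one at a time.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Multiset of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(references, hypothesis, max_n=4):
    """Unsmoothed sentence-level BLEU with uniform weights over 1..max_n grams."""
    log_precisions = []
    for n in range(1, max_n + 1):
        hyp_counts = ngrams(hypothesis, n)
        if not hyp_counts:
            return 0.0
        # Clip each hypothesis n-gram count by its max count in any single reference
        max_ref = Counter()
        for ref in references:
            for gram, count in ngrams(ref, n).items():
                max_ref[gram] = max(max_ref[gram], count)
        clipped = sum(min(count, max_ref[gram]) for gram, count in hyp_counts.items())
        if clipped == 0:
            return 0.0
        log_precisions.append(math.log(clipped / sum(hyp_counts.values())))
    # Brevity penalty against the reference length closest to the hypothesis length
    hyp_len = len(hypothesis)
    ref_len = min((abs(len(r) - hyp_len), len(r)) for r in references)[1]
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return bp * math.exp(sum(log_precisions) / max_n)
```

Clipping stops a caption from earning credit by repeating a common word, and the brevity penalty stops very short captions from scoring well on precision alone.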

📝 Natural Language Processing (NLP)

7. Tweets Sentiment Analysis with 3 Neural Networks

Situation: Businesses and political campaigns monitor sentiment on Twitter for decision-making.
Task: Classify tweets into positive, negative, or neutral sentiment.
Action:
- Built three neural network architectures: a fully connected DNN, a CNN, and an RNN.
- Preprocessed tweets with regex cleaning and stopword removal, and represented words with pretrained GloVe embeddings.
- Compared the architectures on accuracy and F1.
Result: The best-performing model, the CNN, achieved 88% accuracy on test data.
Tags: NLP · Sentiment Analysis · DNN · CNN · RNN · Embeddings
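For a three-class task, F1 is often reported as a macro average so that a minority class such as neutral counts as much as the majority classes. The README does not say which averaging was used; macro averaging and the label names below are assumptions. scikit-learn provides the same metric as `f1_score(..., average="macro")`.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred, labels=("negative", "neutral", "positive")):
    """Unweighted mean of per-class F1 scores (assumed label set)."""
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)
```

On `["positive", "negative", "neutral", "positive"]` versus predictions `["positive", "negative", "positive", "positive"]`, accuracy is 0.75 but macro F1 is only 0.6, because the missed neutral tweet zeroes that class's F1.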

8. GenZ Tweets Data Pipeline for Sentiment Analysis

Situation: Raw social media data is noisy with slang, emojis, and hashtags.
Task: Design a reusable pipeline for tweet preprocessing.
Action:
- Implemented regex cleaning, tokenization, and lemmatization.
- Normalized emojis, URLs, and @mentions.
- Built the pipeline both as a Jupyter notebook and as a standalone Python script.
Result: Produced clean, structured text that improved sentiment model accuracy by ~10%.
Tags: NLP · Data Pipeline · Regex · NLTK · spaCy · Preprocessing
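The regex normalization steps can be sketched with Python's `re` module. The placeholder tokens (`<url>`, `<user>`), the tiny hypothetical emoji map, and the rule that shortens runs of three or more repeated letters are illustrative assumptions, not the project's exact rules; tokenization and lemmatization would follow with NLTK or spaCy.

```python
import re

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#(\w+)")
ELONGATED_RE = re.compile(r"([a-z])\1{2,}")  # "soooo" -> "soo"
# Hypothetical emoji map; a real pipeline might use a dedicated emoji library
EMOJI_MAP = {"😂": " laughing ", "🔥": " fire ", "💀": " dead "}


def normalize_tweet(text: str) -> str:
    """Lowercase a tweet and normalize URLs, mentions, hashtags, emojis, and elongations."""
    text = text.lower()
    text = URL_RE.sub(" <url> ", text)
    text = MENTION_RE.sub(" <user> ", text)
    text = HASHTAG_RE.sub(r" \1 ", text)  # keep the hashtag word, drop the '#'
    for emoji_char, word in EMOJI_MAP.items():
        text = text.replace(emoji_char, word)
    text = ELONGATED_RE.sub(r"\1\1", text)
    return " ".join(text.split())  # collapse repeated whitespace
```

For example, `normalize_tweet("@bestie this is SOOOO good 😂 #NoCap https://t.co/xyz")` yields `"<user> this is soo good laughing nocap <url>"`. URLs are replaced before mentions so that an `@` inside a link is not treated as a username.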

9. Next Word Prediction with Bi-Directional LSTM

Situation: Next-word prediction powers mobile keyboards and search engines.
Task: Build a language model to predict the next word.
Action:
- Preprocessed the text corpus into n-gram sequences.
- Trained a Bi-LSTM sequence model with word embeddings.
- Evaluated using perplexity and prediction accuracy.
Result: Reduced perplexity to ~35 with accurate next-word predictions, suitable for autocomplete.
Tags: Bi-LSTM · Language Modeling · Sequence Prediction · Text Generation
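The n-gram preprocessing and the perplexity metric can both be sketched in a few lines. The helper below turns each tokenized line into (prefix, next word) training pairs, left-padded with a hypothetical pad id 0; perplexity is the exponential of the average negative log-probability the model assigned to the true next words.

```python
import math


def make_ngram_sequences(token_ids, max_len):
    """Turn one tokenized line into (prefix, next_word) training pairs.

    Each prefix holds up to max_len - 1 preceding tokens and is left-padded
    with 0 (an assumed padding id) to a fixed length.
    """
    pairs = []
    for end in range(1, len(token_ids)):
        prefix = token_ids[max(0, end - (max_len - 1)):end]
        padded = [0] * (max_len - 1 - len(prefix)) + prefix
        pairs.append((padded, token_ids[end]))
    return pairs


def perplexity(next_word_probs):
    """exp of the mean negative log-probability assigned to the true next words."""
    nll = -sum(math.log(p) for p in next_word_probs) / len(next_word_probs)
    return math.exp(nll)
```

A perplexity of about 35 can be read as the model being, on average, as uncertain as a uniform choice among roughly 35 candidate words.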

10. Prompt-to-Synopsis Generator (Fine-Tuning)

Situation: Entertainment & content industries need tools to expand short prompts into story outlines.
Task: Fine-tune a transformer to generate synopses from short prompts.
Action:
- Fine-tuned GPT-2 using HuggingFace Transformers.
- Applied causal LM loss, learning-rate scheduling, and early stopping.
- Evaluated the coherence and diversity of outputs.
Result: Produced coherent multi-sentence synopses that follow logically from their prompts.
Tags: Transformers · GPT-2 · Fine-Tuning · HuggingFace · Text Generation
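The README does not name the learning-rate schedule. A common choice for GPT-2 fine-tuning is linear warmup followed by linear decay to zero, the shape produced by `transformers.get_linear_schedule_with_warmup`. The plain-Python function below sketches that schedule under this assumption; the step counts and base rate in the usage note are hypothetical.

```python
def linear_warmup_decay(step, total_steps, warmup_steps, base_lr):
    """Learning rate at `step`: linear warmup to base_lr, then linear decay to 0."""
    if step < warmup_steps:
        # Warmup: ramp from 0 toward base_lr to avoid large early updates
        return base_lr * step / max(1, warmup_steps)
    # Decay: fall linearly from base_lr at warmup_steps to 0 at total_steps
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)
```

With `total_steps=1000`, `warmup_steps=100`, and `base_lr=5e-5`, the rate climbs to 5e-5 at step 100, is back to 2.5e-5 at step 550, and reaches 0 at step 1000. The warmup phase keeps early gradient updates from disturbing the pretrained weights too much.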

11. AI Long-Form Story Generator with Varied Context

Situation: Longer context improves story coherence but increases computational cost and modeling complexity.
Task: Build a model for generating long-form stories.
Action:
- Used transformer-based causal language models.
- Tested varying context sizes and prompt strategies.
Result: Generated coherent multi-paragraph stories; longer context improved narrative consistency.
Tags: Transformers · Causal LM · Text Generation · HuggingFace · Long-Context Modeling
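Varying the context size comes down to deciding which tokens the model sees at each generation step. GPT-2, for example, accepts at most 1024 positions, so a long story must be truncated before each step. The sketch below keeps the prompt pinned and fills the remaining budget with the most recent story tokens; `build_context`, `generate_story`, and the pluggable `generate_fn` are hypothetical names, not the project's code.

```python
def build_context(story_tokens, prompt_tokens, context_size):
    """Keep the prompt plus as many of the most recent story tokens as fit."""
    budget = context_size - len(prompt_tokens)
    if budget <= 0:
        # Prompt alone fills the window: keep only its tail
        return prompt_tokens[-context_size:]
    return prompt_tokens + story_tokens[-budget:]


def generate_story(generate_fn, prompt_tokens, context_size, chunks):
    """Generate a story chunk by chunk, rebuilding the context window each time.

    `generate_fn` is a hypothetical callable mapping a token list to new tokens
    (for example, a wrapper around a causal LM's generate method).
    """
    story = []
    for _ in range(chunks):
        context = build_context(story, prompt_tokens, context_size)
        story.extend(generate_fn(context))
    return story
```

Pinning the prompt keeps the original premise in view as the story grows, at the cost of leaving fewer tokens for recent narrative; comparing stories generated with different `context_size` values is one way to test how much context helps consistency.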

🔮 Multi-Modal AI

12. AI Imagining Stories from Images

Situation: Bridging computer vision with NLP enables storytelling from images.
Task: Generate stories based on input images.
Action:
- Extracted image features via pretrained vision encoders.
- Used HuggingFace causal language models to generate narratives.
Result: Produced imaginative, contextually relevant stories for images ranging from fantasy to real-world scenes.
Tags: Multi-Modal AI · Vision + Language · Transformers · HuggingFace · Image-to-Text

📜 License

This project is licensed under the MIT License.

sanskarGupta551/ai-projects-nlp-computer-vision

Folders and files

Latest commit

History

Repository files navigation

🧠 AI Projects: NLP & Computer Vision

📊 Project Summary (Skills Matrix)

📂 Projects

🖼️ Computer Vision

1. Face Mask Detection with VGG16

2. Facial Emotion Recognition with VGG16

3. Fashion MNIST Classification with CNNs

4. Green Screening with OpenCV

5. Image Deblurring with VGG16 + DCGAN

6. Image Captioning with Flickr30k

📝 Natural Language Processing (NLP)

7. Tweets Sentiment Analysis with 3 Neural Networks

8. GenZ Tweets Data Pipeline for Sentiment Analysis

9. Next Word Prediction with Bi-Directional LSTM

10. Prompt-to-Synopsis Generator (Fine-Tuning)

11. AI Long-Form Story Generator with Varied Context

🔮 Multi-Modal AI

12. AI Imagining Stories from Images

📜 License

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Languages

Packages