This repository is a curated collection of AI projects spanning Computer Vision, Natural Language Processing (NLP), and Multi-Modal AI. Each project is implemented in a self-contained Jupyter Notebook (or script) with explanations, code, and results.

Useful Links:
- GitHub Repo
- Hugging Face Transformers
- Keras Applications
- Fashion-MNIST Dataset
- Flickr30k Dataset

| Project | Domain | Key Skills / Techniques | Tools & Frameworks |
|---|---|---|---|
| Face Mask Detection with VGG16 | CV | Transfer Learning, Binary Classification, Data Augmentation | TensorFlow, Keras, VGG16 |
| Facial Emotion Recognition with VGG16 | CV | Multi-class Classification, Emotion Recognition, Transfer Learning | Keras, VGG16 |
| Fashion MNIST Classification with CNNs | CV | CNN Architectures, Model Comparison, Feature Visualization | TensorFlow, Keras |
| Green Screening with OpenCV | CV | Chroma Keying, Real-time Video Processing | OpenCV, NumPy |
| Image Deblurring with VGG16 + DCGAN | CV | GANs, Perceptual Loss, Image Restoration | DCGAN, VGG16, TensorFlow |
| Image Captioning with Flickr30k | CV + NLP | Encoder-Decoder, Seq2Seq, BLEU Evaluation | VGG16, LSTM, Keras |
| Tweets Sentiment Analysis (3 Neural Nets) | NLP | Sentiment Analysis, DNN/CNN/RNN Comparison, Embeddings | TensorFlow, Keras, GloVe |
| GenZ Tweets Data Pipeline | NLP | Text Preprocessing, Regex, Lemmatization, Emoji Normalization | NLTK, SpaCy, Python |
| Next Word Prediction with Bi-LSTM | NLP | Language Modeling, Sequence Prediction, Perplexity | TensorFlow, Keras, Bi-LSTM |
| Prompt-to-Synopsis Generator | NLP | Fine-Tuning Transformers, Creative Text Generation | HuggingFace, GPT-2 |
| AI Long-Form Story Generator | NLP | Long-Context Modeling, Story Generation | HuggingFace, Transformers |
| AI Imagining Stories from Images | Multi-Modal | Image-to-Text, Vision+Language, Storytelling | HuggingFace, Transformers |

Project: Face Mask Detection with VGG16

- Situation: During COVID-19, monitoring mask compliance in public spaces became critical.
- Task: Build a system that automatically detects whether people in images are wearing masks.
- Action:
  - Fine-tuned a VGG16 model pretrained on ImageNet (transfer learning).
  - Applied data augmentation (rotation, flipping, zoom) for robustness.
  - Built a binary classifier on a labeled mask/no-mask dataset.
- Result: Achieved 97% accuracy on validation data, suggesting the approach is feasible for surveillance and healthcare use cases.
- Tags: TensorFlow · Keras · Transfer Learning · CNN · Image Classification · Model Deployment

Project: Facial Emotion Recognition with VGG16

- Situation: Emotion recognition is important in mental health monitoring, human-computer interaction, and customer analytics.
- Task: Develop a model that classifies facial images into multiple emotions.
- Action:
  - Preprocessed the FER-2013 dataset with grayscale normalization and augmentation.
  - Fine-tuned VGG16 with added dense layers for 7-class classification.
  - Used categorical cross-entropy loss and early stopping.
- Result: Reached 72% accuracy, surpassing traditional ML baselines (e.g., SVMs at ~45%).
- Tags: Keras · VGG16 · Image Classification · Emotion Recognition · Transfer Learning · FER-2013
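
The early stopping step above can be sketched in plain Python. This is a minimal illustration of patience-based stopping on a validation loss, not the notebook's code; the class name, `patience`, and `min_delta` values are assumptions. In Keras the same idea is usually expressed with the `tf.keras.callbacks.EarlyStopping` callback.

```python
class EarlyStopping:
    """Patience-based early stopping on a validation loss (illustrative)."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience      # epochs to wait without improvement
        self.min_delta = min_delta    # smallest change that counts as improvement
        self.best_loss = float("inf")
        self.best_epoch = -1
        self.wait = 0

    def step(self, epoch, val_loss):
        """Record one epoch; return True when training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.best_epoch = epoch
            self.wait = 0
            return False
        self.wait += 1
        return self.wait >= self.patience


def run_with_early_stopping(val_losses, patience=3):
    """Replay a list of validation losses; return (stop_epoch, best_epoch)."""
    stopper = EarlyStopping(patience=patience)
    for epoch, loss in enumerate(val_losses):
        if stopper.step(epoch, loss):
            return epoch, stopper.best_epoch
    return len(val_losses) - 1, stopper.best_epoch
```

In a real training loop, the weights saved at `best_epoch` would typically be restored once `step` returns `True`.
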

Project: Fashion MNIST Classification with CNNs

- Situation: Fashion MNIST is a standard benchmark for testing deep learning models on image classification tasks.
- Task: Classify clothing images into 10 categories.
- Action:
  - Built multiple CNN architectures (2–4 conv layers, pooling, dropout).
  - Compared models using accuracy and loss curves.
  - Visualized feature maps for interpretability.
- Result: Achieved 92% accuracy on the test set with the deeper CNN.
- Tags: TensorFlow · CNN · Fashion-MNIST · Model Comparison · Deep Learning
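
To make the comparison of shallower and deeper CNNs more concrete, the sketch below tallies feature-map sizes and trainable parameters for a stack of conv layers on 28x28 grayscale input. The filter counts, 3x3 kernels, "same" padding, 2x2 max pooling after every conv layer, and a single dense output layer are all assumptions for illustration; the notebook's actual layer settings may differ.

```python
def cnn_summary(filters, input_size=28, in_channels=1, num_classes=10, kernel=3):
    """Estimate output shape and parameter count for a simple CNN.

    Assumes each conv layer uses 'same' padding (spatial size unchanged)
    and is followed by 2x2 max pooling (spatial size halved, rounded down).
    """
    size, channels, params = input_size, in_channels, 0
    for out_channels in filters:
        # Conv2D parameters: one kernel per (input, output) channel pair plus biases.
        params += (kernel * kernel * channels + 1) * out_channels
        channels = out_channels
        size //= 2  # 2x2 max pooling
    flat = size * size * channels
    # Dense softmax layer: weights plus biases.
    params += (flat + 1) * num_classes
    return {"feature_map": (size, size, channels), "flatten": flat, "params": params}


if __name__ == "__main__":
    for depth in ([32, 64], [32, 64, 128], [32, 64, 128, 256]):
        print(depth, cnn_summary(depth))
```

Under these assumptions, each extra conv layer shrinks the spatial map while widening the channel dimension, which is one reason parameter count and accuracy do not grow in lockstep with depth.
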

Project: Green Screening with OpenCV

- Situation: Film, broadcasting, and AR applications rely on chroma keying (green screen).
- Task: Replace green backgrounds in images and videos with arbitrary scenes.
- Action:
  - Used OpenCV to detect and mask green pixel ranges.
  - Replaced the masked pixels with background images or videos dynamically.
  - Implemented for both static images and live video streams.
- Result: Delivered real-time background replacement with smooth transitions.
- Tags: OpenCV · Computer Vision · Chroma Keying · Real-Time Video Processing
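
The masking and replacement steps above can be illustrated without any dependencies. The sketch below is a pure-Python stand-in for the OpenCV version: it converts each pixel to HSV with the standard-library `colorsys` module and swaps in the background pixel wherever the hue falls in a green band. The threshold values and function names are assumptions; with OpenCV the mask would more typically be built with `cv2.inRange` on an HSV frame.

```python
import colorsys


def is_green(pixel, hue_range=(0.25, 0.45), min_sat=0.4, min_val=0.3):
    """Return True if an (R, G, B) pixel in 0..255 looks like green screen."""
    r, g, b = (channel / 255.0 for channel in pixel)
    h, s, v = colorsys.rgb_to_hsv(r, g, b)  # all components in 0..1
    return hue_range[0] <= h <= hue_range[1] and s >= min_sat and v >= min_val


def chroma_key(foreground, background):
    """Replace green pixels in `foreground` with pixels from `background`.

    Both images are lists of rows of (R, G, B) tuples with identical shape.
    """
    if len(foreground) != len(background) or any(
        len(f_row) != len(b_row) for f_row, b_row in zip(foreground, background)
    ):
        raise ValueError("foreground and background must have the same shape")
    return [
        [bg if is_green(fg) else fg for fg, bg in zip(f_row, b_row)]
        for f_row, b_row in zip(foreground, background)
    ]
```

For video, the same function would be applied frame by frame; a vectorized implementation is needed for real-time rates.
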

Project: Image Deblurring with VGG16 + DCGAN

- Situation: Blurred images hamper critical fields such as surveillance and medical imaging.
- Task: Restore sharpness in blurred images.
- Action:
  - Built a DCGAN generator-discriminator architecture.
  - Used a VGG16 perceptual loss to guide training.
  - Trained on a custom mixed-blur dataset (motion blur, Gaussian blur).
- Result: Restored sharper images, improving SSIM by 18% over a baseline interpolation method.
- Tags: GAN · DCGAN · VGG16 · Image Restoration · Perceptual Loss
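
Since the result above is reported in terms of SSIM, here is a minimal sketch of how that score is defined. It computes SSIM over the whole image as a single window, which is a simplification: standard SSIM averages the same formula over local windows (for example via `skimage.metrics.structural_similarity`). The function name and the flat-list input format are assumptions for illustration.

```python
def global_ssim(x, y, data_range=255.0):
    """Single-window SSIM for two equal-length lists of pixel intensities."""
    if len(x) != len(y) or not x:
        raise ValueError("inputs must be non-empty and the same length")
    n = len(x)
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    # Stabilizing constants from the usual SSIM definition (K1=0.01, K2=0.03).
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator
```

An image compared with itself scores 1.0; blur lowers the variance and covariance terms and therefore the score.
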

Project: Image Captioning with Flickr30k

- Situation: Image captioning aids accessibility for the visually impaired and powers multimedia search.
- Task: Generate natural language descriptions of images.
- Action:
  - Extracted image features with a VGG16 encoder.
  - Trained an LSTM decoder with teacher forcing on Flickr30k captions.
  - Evaluated with BLEU scores.
- Result: Generated fluent captions such as "A boy playing with a dog in the grass", with BLEU-4 ≈ 0.41.
- Tags: VGG16 · LSTM · Seq2Seq · Encoder-Decoder · Image Captioning · Flickr30k
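
For readers unfamiliar with the BLEU metric used above, the sketch below computes a sentence-level BLEU score from clipped n-gram precisions and a brevity penalty. It is an unsmoothed, simplified version for illustration; reported scores are usually corpus-level (for example from `nltk.translate.bleu_score.corpus_bleu`), and the function names here are assumptions.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, references, max_n=4):
    """Unsmoothed sentence-level BLEU with uniform n-gram weights."""
    if not candidate:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(candidate, n)
        total = sum(cand_counts.values())
        if total == 0:
            return 0.0  # candidate shorter than n tokens
        # Clip each candidate n-gram count by its maximum count in any reference.
        max_ref = Counter()
        for ref in references:
            for gram, count in ngrams(ref, n).items():
                max_ref[gram] = max(max_ref[gram], count)
        clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
        if clipped == 0:
            return 0.0  # no smoothing: any zero precision gives BLEU 0
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty uses the reference length closest to the candidate length.
    ref_len = min((len(r) for r in references),
                  key=lambda length: (abs(length - len(candidate)), length))
    bp = 1.0 if len(candidate) > ref_len else math.exp(1 - ref_len / len(candidate))
    return bp * math.exp(sum(log_precisions) / max_n)
```

Inputs are pre-tokenized lists, for example `bleu("a boy with a dog".split(), [reference_tokens])`.
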

Project: Tweets Sentiment Analysis (3 Neural Nets)

- Situation: Businesses and political campaigns monitor sentiment on Twitter to inform decisions.
- Task: Classify tweets as positive, negative, or neutral.
- Action:
  - Built three deep neural network architectures (DNN, CNN, RNN).
  - Preprocessed text with regex cleaning and stopword removal, then represented it with GloVe embeddings.
  - Compared architectures on accuracy and F1.
- Result: The best-performing model, the CNN, achieved 88% accuracy on test data.
- Tags: NLP · Sentiment Analysis · DNN · CNN · RNN · Embeddings
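
The accuracy and F1 comparison above can be sketched as follows. The averaging mode is not stated in this README, so macro-averaged F1 is an assumption here, and the function names are illustrative. A library equivalent would be `sklearn.metrics.f1_score(..., average="macro")`.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1 scores."""
    scores = []
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)
```

Macro averaging weights the three sentiment classes equally, which matters when neutral tweets dominate the dataset.
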

Project: GenZ Tweets Data Pipeline

- Situation: Raw social media data is noisy, full of slang, emojis, and hashtags.
- Task: Design a reusable pipeline for tweet preprocessing.
- Action:
  - Implemented regex cleaning, tokenization, and lemmatization.
  - Normalized emojis, URLs, and @mentions.
  - Built the pipeline both as a Jupyter notebook and as a standalone Python script.
- Result: Produced clean, structured text that improved sentiment model accuracy by ~10%.
- Tags: NLP · Data Pipeline · Regex · NLTK · SpaCy · Preprocessing
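
A minimal sketch of the cleaning and normalization steps above, using only the standard-library `re` module. The placeholder tokens (`<url>`, `<user>`), the tiny emoji map, and the function name are assumptions for illustration; lemmatization is left out because it depends on NLTK or SpaCy models.

```python
import re

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#(\w+)")
# A token is either a placeholder like <url> or a word (optionally with an apostrophe).
TOKEN_RE = re.compile(r"<\w+>|\w+(?:'\w+)?")

# Hypothetical emoji-to-token map; a real pipeline would cover far more emojis.
EMOJI_MAP = {
    "\U0001F602": " <laugh> ",  # face with tears of joy
    "\U0001F62D": " <cry> ",    # loudly crying face
    "\U0001F525": " <fire> ",   # fire
}


def clean_tweet(text):
    """Normalize URLs, mentions, hashtags, and emojis, then tokenize."""
    text = URL_RE.sub(" <url> ", text)
    text = MENTION_RE.sub(" <user> ", text)
    text = HASHTAG_RE.sub(r"\1", text)  # keep the hashtag word, drop the '#'
    for emoji, token in EMOJI_MAP.items():
        text = text.replace(emoji, token)
    return TOKEN_RE.findall(text.lower())
```

Replacing URLs and mentions with placeholder tokens, rather than deleting them, keeps the signal that they were present while removing their high-cardinality content.
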

Project: Next Word Prediction with Bi-LSTM

- Situation: Next-word prediction powers mobile keyboards and search engines.
- Task: Build a language model that predicts the next word.
- Action:
  - Preprocessed the text corpus into n-gram sequences.
  - Trained a Bi-LSTM sequence model with embeddings.
  - Evaluated using perplexity and prediction accuracy.
- Result: Reduced perplexity to ~35, with predictions accurate enough for autocomplete.
- Tags: Bi-LSTM · Language Modeling · Sequence Prediction · Text Generation
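
Two of the steps above lend themselves to a short illustration: turning a token sequence into (context, next word) training pairs, and computing perplexity from the probabilities a model assigns to the true next words. This is a sketch under the assumption that the n-gram preprocessing means growing prefixes of each sentence; the function names are hypothetical.

```python
import math


def ngram_training_pairs(tokens):
    """Build (context, next_word) pairs from growing prefixes of a sentence."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]


def perplexity(target_probs):
    """Perplexity = exp of the mean negative log-probability of the true words."""
    if not target_probs:
        raise ValueError("need at least one probability")
    mean_nll = -sum(math.log(p) for p in target_probs) / len(target_probs)
    return math.exp(mean_nll)
```

A perplexity of 35 can be read as the model being, on average, about as uncertain as if it were choosing uniformly among 35 words at each step.
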

Project: Prompt-to-Synopsis Generator

- Situation: Entertainment and content industries need tools that expand short prompts into story outlines.
- Task: Fine-tune a transformer to generate synopses from short prompts.
- Action:
  - Fine-tuned GPT-2 using Hugging Face Transformers.
  - Applied causal LM loss, learning-rate scheduling, and early stopping.
  - Evaluated coherence and diversity of outputs.
- Result: Produced coherent multi-sentence synopses with logical flow from short prompts.
- Tags: Transformers · GPT-2 · Fine-Tuning · HuggingFace · Text Generation
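
The learning-rate scheduling mentioned above is not specified further, so the sketch below assumes one common choice for transformer fine-tuning: linear warmup followed by linear decay to zero. It mirrors the shape produced by `transformers.get_linear_schedule_with_warmup`, but the function name and argument order here are illustrative only.

```python
def linear_warmup_decay(step, total_steps, warmup_steps, peak_lr):
    """Learning rate at `step` for linear warmup then linear decay to zero."""
    if step < warmup_steps:
        # Ramp up from 0 to peak_lr over the warmup phase.
        return peak_lr * step / max(1, warmup_steps)
    # Decay linearly from peak_lr at the end of warmup to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    for s in (0, 5, 10, 55, 100):
        print(s, linear_warmup_decay(s, total_steps=100, warmup_steps=10, peak_lr=5e-5))
```

Warmup avoids large updates while optimizer statistics are still noisy, which tends to matter when adapting a pretrained model such as GPT-2.
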

Project: AI Long-Form Story Generator

- Situation: Longer context improves story coherence but increases complexity.
- Task: Build a model that generates long-form stories.
- Action:
  - Used transformer causal language models.
  - Tested varying context sizes and prompt strategies.
- Result: Generated coherent multi-paragraph stories; longer context improved narrative consistency.
- Tags: Transformers · Causal LM · Text Generation · HuggingFace · Long-Context Modeling
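
To illustrate what varying the context size means in practice, the sketch below shows an autoregressive loop in which the model only ever sees the most recent `context_size` tokens. The `next_token_fn` callable is a hypothetical stand-in for a causal language model's decoding step; nothing here reflects the notebook's actual model or API calls.

```python
def generate(prompt_tokens, next_token_fn, max_new_tokens, context_size):
    """Generate tokens one at a time from a truncated, sliding context window.

    `next_token_fn` takes the visible context (a list of tokens) and
    returns the next token; it stands in for a real language model.
    """
    if context_size < 1:
        raise ValueError("context_size must be at least 1")
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        context = tokens[-context_size:]  # older tokens fall out of view
        tokens.append(next_token_fn(context))
    return tokens
```

With a small window, details from early paragraphs drop out of the model's view, which is consistent with the observation above that longer context improved narrative consistency.
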

Project: AI Imagining Stories from Images

- Situation: Bridging computer vision with NLP enables storytelling from images.
- Task: Generate stories based on input images.
- Action:
  - Extracted image features via pretrained vision encoders.
  - Used Hugging Face causal language models to generate narratives.
- Result: Produced imaginative, contextually relevant stories for scenes ranging from fantasy to the real world.
- Tags: Multi-Modal AI · Vision + Language · Transformers · HuggingFace · Image-to-Text

This project is licensed under the MIT License.










