
AI projects with topic explanations and code implementations, written in easy-to-understand language in Jupyter Notebooks.

sanskarGupta551/ai-projects-nlp-computer-vision


Repository files navigation

🧠 AI Projects: NLP & Computer Vision

This repository is a curated collection of AI projects spanning Computer Vision, Natural Language Processing (NLP), and Multi-Modal AI. Each project is implemented in a self-contained Jupyter Notebook (or script) with explanations, code, and results.

📊 Project Summary (Skills Matrix)

| Project | Domain | Key Skills / Techniques | Tools & Frameworks |
| --- | --- | --- | --- |
| Face Mask Detection with VGG16 | CV | Transfer Learning, Binary Classification, Data Augmentation | TensorFlow, Keras, VGG16 |
| Facial Emotion Recognition with VGG16 | CV | Multi-class Classification, Emotion Recognition, Transfer Learning | Keras, VGG16 |
| Fashion MNIST Classification with CNNs | CV | CNN Architectures, Model Comparison, Feature Visualization | TensorFlow, Keras |
| Green Screening with OpenCV | CV | Chroma Keying, Real-time Video Processing | OpenCV, NumPy |
| Image Deblurring with VGG16 + DCGAN | CV | GANs, Perceptual Loss, Image Restoration | DCGAN, VGG16, TensorFlow |
| Image Captioning with Flickr30k | CV + NLP | Encoder-Decoder, Seq2Seq, BLEU Evaluation | VGG16, LSTM, Keras |
| Tweets Sentiment Analysis (3 Neural Nets) | NLP | Sentiment Analysis, DNN/CNN/RNN Comparison, Embeddings | TensorFlow, Keras, GloVe |
| GenZ Tweets Data Pipeline | NLP | Text Preprocessing, Regex, Lemmatization, Emoji Normalization | NLTK, SpaCy, Python |
| Next Word Prediction with Bi-LSTM | NLP | Language Modeling, Sequence Prediction, Perplexity | TensorFlow, Keras, Bi-LSTM |
| Prompt-to-Synopsis Generator | NLP | Fine-Tuning Transformers, Creative Text Generation | HuggingFace, GPT-2 |
| AI Long-Form Story Generator | NLP | Long-Context Modeling, Story Generation | HuggingFace, Transformers |
| AI Imagining Stories from Images | Multi-Modal | Image-to-Text, Vision+Language, Storytelling | HuggingFace, Transformers |

📂 Projects

🖼️ Computer Vision

Face Mask Detection with VGG16

  • Situation: During COVID-19, monitoring mask compliance became critical in public spaces.

  • Task: Build a system to automatically detect masks from images of people.

  • Action:

    • Fine-tuned a VGG16 model pretrained on ImageNet (transfer learning).
    • Applied data augmentation (rotation, flipping, zoom) for robustness.
    • Built a binary classifier on a labeled mask/no-mask dataset.
  • Result: Achieved 97% accuracy on validation data, suggesting the approach is feasible for surveillance and healthcare use cases.

  • Tags: TensorFlow · Keras · Transfer Learning · CNN · Image Classification · Model Deployment
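
The augmentation step can be illustrated without any framework. The sketch below is not the notebook's code (which presumably relies on Keras utilities); it only shows what a horizontal flip, a 90-degree rotation, and a center zoom do to a tiny image stored as a list of rows. All function names here are hypothetical.

```python
def hflip(img):
    """Mirror each row left to right."""
    return [row[::-1] for row in img]


def rotate90(img):
    """Rotate the image 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def center_zoom(img, factor=2):
    """Zoom in: crop the central 1/factor region, then upscale it back
    to the original size with nearest-neighbour sampling."""
    h, w = len(img), len(img[0])
    ch, cw = h // factor, w // factor          # size of the crop
    top, left = (h - ch) // 2, (w - cw) // 2   # top-left corner of the crop
    return [
        [img[top + (y * ch) // h][left + (x * cw) // w] for x in range(w)]
        for y in range(h)
    ]


# A 4x4 "image" with distinct pixel values so each transform is easy to follow.
image = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]

flipped = hflip(image)
rotated = rotate90(image)
zoomed = center_zoom(image)
```

Each transformed copy keeps the original label (mask or no mask), which is how augmentation enlarges the effective training set.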

Facial Emotion Recognition with VGG16

  • Situation: Emotion recognition is important in mental health monitoring, human-computer interaction, and customer analytics.

  • Task: Develop a model to classify facial images into multiple emotions.

  • Action:

    • Preprocessed the FER-2013 dataset with grayscale normalization and augmentation.
    • Fine-tuned VGG16 with added dense layers for 7-class classification.
    • Used categorical cross-entropy loss and early stopping.
  • Result: Reached 72% accuracy, surpassing traditional ML baselines (e.g., SVMs at ~45%).

  • Tags: Keras · VGG16 · Image Classification · Emotion Recognition · Transfer Learning · FER-2013
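
To make the 7-class loss concrete, here is a minimal sketch of softmax followed by categorical cross-entropy for a single image. The logits are made-up numbers, not outputs of the project's model, and the helper names are illustrative.

```python
import math

# The seven FER-2013 emotion classes.
EMOTIONS = ["angry", "disgust", "fear", "happy", "sad", "surprise", "neutral"]


def softmax(logits):
    """Turn raw scores into probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_cross_entropy(probs, true_index, eps=1e-12):
    """Negative log-probability assigned to the true class."""
    return -math.log(max(probs[true_index], eps))


# Made-up logits for one face image; the largest score is for "happy".
logits = [0.2, -1.0, 0.1, 2.5, 0.0, 0.3, 0.4]
probs = softmax(logits)
predicted = EMOTIONS[probs.index(max(probs))]

loss_if_happy = categorical_cross_entropy(probs, EMOTIONS.index("happy"))
loss_if_sad = categorical_cross_entropy(probs, EMOTIONS.index("sad"))
```

The loss is small when the true label matches the confident prediction and large otherwise, which is the signal that drives fine-tuning.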

Fashion MNIST Classification with CNNs

  • Situation: Fashion MNIST is a standard benchmark for testing deep learning models on real-world classification tasks.

  • Task: Classify clothing images into 10 categories.

  • Action:

    • Built multiple CNN architectures (2–4 conv layers, pooling, dropout).
    • Compared models using accuracy and loss curves.
    • Visualized feature maps for interpretability.
  • Result: Achieved 92% accuracy on the test set with the deeper CNN.

  • Tags: TensorFlow · CNN · Fashion-MNIST · Model Comparison · Deep Learning
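
The conv, activation, and pooling stages that the CNNs stack can be sketched in plain Python. This is a toy illustration with a hand-picked edge-detecting kernel, not the notebook's Keras layers; in a trained network the kernel values are learned.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image with no padding
    (cross-correlation, as CNN layers compute it)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[y + i][x + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for x in range(out_w)
        ]
        for y in range(out_h)
    ]


def relu(fmap):
    """Zero out negative activations."""
    return [[max(0, v) for v in row] for row in fmap]


def max_pool2x2(fmap):
    """Keep the maximum of each non-overlapping 2x2 window."""
    return [
        [max(fmap[y][x], fmap[y][x + 1], fmap[y + 1][x], fmap[y + 1][x + 1])
         for x in range(0, len(fmap[0]) - 1, 2)]
        for y in range(0, len(fmap) - 1, 2)
    ]


# 6x6 image: dark left half, bright right half (one vertical edge).
image = [[0, 0, 0, 1, 1, 1] for _ in range(6)]
# Hand-picked vertical-edge kernel, for illustration only.
kernel = [[-1, 0, 1]] * 3

feature_map = relu(conv2d_valid(image, kernel))  # 4x4 map, responds at the edge
pooled = max_pool2x2(feature_map)                # 2x2 map after pooling
```

The feature map lights up only where the kernel overlaps the edge, which is the kind of pattern the feature-map visualizations make visible.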

Green Screening with OpenCV

  • Situation: Film, broadcasting, and AR applications rely on chroma key (green screen).

  • Task: Replace green backgrounds in images and videos with arbitrary scenes.

  • Action:

    • Used OpenCV to detect and mask green pixel ranges.
    • Replaced the masked pixels with background images or videos dynamically.
    • Implemented for both static images and live video streams.
  • Result: Delivered real-time background replacement with smooth transitions.

  • Tags: OpenCV · Computer Vision · Chroma Keying · Real-Time Video Processing
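
The masking idea can be sketched with the standard library alone. The project itself uses OpenCV; the version below swaps in `colorsys` so it runs anywhere, and the hue, saturation, and value thresholds are assumptions chosen for the example rather than the notebook's values.

```python
import colorsys


def is_green(pixel, hue_range=(0.25, 0.45), min_sat=0.4, min_val=0.3):
    """Return True if an (R, G, B) pixel falls inside the assumed green range.
    Thresholds are illustrative; real footage needs tuning."""
    r, g, b = (c / 255.0 for c in pixel)
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    return hue_range[0] <= h <= hue_range[1] and s >= min_sat and v >= min_val


def chroma_key(foreground, background):
    """Replace every green foreground pixel with the background pixel
    at the same position. Both images must have the same size."""
    return [
        [bg if is_green(fg) else fg for fg, bg in zip(fg_row, bg_row)]
        for fg_row, bg_row in zip(foreground, background)
    ]


GREEN, SKIN, BLUE = (0, 255, 0), (200, 150, 120), (0, 0, 255)

foreground = [
    [GREEN, SKIN],
    [(10, 240, 20), GREEN],   # slightly off-green still counts as screen
]
background = [[BLUE, BLUE], [BLUE, BLUE]]

composite = chroma_key(foreground, background)
```

Working in HSV rather than RGB lets one hue band cover both bright and shadowed parts of the screen, which is why chroma keying is usually thresholded in that color space.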

Image Deblurring with VGG16 + DCGAN

  • Situation: Blurred images affect critical fields like surveillance and medical imaging.

  • Task: Restore sharpness in blurred images.

  • Action:

    • Built a DCGAN generator-discriminator architecture.
    • Used VGG16 perceptual loss to guide training.
    • Trained on a custom mixed-blur dataset (motion blur, Gaussian blur).
  • Result: Restored sharper images, improving the SSIM score by 18% over baseline interpolation.

  • Tags: GAN · DCGAN · VGG16 · Image Restoration · Perceptual Loss
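
For readers unfamiliar with SSIM, the sketch below computes a simplified, single-window version of it over flattened pixel lists. Standard SSIM averages this quantity over many local windows, so this is an illustration of the formula, not the evaluation code used in the project; the pixel values are invented.

```python
def global_ssim(x, y, data_range=255.0):
    """Single-window SSIM between two equal-length pixel sequences.
    Library implementations average this over local windows instead."""
    n = len(x)
    mu_x, mu_y = sum(x) / n, sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    c1 = (0.01 * data_range) ** 2   # stabilizes the luminance term
    c2 = (0.03 * data_range) ** 2   # stabilizes the contrast/structure term
    return ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    )


# Invented 1-D "images": a sharp stripe pattern, a blurred copy with
# reduced contrast, and a restored copy that recovers most of the contrast.
sharp = [0, 255] * 4
blurred = [64, 191] * 4
restored = [16, 239] * 4

ssim_blurred = global_ssim(sharp, blurred)
ssim_restored = global_ssim(sharp, restored)
```

A score of 1 means the two images are identical; the restored signal scores closer to 1 than the blurred one because it recovers more of the original contrast.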

Image Captioning with Flickr30k

  • Situation: Image captioning aids accessibility for the visually impaired and powers multimedia search.

  • Task: Generate natural language descriptions of images.

  • Action:

    • Extracted features with a VGG16 encoder.
    • Trained an LSTM decoder with teacher forcing on Flickr30k captions.
    • Evaluated with BLEU scores.
  • Result: Generated fluent captions like “A boy playing with a dog in the grass” with BLEU-4 ≈ 0.41.

  • Tags: VGG16 · LSTM · Seq2Seq · Encoder-Decoder · Image Captioning · Flickr30k
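
As a rough guide to what a BLEU-4 score measures, here is a minimal sentence-level BLEU with a single reference and no smoothing. Reported scores are normally computed over a whole test set with several reference captions per image, so this sketch will not reproduce the figure above; the candidate caption is invented.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU: geometric mean of clipped n-gram precisions
    (n = 1..max_n) times a brevity penalty. No smoothing is applied."""
    cand, ref = candidate.lower().split(), reference.lower().split()
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts, ref_counts = ngram_counts(cand, n), ngram_counts(ref, n)
        # Clip each candidate n-gram count by its count in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        if overlap == 0 or total == 0:
            return 0.0
        log_precisions.append(math.log(overlap / total))
    # Penalize candidates that are shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(sum(log_precisions) / max_n)


reference = "A boy playing with a dog in the grass"
candidate = "A boy plays with a dog in the grass"   # invented model output

score = bleu(candidate, reference)
perfect = bleu(reference, reference)
```

A single changed word lowers the score noticeably because it breaks every 2-, 3-, and 4-gram that spans it.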

📝 Natural Language Processing (NLP)

Tweets Sentiment Analysis (3 Neural Nets)

  • Situation: Businesses and political campaigns monitor sentiment on Twitter for decision-making.

  • Task: Classify tweets into positive, negative, or neutral sentiment.

  • Action:

    • Built three deep neural network architectures (DNN, CNN, RNN).
    • Preprocessed tweets with regex cleaning and stopword removal, then embedded them with GloVe.
    • Compared architectures on accuracy and F1.
  • Result: Best-performing CNN achieved 88% accuracy on test data.

  • Tags: NLP · Sentiment Analysis · DNN · CNN · RNN · Embeddings
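
Comparing architectures on accuracy and F1 can be sketched as follows. The labels and predictions are invented, and macro-averaging is an assumption: the source does not say which F1 variant was used.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_for_label(y_true, y_pred, label):
    """F1 score treating `label` as the positive class."""
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
    fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1 (assumed averaging scheme)."""
    return sum(f1_for_label(y_true, y_pred, lab) for lab in labels) / len(labels)


LABELS = ["positive", "negative", "neutral"]
# Invented ground truth and predictions for six tweets.
y_true = ["positive", "positive", "negative", "negative", "neutral", "neutral"]
y_pred = ["positive", "negative", "negative", "negative", "neutral", "positive"]

acc = accuracy(y_true, y_pred)
f1 = macro_f1(y_true, y_pred, LABELS)
```

Reporting F1 alongside accuracy matters for sentiment data because class sizes are often unbalanced, and accuracy alone can hide a weak class.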

GenZ Tweets Data Pipeline

  • Situation: Raw social media data is noisy with slang, emojis, and hashtags.

  • Task: Design a reusable pipeline for tweet preprocessing.

  • Action:

    • Implemented regex cleaning, tokenization, and lemmatization.
    • Normalized emojis, URLs, and @mentions.
    • Built the pipeline both as a Jupyter notebook and as a standalone Python script.
  • Result: Produced clean, structured text, improving sentiment model accuracy by ~10%.

  • Tags: NLP · Data Pipeline · Regex · NLTK · SpaCy · Preprocessing
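
A minimal sketch of the regex stage is shown below, using only the standard library. The placeholder tokens (`<url>`, `<user>`), the tiny emoji table, and the function name are all assumptions made for this example; lemmatization is left out because it needs NLTK or spaCy.

```python
import re

# Hypothetical emoji-to-word table; a real pipeline would use a fuller mapping.
EMOJI_MAP = {
    "\U0001F525": " fire ",      # fire emoji
    "\U0001F602": " laughing ",  # face with tears of joy
}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#(\w+)")
NOISE_RE = re.compile(r"[^a-z0-9<>\s]")


def clean_tweet(text):
    """Normalize URLs, mentions, hashtags, and emojis, then tokenize."""
    text = URL_RE.sub(" <url> ", text)        # replace links with a placeholder
    text = MENTION_RE.sub(" <user> ", text)   # replace @mentions with a placeholder
    text = HASHTAG_RE.sub(r" \1 ", text)      # keep the hashtag word, drop '#'
    for emoji, word in EMOJI_MAP.items():
        text = text.replace(emoji, word)
    text = text.lower()
    text = NOISE_RE.sub(" ", text)            # strip punctuation and leftovers
    return text.split()


tweet = "@sam this fit is \U0001F525\U0001F525 no cap!! #OOTD https://t.co/abc123"
tokens = clean_tweet(tweet)
```

URLs and mentions are replaced before lowercasing and punctuation stripping so that the patterns still see the original `://` and `@` characters.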

Next Word Prediction with Bi-LSTM

  • Situation: Next-word prediction powers mobile keyboards and search engines.

  • Task: Build a language model to predict the next word.

  • Action:

    • Preprocessed the text corpus into n-grams.
    • Trained a Bi-LSTM sequence model with embeddings.
    • Evaluated using perplexity and prediction accuracy.
  • Result: Generated accurate predictions, with perplexity reduced to ~35, suitable for autocomplete.

  • Tags: Bi-LSTM · Language Modeling · Sequence Prediction · Text Generation
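
Two pieces of this workflow are easy to show in isolation: turning a sentence into (context, next word) training pairs, and computing perplexity from the probabilities a model assigns to the true next words. The sketch below assumes one common way of building the pairs (growing prefixes); the notebook may slice its n-grams differently, and the probabilities are invented.

```python
import math


def prefix_pairs(tokens):
    """Build one (context, target) pair per position:
    the words so far are the input, the following word is the label."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]


def perplexity(target_probs):
    """exp of the average negative log-probability of the true next words."""
    avg_nll = -sum(math.log(p) for p in target_probs) / len(target_probs)
    return math.exp(avg_nll)


pairs = prefix_pairs(["the", "cat", "sat", "down"])

# Invented probabilities the model gave to each correct next word.
uniform_over_four = perplexity([0.25, 0.25, 0.25, 0.25])
confident_model = perplexity([0.9, 0.8, 0.95, 0.85])
```

Perplexity can be read as an effective branching factor: a value of about 35 means the model is, on average, as uncertain as if it were choosing uniformly among roughly 35 words.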

Prompt-to-Synopsis Generator

  • Situation: Entertainment & content industries need tools to expand short prompts into story outlines.

  • Task: Fine-tune a transformer to generate synopses from short prompts.

  • Action:

    • Fine-tuned GPT-2 using HuggingFace Transformers.
    • Applied causal LM loss, LR scheduling, and early stopping.
    • Evaluated coherence & diversity of outputs.
  • Result: Produced multi-sentence coherent synopses with logical flow from prompts.

  • Tags: Transformers · GPT-2 · Fine-Tuning · HuggingFace · Text Generation
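
The source does not say which learning-rate schedule or stopping rule was used, so the sketch below assumes a common pairing for fine-tuning: linear warmup followed by linear decay, plus patience-based early stopping on validation loss. Step counts, the peak learning rate, and the loss values are all invented.

```python
def linear_warmup_decay(step, warmup_steps, total_steps, peak_lr):
    """Learning rate that ramps up linearly, then decays linearly to zero."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = (total_steps - step) / max(1, total_steps - warmup_steps)
    return peak_lr * max(0.0, remaining)


def early_stopping_epoch(val_losses, patience=2):
    """Return the epoch index at which training would stop, or None
    if the loss keeps improving often enough."""
    best, bad_epochs = float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, bad_epochs = loss, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                return epoch
    return None


# Assumed hyperparameters for illustration only.
schedule = [linear_warmup_decay(s, warmup_steps=100, total_steps=1000, peak_lr=5e-5)
            for s in (0, 50, 100, 550, 1000)]

# Invented validation losses: improvement stalls after epoch 2.
stop_at = early_stopping_epoch([3.2, 2.9, 2.7, 2.75, 2.8, 2.6], patience=2)
```

With these numbers training stops at epoch 4, before the later improvement at epoch 5 is seen; that trade-off is what the `patience` setting controls.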

AI Long-Form Story Generator

  • Situation: Longer context improves story coherence but increases complexity.

  • Task: Build a model for generating long-form stories.

  • Action:

    • Used transformer causal language models.
    • Tested varying context sizes and prompt strategies.
  • Result: Generated coherent multi-paragraph stories; longer context improved narrative consistency.

  • Tags: Transformers · Causal LM · Text Generation · HuggingFace · Long-Context Modeling
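
One way to experiment with context size is to generate the story in chunks and cap how many tokens are fed back to the model each time. The sketch below assumes a simple strategy (always keep the prompt, then fill the remaining budget with the most recent story tokens) and uses a stand-in generator instead of a real language model; every name in it is hypothetical.

```python
def build_context(prompt_tokens, story_tokens, max_context):
    """Keep the full prompt plus as many recent story tokens as fit.
    Assumes max_context is a positive integer."""
    budget = max_context - len(prompt_tokens)
    if budget <= 0:
        # Prompt alone fills the window: keep only its most recent part.
        return prompt_tokens[-max_context:]
    return prompt_tokens + story_tokens[-budget:]


def generate_long(prompt_tokens, generate_chunk, n_chunks, max_context):
    """Generate a story chunk by chunk, re-trimming the context each time."""
    story = []
    for _ in range(n_chunks):
        context = build_context(prompt_tokens, story, max_context)
        story.extend(generate_chunk(context))
    return story


def fake_generate(context):
    """Stand-in for a real model call: emits one token naming the
    context length it received, so the trimming is visible."""
    return [f"tok{len(context)}"]


trimmed = build_context(["p1", "p2"], ["a", "b", "c", "d"], max_context=4)
story = generate_long(["once", "upon"], fake_generate, n_chunks=4, max_context=4)
```

Raising `max_context` lets more of the earlier story reach the model, which is the mechanism behind the improved narrative consistency noted in the result.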

🔮 Multi-Modal AI

AI Imagining Stories from Images

  • Situation: Bridging computer vision with NLP enables storytelling from images.

  • Task: Generate stories based on input images.

  • Action:

    • Extracted image features via pretrained vision encoders.
    • Used HuggingFace causal language models to generate narratives.
  • Result: Produced imaginative, contextually relevant stories from fantasy to real-world scenes.

  • Tags: Multi-Modal AI · Vision + Language · Transformers · HuggingFace · Image-to-Text

📜 License

This project is licensed under the MIT License.
