mohamedgomaa15/machine_learning_projects

Machine Learning Projects

📌 Introduction

This repository is dedicated to machine learning projects with the primary goal of learning and practicing ML concepts.

The projects included here are meant to:

  • Explore different areas of machine learning.
  • Practice the full workflow: data preprocessing, exploratory analysis, model development, and evaluation.
  • Build both theoretical understanding and practical coding skills.

🎯 Purpose

The main purpose of this repository is to serve as a personal learning space while also providing others with examples and references for their own ML journey.

Each project is a step toward improving skills in data science and machine learning through hands-on implementation.


📂 Project 1: Sales Prediction

📖 Task Description
Given a product and its corresponding details, the task is to predict the amount of sales based on the available features.

📊 Dataset Columns

  • Item_Identifier: Unique identity number for a product
  • Item_Weight: Weight of the product
  • Item_Fat_Content: Fat content (Low Fat / Regular)
  • Item_Visibility: Percentage of store display allocated to the product
  • Item_Type: Category of the product
  • Item_MRP: Maximum Retail Price (list price)
  • Outlet_Identifier: Unique store ID
  • Outlet_Establishment_Year: Year the store was established
  • Outlet_Size: Store size in terms of ground area
  • Outlet_Location_Type: Type of city in which the store is located
  • Outlet_Type: Type of outlet (grocery store or supermarket)
  • Item_Outlet_Sales: Target variable – product sales in the store

🤖 Recommended Approaches & Models

  • Data Preprocessing: Handle missing values, encode categorical features, and scale numeric values.
  • Exploratory Analysis: Identify relationships between Item_MRP, Outlet_Type, and Item_Outlet_Sales.
  • Baseline Models: Linear Regression, Decision Tree Regressor
  • Advanced Models: Random Forest, XGBoost, LightGBM, CatBoost, Neural Networks (MLP)
  • Evaluation Metric: Root Mean Squared Error (RMSE)
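As a minimal sketch of the evaluation metric above, RMSE can be computed directly from predictions and compared against a trivial "predict the training mean" reference. The function names and the sample sales figures are illustrative assumptions, not taken from the repository.

```python
import math

def rmse(y_true, y_pred):
    """Root Mean Squared Error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

def mean_baseline(train_targets, n_predictions):
    """Predict the training mean for every row (a trivial reference model)."""
    mean_value = sum(train_targets) / len(train_targets)
    return [mean_value] * n_predictions

# Hypothetical Item_Outlet_Sales values and model predictions.
actual = [100.0, 200.0, 300.0]
predicted = [110.0, 190.0, 300.0]

model_rmse = rmse(actual, predicted)
baseline_rmse = rmse(actual, mean_baseline(actual, len(actual)))
print(f"model RMSE: {model_rmse:.2f}, mean-baseline RMSE: {baseline_rmse:.2f}")
```

A model is only worth keeping if its RMSE on held-out data is clearly below that of the mean baseline.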

📂 Project 2: Disaster Classification

📖 Task Description
Given an image as input, the task is to classify the image into one of four disaster categories:

  • CYCLONE
  • EARTHQUAKE
  • FLOOD
  • WILDFIRE

📊 Dataset Information

  • Training Samples: 400 per category
  • Validation Samples: 100 per category
  • Test Samples: 100 per category
  • Dataset Link: Google Drive Dataset

Note: The dataset does not include .csv/.tsv/.txt annotation files. Images are organized in category-named subfolders; this folder structure can be used for labeling.
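The folder-based labeling described in the note can be sketched as follows. This assumes a layout such as `train/CYCLONE/*.jpg`; the function name, the extension list, and the directory layout are assumptions for illustration, since the exact structure of the dataset is not shown here.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to match the actual files.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}

def list_labeled_images(split_dir):
    """Return sorted (image_path, label) pairs.

    The label of each image is the name of the subfolder that contains it.
    """
    pairs = []
    for class_dir in sorted(Path(split_dir).iterdir()):
        if not class_dir.is_dir():
            continue  # skip stray files next to the class folders
        for image_path in sorted(class_dir.iterdir()):
            if image_path.suffix.lower() in IMAGE_EXTENSIONS:
                pairs.append((image_path, class_dir.name))
    return pairs
```

Calling `list_labeled_images("path/to/train")` (a placeholder path) would then yield pairs whose labels are the category folder names, for example `CYCLONE` or `FLOOD`.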

🤖 Recommended Approaches & Models

  • Data Preparation: Image augmentation (rotation, flipping, scaling) and normalization.
  • Baseline Models: Custom CNN architectures.
  • Pretrained Models (Transfer Learning): ResNet50, VGG16/VGG19, EfficientNet, MobileNetV2.
  • Evaluation Metrics: Precision, Recall, and F1-score per category; Macro Precision/Recall/F1 overall.
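To make the per-category and macro metrics concrete, here is a small pure-Python sketch. The function names and the toy predictions are assumptions for illustration; in practice a library routine would normally be used instead.

```python
def per_class_metrics(y_true, y_pred, labels):
    """Return {label: (precision, recall, f1)} for each label."""
    metrics = {}
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        metrics[label] = (precision, recall, f1)
    return metrics

def macro_average(metrics):
    """Unweighted mean of precision, recall, and F1 across classes."""
    n = len(metrics)
    return tuple(sum(m[i] for m in metrics.values()) / n for i in range(3))

# Toy example with three of the four categories.
y_true = ["FLOOD", "FLOOD", "WILDFIRE", "CYCLONE"]
y_pred = ["FLOOD", "WILDFIRE", "WILDFIRE", "CYCLONE"]
scores = per_class_metrics(y_true, y_pred, ["CYCLONE", "FLOOD", "WILDFIRE"])
macro_precision, macro_recall, macro_f1 = macro_average(scores)
```

Macro averaging weights every category equally, which matches the balanced design of this dataset (the same number of samples per category).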

📂 Project 3: Handwriting Recognition

📖 Objective
Build and evaluate a model that recognizes handwritten text using labeled character images.

📊 Dataset Information

⚙️ Workflow Overview

  1. EDA: Visualize character distribution and sample images with labels.
  2. Preprocessing: Normalize images, split into train/val/test, apply augmentation.
  3. Model Selection:
    • Baselines: Logistic Regression, SVM (on flattened features)
    • Deep Learning: CNNs (e.g., simple CNN stacks/LeNet)
    • Advanced: CNN + RNN hybrids (CRNN) for sequence modeling
  4. Training: Track accuracy/loss, use dropout, early stopping, regularization.
  5. Optimization: Tune hyperparameters; compare architectures.
  6. Evaluation: Character-wise F1 scores, confusion matrix, analyze misclassifications.
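For the misclassification analysis in the last step, one simple approach is to count (true, predicted) character pairs and list the most frequent confusions. This is a sketch with hypothetical function names and toy labels, not the repository's evaluation code.

```python
from collections import Counter

def confusion_counts(y_true, y_pred):
    """Count every (true_char, predicted_char) pair, a sparse confusion matrix."""
    return Counter(zip(y_true, y_pred))

def most_confused(y_true, y_pred, top_n=3):
    """Return the top_n most frequent wrong (true_char, predicted_char) pairs."""
    errors = Counter((t, p) for t, p in zip(y_true, y_pred) if t != p)
    return errors.most_common(top_n)

# Toy example: the letter "O" is often read as the digit "0".
true_chars = ["O", "O", "O", "l", "l", "a"]
pred_chars = ["0", "0", "O", "1", "l", "a"]
matrix = confusion_counts(true_chars, pred_chars)
top_errors = most_confused(true_chars, pred_chars, top_n=1)
```

Visually similar pairs such as `O`/`0` or `l`/`1` tend to dominate such a list, which suggests where augmentation or more training data might help.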

🤖 Recommended Approaches & Models

  • Baseline: Logistic Regression, SVM
  • Deep Learning: CNNs; CNN+RNN for sequences
  • Modern: Transformer-based OCR approaches
  • Metrics: Character-level F1, accuracy, confusion matrix

📂 Project 4: Robert Frost's Poem Generation

📖 Description
Generate new poems inspired by Robert Frost’s style using next-word prediction.

📊 Dataset Information

⚙️ Model Architecture

  • Baseline: BiLSTM
    • Input: sequence length & vocab size
    • Hidden: BiLSTM units + dropout
    • Output: softmax for next-word prediction
  • Training: categorical cross-entropy, Adam, batch size & epochs configurable
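Before a next-word model of this kind can be trained, the text has to be turned into (context, next word) pairs. The sketch below shows one way to do that with whitespace tokenization and a sliding window; the function name, the tokenization, and the placeholder sentence are assumptions, not the repository's preprocessing.

```python
def build_next_word_pairs(text, seq_len):
    """Return (vocab, pairs) where each pair is (context_ids, next_word_id).

    Word ids start at 1 so that 0 stays free for padding.
    """
    tokens = text.lower().split()
    vocab = {word: idx for idx, word in enumerate(sorted(set(tokens)), start=1)}
    ids = [vocab[word] for word in tokens]
    pairs = [
        (ids[i:i + seq_len], ids[i + seq_len])
        for i in range(len(ids) - seq_len)
    ]
    return vocab, pairs

# Placeholder sentence, not an actual line from the dataset.
sample_text = "the woods are quiet and the snow is deep"
vocab, pairs = build_next_word_pairs(sample_text, seq_len=3)
```

Here the vocabulary size (`len(vocab) + 1` including padding) and `seq_len` would determine the input and output dimensions of the model.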

🤖 Recommended Approaches & Models

  • Baseline: BiLSTM next-word model
  • Advanced (Optional): GPT/T5 fine-tuning; experiment with diffusion-style text models

📈 Evaluation

  • Perplexity (next-word prediction quality)
  • Next-word accuracy (a rough proxy only; it does not measure coherence directly)
  • Generate example poems and qualitatively assess style/fluency
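Perplexity is the exponential of the average negative log-probability that the model assigns to the correct next word. A minimal sketch, assuming the per-step probabilities of the true words have already been extracted from the model's softmax output (the function name is illustrative):

```python
import math

def perplexity(true_word_probs):
    """Perplexity from the probabilities assigned to each correct next word."""
    if not true_word_probs:
        raise ValueError("need at least one probability")
    if any(p <= 0.0 for p in true_word_probs):
        raise ValueError("probabilities must be strictly positive")
    avg_neg_log_prob = -sum(math.log(p) for p in true_word_probs) / len(true_word_probs)
    return math.exp(avg_neg_log_prob)

# A model that always spreads its belief evenly over 4 words has perplexity 4.
uniform_over_four = perplexity([0.25, 0.25, 0.25, 0.25])
```

Lower is better: a perplexity of 1 means the model always gave the correct word probability 1.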

📂 Project 5: Cuisine Prediction

📖 Description
Predict the cuisine of a recipe from its list of ingredients (e.g., Indian, Mexican, Moroccan, Korean, Greek).

📊 Dataset Information

🔍 Example (train.json)

{
  "id": 24717,
  "cuisine": "indian",
  "ingredients": [
    "turmeric",
    "vegetable stock",
    "tomatoes",
    "garam masala",
    "naan",
    "red lentils",
    "red chili peppers",
    "onions",
    "spinach",
    "sweet potatoes"
  ]
}

🤖 Recommended Approaches & Models

  • Baseline: ANN / simple feed-forward neural network with bag-of-words or TF-IDF features
  • Intermediate: Embedding layers with CNN or BiLSTM for ingredient sequences
  • Advanced (Optional): Transformer-based models
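The bag-of-words baseline can be sketched as a multi-hot vector over the ingredient vocabulary. The records below follow the `train.json` shape shown above, but the second recipe and all function names are hypothetical, added only for illustration.

```python
def build_vocabulary(recipes):
    """Map every distinct ingredient to a column index (sorted for stability)."""
    ingredients = sorted({ing for recipe in recipes for ing in recipe["ingredients"]})
    return {ingredient: index for index, ingredient in enumerate(ingredients)}

def multi_hot(recipe, vocabulary):
    """Encode one recipe as a 0/1 vector; unseen ingredients are ignored."""
    vector = [0] * len(vocabulary)
    for ingredient in recipe["ingredients"]:
        index = vocabulary.get(ingredient)
        if index is not None:
            vector[index] = 1
    return vector

# Shortened records in the train.json format; the second one is made up.
recipes = [
    {"cuisine": "indian", "ingredients": ["turmeric", "naan", "onions"]},
    {"cuisine": "mexican", "ingredients": ["tortillas", "onions", "salsa"]},
]
vocabulary = build_vocabulary(recipes)
features = [multi_hot(recipe, vocabulary) for recipe in recipes]
```

These vectors can be fed to a feed-forward network as they are; TF-IDF weighting would replace the 0/1 entries with weights that downplay very common ingredients such as onions.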

📈 Evaluation

  • Accuracy (for simplicity)

✈️ Flight Delay Prediction

📖 Description

The goal of this project is to predict whether a flight will be delayed based on historical and contextual flight data.
This is a binary classification problem (Delayed vs On-time).

📊 Dataset Information

  • Dataset Link: Google Drive Dataset
  • Dataset: Flight records with the following key fields:
    • Year, Month, Day → flight date
    • DayOfWeek → numeric day of the week
    • Airline → airline carrier code
    • FlightNum → flight number
    • Origin → origin airport
    • Dest → destination airport
    • DepTime → actual departure time
    • ArrTime → actual arrival time
    • DepDelay → departure delay in minutes
    • ArrDelay → arrival delay in minutes (target variable)

🎯 Prediction Task

  • Target Variable: ArrDelay
  • Convert into binary classification:
    • Delayed → if ArrDelay > 15 minutes
    • On-time → otherwise
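The conversion above amounts to a single threshold on `ArrDelay`. A minimal sketch, with a hypothetical function name and made-up delay values:

```python
DELAY_THRESHOLD_MINUTES = 15

def to_delay_label(arr_delay_minutes):
    """Return 1 (Delayed) if ArrDelay exceeds 15 minutes, else 0 (On-time)."""
    return 1 if arr_delay_minutes > DELAY_THRESHOLD_MINUTES else 0

# Made-up ArrDelay values in minutes; negative means an early arrival.
arr_delays = [-5, 0, 15, 16, 90]
labels = [to_delay_label(delay) for delay in arr_delays]
delayed_share = sum(labels) / len(labels)
```

Note that a delay of exactly 15 minutes counts as On-time under this rule. Checking `delayed_share` on the real data is a quick way to see how imbalanced the two classes are.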

📈 Evaluation

  • Metrics: Accuracy, Precision, Recall, F1-score
  • ROC-AUC can also be reported; it does not depend on a single decision threshold, which helps when the Delayed and On-time classes are imbalanced.
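ROC-AUC can be read as the probability that a randomly chosen delayed flight receives a higher score than a randomly chosen on-time flight. The sketch below computes it directly from that definition; the function name and toy scores are assumptions, and the quadratic loop is only meant for small examples.

```python
def roc_auc(labels, scores):
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one sample of each class")
    wins = 0.0
    for pos_score in positives:
        for neg_score in negatives:
            if pos_score > neg_score:
                wins += 1.0
            elif pos_score == neg_score:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Toy example: 1 = Delayed, 0 = On-time, scores are predicted probabilities.
toy_labels = [1, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect separation of the two classes.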

🤖 Recommended Approaches & Models

  • Baseline: Logistic Regression / Random Forest
  • Intermediate: Gradient Boosted Trees (XGBoost, LightGBM, CatBoost)
  • Advanced:
    • Neural Networks with embedding layers for categorical features
    • Sequence models (RNNs/LSTMs) for temporal flight patterns
