This repository is dedicated to machine learning projects with the primary goal of learning and practicing ML concepts.
The projects included here are meant to:
- Explore different areas of machine learning.
- Practice the full workflow: data preprocessing, exploratory analysis, model development, and evaluation.
- Build both theoretical understanding and practical coding skills.
The main purpose of this repository is to serve as a personal learning space while also providing others with examples and references for their own ML journey.
Each project is a step toward improving skills in data science and machine learning through hands-on implementation.
## Task Description
Given a product and its details, the task is to predict its sales from the available features.
- Dataset Size: ~8k rows
- Training Dataset Link: Google Sheets Dataset
## Dataset Columns

- `Item_Identifier`: Unique identity number for a product
- `Item_Weight`: Weight of the product
- `Item_Fat_Content`: Fat content (Low Fat / Regular)
- `Item_Visibility`: Percentage of store display allocated to the product
- `Item_Type`: Category of the product
- `Item_MRP`: Maximum Retail Price (list price)
- `Outlet_Identifier`: Unique store ID
- `Outlet_Establishment_Year`: Year the store was established
- `Outlet_Size`: Store size in terms of ground area
- `Outlet_Location_Type`: Type of city in which the store is located
- `Outlet_Type`: Type of outlet (grocery store or supermarket)
- `Item_Outlet_Sales`: Target variable, product sales in the store
## Recommended Approaches & Models
- Data Preprocessing: Handle missing values, encode categorical features, and scale numeric values.
- Exploratory Analysis: Identify relationships between `Item_MRP`, `Outlet_Type`, and `Item_Outlet_Sales`.
- Baseline Models: Linear Regression, Decision Tree Regressor
- Advanced Models: Random Forest, XGBoost, LightGBM, CatBoost, Neural Networks (MLP)
- Evaluation Metric: Root Mean Squared Error (RMSE)
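The steps above can be sketched end-to-end with scikit-learn. The frame below is synthetic (random values under a few of the task's column names), so treat it as a template for the real sheet rather than a reference result:

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import SimpleImputer
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy frame standing in for the real data (column names from the task).
rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "Item_Weight": rng.normal(12, 4, n),
    "Item_MRP": rng.uniform(30, 270, n),
    "Outlet_Type": rng.choice(["Grocery Store", "Supermarket Type1"], n),
})
df.loc[rng.choice(n, 50, replace=False), "Item_Weight"] = np.nan  # simulate missing weights
y = 3.5 * df["Item_MRP"] + rng.normal(0, 100, n)  # synthetic target

# Impute + scale numerics, one-hot encode categoricals, then fit a baseline model.
numeric = ["Item_Weight", "Item_MRP"]
categorical = ["Outlet_Type"]
pre = ColumnTransformer([
    ("num", Pipeline([("impute", SimpleImputer()), ("scale", StandardScaler())]), numeric),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
])
model = Pipeline([("pre", pre), ("rf", RandomForestRegressor(n_estimators=100, random_state=0))])

X_tr, X_te, y_tr, y_te = train_test_split(df, y, random_state=0)
model.fit(X_tr, y_tr)
rmse = mean_squared_error(y_te, model.predict(X_te)) ** 0.5
print(f"RMSE: {rmse:.1f}")
```

Keeping the preprocessing inside the `Pipeline` means the same imputation and encoding are applied consistently at train and predict time.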
## Task Description
Given an image as input, the task is to classify the image into one of four disaster categories:
- CYCLONE
- EARTHQUAKE
- FLOOD
- WILDFIRE
## Dataset Information
- Training Samples: 400 per category
- Validation Samples: 100 per category
- Test Samples: 100 per category
- Dataset Link: Google Drive Dataset
Note: The dataset does not include .csv/.tsv/.txt annotation files. Images are organized in category-named subfolders; this folder structure can be used for labeling.
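Because labels come from the folder layout rather than annotation files, a small standard-library helper can pair each image path with its category name. The directory below is a throwaway mock of the expected layout; real filenames will differ:

```python
import tempfile
from pathlib import Path

CATEGORIES = ["CYCLONE", "EARTHQUAKE", "FLOOD", "WILDFIRE"]

def labeled_images(root: Path) -> list[tuple[Path, str]]:
    """Collect (image_path, label) pairs, using the subfolder name as the label."""
    pairs = []
    for category in CATEGORIES:
        for img in sorted((root / category).glob("*.jpg")):
            pairs.append((img, category))
    return pairs

# Demo on a throwaway directory mimicking the category-named subfolders.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for cat in CATEGORIES:
        (root / cat).mkdir()
        (root / cat / "sample_001.jpg").touch()
    pairs = labeled_images(root)
    print(len(pairs), pairs[0][1])  # 4 CYCLONE
```

The same (path, label) pairs can then feed any framework's dataset loader.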
## Recommended Approaches & Models
- Data Preparation: Image augmentation (rotation, flipping, scaling) and normalization.
- Baseline Models: Custom CNN architectures.
- Pretrained Models (Transfer Learning): ResNet50, VGG16/VGG19, EfficientNet, MobileNetV2.
- Evaluation Metrics: Precision, Recall, and F1-score per category; Macro Precision/Recall/F1 overall.
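With scikit-learn, the per-category and macro metrics above fall out of a couple of calls; the labels and predictions below are invented purely to show the computation:

```python
from sklearn.metrics import classification_report, f1_score

labels = ["CYCLONE", "EARTHQUAKE", "FLOOD", "WILDFIRE"]
# Hypothetical ground truth and predictions for illustration only.
y_true = ["CYCLONE", "FLOOD", "FLOOD", "WILDFIRE", "EARTHQUAKE", "CYCLONE"]
y_pred = ["CYCLONE", "FLOOD", "WILDFIRE", "WILDFIRE", "EARTHQUAKE", "CYCLONE"]

# Per-category precision/recall/F1 plus the macro averages in one report.
print(classification_report(y_true, y_pred, labels=labels, zero_division=0))
macro_f1 = f1_score(y_true, y_pred, labels=labels, average="macro", zero_division=0)
print(f"Macro F1: {macro_f1:.3f}")
```

`average="macro"` weights every category equally, which matters here since each class has the same sample count anyway but protects against imbalance in general.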
## Objective
Build and evaluate a model that recognizes handwritten text using labeled character images.
## Dataset Information

- Dataset: Kaggle - Handwriting Recognition
- Structure: `train.csv`, `validation.csv`, `test.csv` + image folders
- CSV columns:
  - `filename`: image file path
  - `identity`: label (text/character)
## Workflow Overview
- EDA: Visualize character distribution and sample images with labels.
- Preprocessing: Normalize images, split into train/val/test, apply augmentation.
- Model Selection:
- Baselines: Logistic Regression, SVM (on flattened features)
- Deep Learning: CNNs (e.g., simple CNN stacks/LeNet)
- Advanced: CNN + RNN hybrids (CRNN) for sequence modeling
- Training: Track accuracy/loss, use dropout, early stopping, regularization.
- Optimization: Tune hyperparameters; compare architectures.
- Evaluation: Character-wise F1 scores, confusion matrix, analyze misclassifications.
## Recommended Approaches & Models
- Baseline: Logistic Regression, SVM
- Deep Learning: CNNs; CNN+RNN for sequences
- Modern: Transformer-based OCR approaches
- Metrics: Character-level F1, accuracy, confusion matrix
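As a quick stand-in for the Kaggle character crops, scikit-learn's bundled 8x8 digits exercise the whole baseline recipe: flatten the images, fit Logistic Regression, and compute character-wise F1 plus a confusion matrix:

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, f1_score
from sklearn.model_selection import train_test_split

# 8x8 digit images as a stand-in for the real character crops.
digits = load_digits()
X = digits.images.reshape(len(digits.images), -1)  # flatten 8x8 -> 64 features
X_tr, X_te, y_tr, y_te = train_test_split(
    X, digits.target, random_state=0, stratify=digits.target)

clf = LogisticRegression(max_iter=2000)
clf.fit(X_tr, y_tr)
y_pred = clf.predict(X_te)

per_class_f1 = f1_score(y_te, y_pred, average=None)  # character-wise F1
cm = confusion_matrix(y_te, y_pred)                  # rows = true, cols = predicted
print(f"Macro F1: {per_class_f1.mean():.3f}")
```

Off-diagonal peaks in `cm` point directly at the misclassification pairs worth inspecting.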
## Description
Generate new poems inspired by Robert Frost's style using next-word prediction.
## Dataset Information

- Source: Project Gutenberg - Robert Frost's poems
- Preprocessing: tokenize words, build vocabulary, convert to sequences
- Split: 80% training / 20% testing
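The preprocessing and split can be sketched with the standard library alone; the one-line corpus here is a stand-in for the Gutenberg text:

```python
import re

corpus = "whose woods these are i think i know his house is in the village though"

# Tokenize, build a word -> id vocabulary, and convert the text to an id sequence.
tokens = re.findall(r"[a-z']+", corpus.lower())
vocab = {word: idx for idx, word in enumerate(sorted(set(tokens)))}
sequence = [vocab[w] for w in tokens]

# Next-word training pairs: a fixed-length context window predicts the following word.
WINDOW = 4
pairs = [(sequence[i:i + WINDOW], sequence[i + WINDOW])
         for i in range(len(sequence) - WINDOW)]

split = int(0.8 * len(pairs))  # 80% training / 20% testing, as above
train_pairs, test_pairs = pairs[:split], pairs[split:]
print(len(vocab), len(train_pairs), len(test_pairs))
```

These (context, next-word) pairs are exactly what the BiLSTM below consumes, with the context padded or truncated to a fixed sequence length.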
## Model Architecture
- Baseline: BiLSTM
- Input: sequence length & vocab size
- Hidden: BiLSTM units + dropout
- Output: softmax for next-word prediction
- Training: categorical cross-entropy, Adam, batch size & epochs configurable
## Recommended Approaches & Models
- Baseline: BiLSTM next-word model
- Advanced (Optional): GPT/T5 fine-tuning; experiment with diffusion-style text models
## Evaluation
- Perplexity (next-word prediction quality)
- Accuracy (coherence proxy)
- Generate example poems and qualitatively assess style/fluency
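Perplexity is just the exponential of the average negative log-likelihood the model assigns to each held-out next word; the probabilities below are made up for illustration:

```python
import math

# Probabilities the model assigned to each actual next word (invented values).
token_probs = [0.25, 0.10, 0.50, 0.05, 0.20]

# Perplexity = exp(mean negative log-likelihood); lower is better.
nll = [-math.log(p) for p in token_probs]
perplexity = math.exp(sum(nll) / len(nll))
print(f"Perplexity: {perplexity:.2f}")
```

A model that assigned probability 1 to every next word would score a perplexity of exactly 1, the lower bound.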
## Description
Predict the cuisine of a recipe from its list of ingredients (e.g., Indian, Mexican, Moroccan, Korean, Greek).
## Dataset Information

- Dataset: Google Drive - Recipe Dataset
- Format: JSON
- Fields:
  - `id`: unique recipe identifier
  - `cuisine`: target label (train only)
  - `ingredients`: list of ingredients
## Example (train.json)

```json
{
  "id": 24717,
  "cuisine": "indian",
  "ingredients": [
    "turmeric",
    "vegetable stock",
    "tomatoes",
    "garam masala",
    "naan",
    "red lentils",
    "red chili peppers",
    "onions",
    "spinach",
    "sweet potatoes"
  ]
}
```

## Recommended Approaches & Models
- Baseline: ANN / simple feed-forward neural network with bag-of-words or TF-IDF features
- Intermediate: Embedding layers with CNN or BiLSTM for ingredient sequences
- Advanced (Optional): Transformer-based models
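A minimal TF-IDF baseline can be sketched as below, with Logistic Regression standing in for the feed-forward net and four invented recipes standing in for `train.json`:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny invented recipes; the real data comes from train.json's "ingredients" lists.
recipes = [
    ["turmeric", "garam masala", "naan", "red lentils"],
    ["tortillas", "salsa", "black beans", "cilantro"],
    ["soy sauce", "gochujang", "kimchi", "rice"],
    ["feta", "olives", "oregano", "cucumber"],
]
cuisines = ["indian", "mexican", "korean", "greek"]

# Join each ingredient list into one string so TF-IDF can treat it as a document.
docs = [" ".join(r) for r in recipes]
model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(docs, cuisines)

print(model.predict(["garam masala red lentils naan"]))
```

Swapping the classifier for `MLPClassifier` turns this into the feed-forward baseline named above without changing the feature extraction.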
## Evaluation
- Accuracy (for simplicity)
## Description

The goal of this project is to predict whether a flight will be delayed based on historical and contextual flight data.
This is a binary classification problem (Delayed vs On-time).
## Dataset Information

- Dataset Link: Google Drive Dataset
- Dataset: Flight records with the following key fields:
  - `Year`, `Month`, `Day`: flight date
  - `DayOfWeek`: numeric day of the week
  - `Airline`: airline carrier code
  - `FlightNum`: flight number
  - `Origin`: origin airport
  - `Dest`: destination airport
  - `DepTime`: actual departure time
  - `ArrTime`: actual arrival time
  - `DepDelay`: departure delay in minutes
  - `ArrDelay`: arrival delay in minutes (target variable)
- Target Variable: `ArrDelay`
- Convert into binary classification:
  - Delayed if `ArrDelay > 15` minutes
  - On-time otherwise
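The binarization rule above is one line in pandas; the rows here are invented examples, not real flight records:

```python
import pandas as pd

# Toy rows mimicking the flight fields; real data comes from the Drive link.
df = pd.DataFrame({
    "Airline": ["AA", "DL", "UA", "WN"],
    "ArrDelay": [3, 42, 16, -5],
})

# Binarize the target: Delayed (1) if the arrival delay exceeds 15 minutes.
df["Delayed"] = (df["ArrDelay"] > 15).astype(int)
print(df["Delayed"].tolist())  # [0, 1, 1, 0]
```

Note that a delay of exactly 15 minutes counts as On-time under the strict `>` comparison.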
## Evaluation

- Metrics: Accuracy, Precision, Recall, F1-score
- ROC-AUC is also useful, since it handles the likely class imbalance better than accuracy.
## Recommended Approaches & Models

- Baseline: Logistic Regression / Random Forest
- Intermediate: Gradient Boosted Trees (XGBoost, LightGBM, CatBoost)
- Advanced:
- Neural Networks with embedding layers for categorical features
- Sequence models (RNNs/LSTMs) for temporal flight patterns



