codexaslam/Data-Driven-Computer-Architecture-project

Data-driven Computing Architectures 2025 - Final Project

Project Title: Student Depression Data Pipeline and Prediction

This GitHub repository contains the final project submission for the Data-driven Computing Architectures course (2025). The project implements a data pipeline to ingest, process, and visualize a student depression dataset using Snowflake, followed by training a machine learning model to predict depression. The pipeline adheres to the Medallion Architecture (Bronze, Silver, Gold layers) and provides actionable insights into student mental health.


Author

  • Name: Md Aslam Hossain
  • Contribution: Sole contributor, responsible for designing, implementing, and documenting the pipeline and ML model. All work is tracked via a clear history of commits in this repository.

Project Overview

This project focuses on building a data pipeline to analyze student mental health data (student_depression_dataset.csv) through four stages:

  1. Ingestion: Loads raw CSV data into Snowflake’s bronze layer (BRONZE_STUDENT_DATA) and tracks lineage in DATA_LINEAGE using ingest.py.
  2. Processing: Cleans and aggregates data into silver (SILVER_STUDENT_DATA) and gold (GOLD_STUDENT_INSIGHTS) layers with process.py.
  3. Visualization: Generates visual insights (e.g., depression rates by gender, CGPA vs. pressure) saved in example/ using visualize.py.
  4. Modeling: Trains a Random Forest Classifier to predict depression, saved as model/depression_model.joblib with model.py.

The pipeline leverages Snowflake for scalable data storage and Python for processing and analysis, culminating in both visual outputs and a predictive model.
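The Bronze → Silver → Gold flow described above can be sketched locally with pandas. This is a minimal illustration only: the column names (`Gender`, `CGPA`, `Depression`) are assumptions based on the dataset description, the data is inlined rather than read from `student_depression_dataset.csv`, and Snowflake I/O is replaced by in-memory DataFrames.

```python
import pandas as pd

# Bronze: raw rows as ingested (inlined here instead of loaded from Snowflake)
bronze = pd.DataFrame({
    "Gender": ["Male", "Female", "Female", "Male", None],
    "CGPA": [7.8, 9.1, 6.5, 8.2, 7.0],
    "Depression": [1, 0, 1, 0, 1],
})

# Silver: cleaned layer -- drop rows with missing values
silver = bronze.dropna().reset_index(drop=True)

# Gold: aggregated insights -- depression rate by gender
gold = silver.groupby("Gender")["Depression"].mean().rename("depression_rate")
print(gold)
```

The same split keeps raw data immutable in Bronze while Silver and Gold stay reproducible from it, which is the main benefit the Medallion Architecture provides here.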


Repository Structure


Setup Instructions

  1. Clone the Repository:

    git clone https://github.com/aa-it-vasa/ddca2025-project-group-24.git
    cd ddca2025-project-group-24
  2. Install the required packages:

    pip install -r requirements.txt
  3. Run the ETL pipeline:

    # 1. Ingest raw data
    python code/ingest.py
    
    # 2. Process to Silver/Gold
    python code/process.py
    
    # 3. Generate visuals
    python code/visualize.py
    
    # 4. Train the prediction model
    python code/model.py
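As a rough sketch of what the final modeling step involves, the snippet below trains a Random Forest Classifier and persists it with joblib, matching the `model/depression_model.joblib` artifact. The feature columns and sample values are purely illustrative assumptions, not taken from the actual `model.py`.

```python
from sklearn.ensemble import RandomForestClassifier
import joblib

# Illustrative features (CGPA, academic pressure) and depression labels
X = [[7.8, 3], [9.1, 1], [6.5, 5], [8.2, 2], [7.0, 4], [6.9, 5]]
y = [1, 0, 1, 0, 1, 1]

clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X, y)

# Persist the trained model the way the repository stores it
joblib.dump(clf, "depression_model.joblib")

# Reload and predict for a new, unseen student
model = joblib.load("depression_model.joblib")
print(model.predict([[6.8, 5]]))
```

Saving with joblib lets downstream consumers load the model without retraining, which is why the repository ships the `.joblib` file alongside the training script.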
