This GitHub repository contains the final project submission for the Data-driven Computing Architectures course (2025). The project implements a data pipeline to ingest, process, and visualize a student depression dataset using Snowflake, followed by training a machine learning model to predict depression. The pipeline adheres to the Medallion Architecture (Bronze, Silver, Gold layers) and provides actionable insights into student mental health.
- Name: Md Aslam Hossain
- Contribution: Sole contributor, responsible for designing, implementing, and documenting the pipeline and ML model. All work is tracked via a clear history of commits in this repository.
This project focuses on building a data pipeline to analyze student mental health data (`student_depression_dataset.csv`) through four stages:

- Ingestion: Loads raw CSV data into Snowflake's bronze layer (`BRONZE_STUDENT_DATA`) and tracks lineage in `DATA_LINEAGE` using `ingest.py` (a minimal sketch follows this list).
- Processing: Cleans and aggregates data into silver (`SILVER_STUDENT_DATA`) and gold (`GOLD_STUDENT_INSIGHTS`) layers with `process.py` (see the second sketch below).
- Visualization: Generates visual insights (e.g., depression rates by gender, CGPA vs. pressure) saved in `example/` using `visualize.py`.
- Modeling: Trains a Random Forest Classifier to predict depression, saved as `model/depression_model.joblib`, with `model.py`.
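The ingestion step might look like the minimal sketch below. It assumes the `snowflake-connector-python` package (with the pandas extras) and an illustrative `DATA_LINEAGE` schema; the actual `ingest.py` in `code/` is the authoritative implementation.

```python
# Minimal ingestion sketch (assumes snowflake-connector-python with
# pandas extras; the real ingest.py may differ).
import pandas as pd
import snowflake.connector
from snowflake.connector.pandas_tools import write_pandas

# Read the raw CSV exactly as-is for the bronze layer.
df = pd.read_csv("data/student_depression_dataset.csv")

# Connection parameters are illustrative placeholders.
conn = snowflake.connector.connect(
    account="<account>",
    user="<user>",
    password="<password>",
    database="<database>",
    schema="<schema>",
    warehouse="<warehouse>",
)

# Load the raw rows into the bronze table (created if missing).
write_pandas(conn, df, "BRONZE_STUDENT_DATA", auto_create_table=True)

# Record a lineage entry for this load (DATA_LINEAGE columns are assumed).
conn.cursor().execute(
    "INSERT INTO DATA_LINEAGE (source_file, target_table, loaded_at) "
    "VALUES (%s, %s, CURRENT_TIMESTAMP())",
    ("student_depression_dataset.csv", "BRONZE_STUDENT_DATA"),
)
conn.close()
```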
The pipeline leverages Snowflake for scalable data storage and Python for processing and analysis, culminating in both visual outputs and a predictive model.
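Similarly, the silver/gold transforms can be expressed as SQL executed from Python, along the lines of the sketch below. All column names beyond the table names listed above are assumptions about the dataset schema; see `process.py` for the real cleaning and aggregation logic.

```python
# Sketch of the silver/gold transforms (column names are illustrative;
# process.py is the authoritative implementation).
import snowflake.connector

conn = snowflake.connector.connect(
    account="<account>", user="<user>", password="<password>"
)
cur = conn.cursor()

# Silver: cleaned, deduplicated copy of the bronze data.
cur.execute("""
    CREATE OR REPLACE TABLE SILVER_STUDENT_DATA AS
    SELECT DISTINCT *
    FROM BRONZE_STUDENT_DATA
    WHERE GENDER IS NOT NULL
""")

# Gold: aggregated insights ready for visualization.
cur.execute("""
    CREATE OR REPLACE TABLE GOLD_STUDENT_INSIGHTS AS
    SELECT GENDER,
           AVG(CGPA)       AS AVG_CGPA,
           AVG(DEPRESSION) AS DEPRESSION_RATE,
           COUNT(*)        AS N_STUDENTS
    FROM SILVER_STUDENT_DATA
    GROUP BY GENDER
""")
conn.close()
```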
- `code/`: Core pipeline scripts and ML model training. See `code/README.md` for details.
- `data/`: Sample input data (`student_depression_dataset.csv`). See `data/README.md`.
- `docs/`: Additional scripts or notebooks (placeholder). See `docs/README.md`.
- `example/`: Output visualizations and pipeline run examples. See `example/README.md` (a plotting sketch follows this list).
- `model/`: Trained ML model file (`depression_model.joblib`) generated by `model.py`.
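For a sense of the figures `visualize.py` writes to `example/`, here is a minimal matplotlib sketch of the depression-rate-by-gender chart. The column names `Gender` and `Depression` are assumptions about the CSV schema; `visualize.py` is the authoritative implementation.

```python
# Minimal visualization sketch: depression rate by gender, saved under
# example/. Column names are assumptions about the CSV schema.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("data/student_depression_dataset.csv")

# Mean of the binary Depression flag per gender = depression rate.
rates = df.groupby("Gender")["Depression"].mean()

ax = rates.plot(kind="bar")
ax.set_ylabel("Depression rate")
ax.set_title("Depression rate by gender")
plt.tight_layout()
plt.savefig("example/depression_by_gender.png")
```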
- Clone the repository:

  ```bash
  git clone https://github.com/aa-it-vasa/ddca2025-project-group-24.git
  cd ddca2025-project-group-24
  ```

- Install the required packages:

  ```bash
  pip install -r requirements.txt
  ```

- Run the ETL pipeline and train the model (a training sketch follows the commands):

  ```bash
  # 1. Ingest raw data
  python code/ingest.py

  # 2. Process to Silver/Gold
  python code/process.py

  # 3. Generate visuals
  python code/visualize.py

  # 4. Train the prediction model
  python code/model.py
  ```
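For reference, the training step in `model.py` boils down to something like the sketch below. The target column name `Depression` and the numeric-only feature selection are assumptions; consult `model.py` for the actual preprocessing and feature handling.

```python
# Minimal training sketch for the depression model (assumes a binary
# "Depression" target column; model.py is the authoritative version).
import pandas as pd
import joblib
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

df = pd.read_csv("data/student_depression_dataset.csv")

# Keep only numeric features for this sketch; the real preprocessing
# may also encode categorical columns such as gender.
X = df.drop(columns=["Depression"]).select_dtypes("number")
y = df["Depression"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)
print(f"Held-out accuracy: {clf.score(X_test, y_test):.3f}")

# Persist the trained model for later inference.
joblib.dump(clf, "model/depression_model.joblib")
```

Once saved, the model can be reloaded with `joblib.load("model/depression_model.joblib")` and used to predict depression for new student records with the same feature layout.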