How Couples Meet and Stay Together Regression

Isaiah Jenkins

Project Overview

This project analyzes the "How Couples Meet and Stay Together" dataset (2017–2022) to predict relationship duration using linear regression models. The study leverages a nationally representative survey of 4,002 American adults, with 3,009 reporting a spouse or romantic partner. The analysis focuses on key features such as age, income, employment status, past partner history, and relationship quality to identify factors influencing long-lasting relationships. Python libraries including Pandas, NumPy, Scikit-learn, Matplotlib, and Seaborn were used for data processing, modeling, and visualization.

Dataset

The dataset, sourced from the "How Couples Meet and Stay Together" study, contains 726 features capturing demographic and relationship data across multiple waves (2017–2022). This project focuses on Wave 3 (2022) data, narrowed to 10 columns: nine predictors and the target variable.

Key Features: w3_ppage (age), w3_ppincimp (income), w3_ppwork (employment status), w3_past_partners_gender_1/2/3 (past partner history), w3_relatives (number of relatives), w3_weekly_sex_frequency, w3_rel_qual (relationship quality), and w3_relationship_duration_yrs (target variable).
Preprocessing: Handled missing values, encoded categorical variables, and filtered to 1,026 records with complete data for analysis.
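The column selection and complete-case filtering described above can be sketched as follows. This is a minimal illustration, not the notebook's actual code: it assumes missing values are handled by dropping incomplete rows, it assumes "w3_past_partners_gender_1/2/3" expands to three separate columns, and it runs on a tiny made-up DataFrame rather than the real CSV. The helper name select_complete_records is hypothetical.

```python
import numpy as np
import pandas as pd

# Wave 3 columns named in this README. Expanding "1/2/3" into three
# separate column names is an assumption about the CSV layout.
WAVE3_COLUMNS = [
    "w3_ppage",
    "w3_ppincimp",
    "w3_ppwork",
    "w3_past_partners_gender_1",
    "w3_past_partners_gender_2",
    "w3_past_partners_gender_3",
    "w3_relatives",
    "w3_weekly_sex_frequency",
    "w3_rel_qual",
    "w3_relationship_duration_yrs",  # target variable
]


def select_complete_records(df, columns=WAVE3_COLUMNS):
    """Keep only the listed columns and drop rows with any missing value."""
    return df[columns].dropna().reset_index(drop=True)


# Tiny synthetic stand-in for the real dataset (values are made up).
rng = np.random.default_rng(0)
toy = pd.DataFrame(
    {col: rng.integers(1, 5, size=6).astype(float) for col in WAVE3_COLUMNS}
)
toy["unrelated_column"] = 0.0          # stands in for the other features
toy.loc[1, "w3_rel_qual"] = np.nan     # simulate a skipped survey question
toy.loc[4, "w3_ppage"] = np.nan

complete = select_complete_records(toy)
print(complete.shape)  # (4, 10): two incomplete rows dropped, 10 columns kept
```

With the real file, the same helper would be applied to the DataFrame returned by pd.read_csv on data/HCMST_2017_to_2022.csv.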

Analysis

The analysis included:

Data Exploration: Examined dataset structure, identified missing values, and computed descriptive statistics.
Feature Engineering: Encoded categorical variables (e.g., income, employment, relationship quality) using one-hot encoding, resulting in 40 features.
Modeling:
- Baseline Linear Regression: Achieved train R² of 0.635 and test R² of 0.572.
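- Polynomial Regression: Degree 2 and 4 models yielded negative R² scores (-3.21e6 and -3113.74, respectively), meaning they predicted worse than a constant mean-duration baseline. This pattern points to overfitting on the expanded feature set rather than underfitting.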
- Lasso Regression: Applied regularization, achieving a test R² of 0.583, with key features like age and income showing influence.
Visualizations: Used box plots and other visualizations to inspect feature distributions and relationships.
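The encoding and modeling steps listed above can be sketched end to end as follows. This is an illustration under stated assumptions, not the notebook's code: the data is synthetic, the column names (age, income_band, rel_qual) are placeholders, the Lasso alpha of 0.1 and the StandardScaler step are arbitrary choices, and the scores it prints will not match the figures reported in this README.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Lasso, LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic stand-in data; column names and values are hypothetical.
rng = np.random.default_rng(42)
n = 300
toy = pd.DataFrame({
    "age": rng.integers(20, 80, size=n),
    "income_band": rng.choice(["low", "mid", "high"], size=n),
    "rel_qual": rng.choice(["poor", "fair", "good", "excellent"], size=n),
})
# Made-up target: duration grows with age, plus noise, never negative.
duration = (0.6 * (toy["age"] - 18) + rng.normal(0, 8, size=n)).clip(lower=0)

# One-hot encode the categorical columns; numeric columns pass through.
X = pd.get_dummies(toy, columns=["income_band", "rel_qual"], dtype=float)

X_train, X_test, y_train, y_test = train_test_split(
    X, duration, test_size=0.25, random_state=0
)

models = {
    "linear": LinearRegression(),
    "poly_deg2": make_pipeline(PolynomialFeatures(degree=2), LinearRegression()),
    "lasso": make_pipeline(StandardScaler(), Lasso(alpha=0.1)),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    train_r2 = model.score(X_train, y_train)  # score() returns R^2 for regressors
    test_r2 = model.score(X_test, y_test)
    scores[name] = (train_r2, test_r2)
    print(f"{name}: train R^2 = {train_r2:.3f}, test R^2 = {test_r2:.3f}")
```

Comparing train and test R² side by side is what separates the two failure modes: a large gap (high train, low or negative test) suggests overfitting, while low scores on both suggest underfitting.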

Key Findings

Model Performance: Lasso regression had the highest test R² (0.583), narrowly ahead of the baseline linear regression (0.572), while the polynomial models failed to generalize. All models were limited by inconsistent relationship duration data (e.g., excellent relationship quality reported for both short and long durations).
Feature Insights: Age and income were significant predictors, but inconsistencies in the target variable limited model accuracy.
Challenges: The dataset's complexity and inconsistencies in outcome variables hindered robust predictions.
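To make the R² figures in this README easier to interpret, here is a small worked sketch of the metric, R² = 1 - SS_res / SS_tot, computed by hand. The durations and predictions are made up for illustration, and the function name r2_score_manual is hypothetical. It shows why a score can be negative: R² is 0 for a model that always predicts the mean, and it drops below 0 whenever predictions are worse than that.

```python
def r2_score_manual(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual error
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)             # spread around the mean
    return 1.0 - ss_res / ss_tot


# Made-up relationship durations in years (mean is 14.0).
durations = [2.0, 5.0, 10.0, 20.0, 33.0]

perfect = r2_score_manual(durations, durations)
mean_only = r2_score_manual(durations, [14.0] * 5)
# One wildly wrong prediction, as an overfit polynomial model can produce.
one_wild = r2_score_manual(durations, [2.0, 5.0, 10.0, 20.0, 500.0])

print("perfect predictions:", perfect)      # 1.0
print("always predict the mean:", mean_only)  # 0.0
print("one extreme prediction:", one_wild)    # large negative value
```

A single extreme prediction is enough to push the score far below zero, which is consistent with how high-degree polynomial features can behave on unseen data.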

Installation

To run this project, install the required dependencies:

pip install pandas numpy matplotlib seaborn scikit-learn jupyter

Download the HCMST_2017_to_2022.csv dataset and place it in the data/ directory.

Usage

Clone the repository:

git clone https://github.com/Jenkins1128/StayTogetherRegression.git
cd StayTogetherRegression

Set up the dataset:
- Place HCMST_2017_to_2022.csv in the data/ directory.

Run the Jupyter Notebook:

jupyter notebook Stay_Together_Regression.ipynb

Follow the notebook to explore data, train models, and review results.

Next Steps

Expand Dataset: Incorporate data from Waves 1 and 2 to increase sample size and feature diversity.
Refine Features: Select more consistent outcome variables and explore additional features (e.g., education, shared interests).
Model Improvements: Revisit polynomial regression with tuned hyperparameters and explore non-linear models (e.g., decision trees, random forests).
Alternative Datasets: Consider datasets with more consistent relationship duration metrics for improved predictive accuracy.
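As a starting point for the model improvements listed above, the sketch below tunes a regularized polynomial model with cross-validation and fits a random forest for comparison. Everything here is an assumption for illustration: the data is synthetic with a deliberately non-linear signal, Ridge is swapped in as the regularized linear model, and the grid values and forest settings are arbitrary.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic data with a quadratic effect (not the HCMST dataset).
rng = np.random.default_rng(7)
X = rng.uniform(-2, 2, size=(400, 3))
y = X[:, 0] ** 2 + 0.5 * X[:, 1] + rng.normal(0, 0.3, size=400)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Polynomial expansion followed by scaling and an L2-regularized fit.
poly_ridge = Pipeline([
    ("poly", PolynomialFeatures(include_bias=False)),
    ("scale", StandardScaler()),
    ("ridge", Ridge()),
])

# Choose the degree and penalty strength by 5-fold cross-validated R^2.
search = GridSearchCV(
    poly_ridge,
    param_grid={"poly__degree": [1, 2, 3, 4], "ridge__alpha": [0.1, 1.0, 10.0]},
    cv=5,
    scoring="r2",
)
search.fit(X_train, y_train)

# Non-linear alternative that needs no explicit feature expansion.
forest = RandomForestRegressor(n_estimators=200, random_state=0)
forest.fit(X_train, y_train)

print("best polynomial settings:", search.best_params_)
print("tuned polynomial test R^2:", round(search.score(X_test, y_test), 3))
print("random forest test R^2:", round(forest.score(X_test, y_test), 3))
```

Selecting the degree on held-out folds, rather than fixing it at 2 or 4, is one way to guard against the kind of negative test scores reported in the Analysis section.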
