nogibjj/Eric_Ortega_Rodriguez_Individual_Project_-1

My Titanic Project

IDS 706: Individual Project 1

Continuous Integration of a Python Data Science Project Using GitHub Actions

Eric Ortega Rodriguez

The purpose of this assignment was to create a Python script that uses pandas to generate summary statistics.

Format

Install

Lint

Test

YouTube Video

Click Here For Overview Video

Assignment Requirements

  • Jupyter Notebook with:
    • Cells that perform descriptive statistics using Polars or Pandas
    • Tested by using nbval plugin for pytest
  • Python Script performing the same descriptive statistics using Polars or Pandas
  • lib.py file that shares the common code between the script and notebook
  • Makefile with the following:
    • Run all tests (must test notebook and script and lib)
    • Formats code with Python Black
    • Lints code with Ruff
    • Installs code via: pip install -r requirements.txt
  • test_script.py to test script
  • test_lib.py to test library
  • Pinned requirements.txt
  • GitHub Actions performs all four Makefile commands with badges for each one in the README.md

Data Set Used in this Project

The data set used in this project was pulled from GitHub. The Titanic dataset is a well-known dataset in data science and machine learning, often used for educational purposes. It contains information about the passengers aboard the RMS Titanic, which sank on its maiden voyage in April 1912 after hitting an iceberg. The dataset is commonly used to study the factors that influenced survival during the disaster.

The dataset includes demographic and socio-economic information about the passengers, such as their age, gender, class, and ticket fare. Analysts and data scientists use it to explore how these attributes relate to survival. The data can be found here: https://raw.githubusercontent.com/datasciencedojo/datasets/refs/heads/master/titanic.csv
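As a rough illustration of how the data might be loaded, the sketch below wraps `pandas.read_csv` in a small helper. The helper name `load_titanic` and the in-memory sample rows are hypothetical and are not taken from this repository; the column names (`Survived`, `Pclass`, `Sex`, `Age`, `Fare`) are assumed to match the CSV at the URL above.

```python
import io

import pandas as pd

# URL quoted in this README.
TITANIC_URL = (
    "https://raw.githubusercontent.com/datasciencedojo/datasets/"
    "refs/heads/master/titanic.csv"
)


def load_titanic(source=TITANIC_URL):
    """Read the Titanic CSV from a URL, a local path, or a file-like object.

    Hypothetical helper: the repository may load the data differently.
    """
    return pd.read_csv(source)


# Tiny made-up sample so the helper can be tried without network access.
# The column names are assumed to mirror the real file.
SAMPLE_CSV = """Survived,Pclass,Sex,Age,Fare
0,3,male,22,7.25
1,1,female,38,71.28
1,3,female,26,7.92
"""

sample_df = load_titanic(io.StringIO(SAMPLE_CSV))
print(sample_df.shape)
```

Calling `load_titanic()` with no argument would fetch the full CSV from the URL instead of the sample.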

Functions

  1. calculate_correlation_matrix -- creates correlation matrix
  2. survival_rates_by_group -- calculates survival rates grouped by column
  3. glot_survival_rates -- plots survival rates
  4. calculate_descriptive_statistics -- calculates and returns descriptive statistics
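The non-plotting functions listed above could be sketched roughly as follows. This is a minimal sketch, assuming pandas and a `Survived` column coded as 0/1; the signatures and bodies are assumptions for illustration and may differ from the actual `lib.py`. The small `demo` frame is made-up data, not rows from the dataset.

```python
import pandas as pd


def calculate_descriptive_statistics(df):
    """Return count, mean, std, min, quartiles, and max for numeric columns."""
    return df.describe()


def calculate_correlation_matrix(df):
    """Return pairwise Pearson correlations between numeric columns only."""
    return df.select_dtypes(include="number").corr()


def survival_rates_by_group(df, column):
    """Return the mean of `Survived` (the survival rate) per value of `column`.

    Assumes `Survived` is coded as 0/1.
    """
    return df.groupby(column)["Survived"].mean()


# Made-up rows for illustration only.
demo = pd.DataFrame(
    {
        "Survived": [0, 1, 1, 0],
        "Pclass": [3, 1, 3, 1],
        "Sex": ["male", "female", "female", "male"],
        "Age": [22, 38, 26, 35],
    }
)

stats = calculate_descriptive_statistics(demo)
corr = calculate_correlation_matrix(demo)
rates_by_sex = survival_rates_by_group(demo, "Sex")
print(rates_by_sex)
```

Keeping these as plain functions that take and return pandas objects lets the script, the notebook, and the tests all reuse the same code.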

Data Visualizations

The code provides insights into how various factors (like gender, age, and passenger class) affected survival rates on the Titanic. By visualizing these relationships, it helps the reader understand the demographic influences on survival during the disaster.

Descriptive Statistics

image

Correlation Matrix

image

Survival by Class

image

Survival by Sex

image
