The purpose of this assignment was to create a Python script that uses pandas to generate summary statistics.
- Jupyter Notebook with:
  - Cells that perform descriptive statistics using Polars or Pandas
  - Tested by using nbval plugin for pytest
- Python Script performing the same descriptive statistics using Polars or Pandas
- lib.py file that shares the common code between the script and notebook
- Makefile with the following:
  - Run all tests (must test notebook and script and lib)
  - Formats code with Python Black
  - Lints code with Ruff
  - Installs code via: pip install -r requirements.txt
- test_script.py to test script
- test_lib.py to test library
- Pinned requirements.txt
- GitHub Actions performs all four Makefile commands with badges for each one in the README.md
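As a rough illustration of what a test in test_lib.py might look like, here is a minimal sketch. It assumes the library exposes a helper named calculate_descriptive_statistics (the name appears in the function list later in this README) and that it wraps pandas' describe(); the stand-in defined here is only an assumption so that the example runs on its own.

```python
import pandas as pd


def calculate_descriptive_statistics(df):
    # Stand-in (assumption) for the shared helper. In the project layout
    # described above, test_lib.py would import it from lib.py instead.
    return df.describe()


def test_calculate_descriptive_statistics():
    # Tiny hand-made frame so the expected values are easy to verify by eye.
    df = pd.DataFrame({"Age": [20.0, 30.0, 40.0], "Fare": [10.0, 20.0, 30.0]})
    stats = calculate_descriptive_statistics(df)
    assert stats.loc["count", "Age"] == 3
    assert stats.loc["mean", "Age"] == 30.0
    assert stats.loc["max", "Fare"] == 30.0
```

pytest collects any function whose name starts with test_, so a file like this needs no extra registration.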
The dataset used in this project was pulled from GitHub. The Titanic dataset is a well-known dataset in the field of data science and machine learning, often used for educational purposes. It contains information about the passengers aboard the RMS Titanic, which sank on its maiden voyage in April 1912 after hitting an iceberg. The dataset provides valuable insights into the factors that influenced survival rates during this tragic event.
This dataset includes a range of demographic and socio-economic information about the passengers, such as their age, gender, class, and ticket fare. Analysts and data scientists use it to explore questions such as how these attributes relate to survival. The data used can be found here: https://raw.githubusercontent.com/datasciencedojo/datasets/refs/heads/master/titanic.csv
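Loading the data with pandas could look roughly like the sketch below. The helper name load_titanic and the inline sample are illustrative assumptions, not part of the project; the sample rows are made up and only reuse a few column names from the public CSV so the example runs without network access.

```python
import io

import pandas as pd

# URL quoted in the text above.
TITANIC_URL = (
    "https://raw.githubusercontent.com/datasciencedojo/datasets/"
    "refs/heads/master/titanic.csv"
)


def load_titanic(source=TITANIC_URL):
    """Read the Titanic CSV from a URL, a local path, or a file-like object."""
    return pd.read_csv(source)


# Offline stand-in: made-up rows using a subset of the real column names.
SAMPLE_CSV = """PassengerId,Survived,Pclass,Sex,Age,Fare
1,0,3,male,22,7.25
2,1,1,female,38,71.28
3,1,3,female,26,7.92
4,0,2,male,35,13.0
"""

sample = load_titanic(io.StringIO(SAMPLE_CSV))
print(sample.head())
```

Calling load_titanic() with no argument would fetch the full file from the URL instead of the inline sample.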
- calculate_correlation_matrix -- creates a correlation matrix
- survival_rates_by_group -- calculates survival rates grouped by a column
- plot_survival_rates -- plots survival rates
- calculate_descriptive_statistics -- calculates and returns descriptive statistics
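One plausible way to write the three non-plotting helpers with pandas is sketched below. The function bodies and parameter names are assumptions based only on the one-line descriptions above, not the project's actual code. The column names (Survived, Sex, Pclass, Fare) follow the public Titanic CSV, and the toy rows are made up.

```python
import pandas as pd


def calculate_descriptive_statistics(df):
    """Return count, mean, std, min, quartiles, and max for numeric columns."""
    return df.describe()


def survival_rates_by_group(df, column):
    """Return the mean of the 0/1 Survived flag for each value of `column`."""
    return df.groupby(column)["Survived"].mean()


def calculate_correlation_matrix(df):
    """Return pairwise Pearson correlations between the numeric columns."""
    return df.corr(numeric_only=True)


# Made-up rows for illustration only.
toy = pd.DataFrame(
    {
        "Survived": [0, 1, 1, 0],
        "Sex": ["male", "female", "female", "male"],
        "Pclass": [3, 1, 2, 3],
        "Fare": [7.25, 71.28, 13.0, 8.05],
    }
)

rates = survival_rates_by_group(toy, "Sex")  # female: 1.0, male: 0.0
stats = calculate_descriptive_statistics(toy)
corr = calculate_correlation_matrix(toy)
print(rates)
```

Because Survived is coded as 0 or 1, its group mean is the share of passengers in that group who survived, which is why a plain mean is enough here.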
The code provides insights into how various factors (like gender, age, and passenger class) affected survival rates on the Titanic. By visualizing these relationships, it helps the reader understand the demographic influences on survival during the disaster.