This study was conducted as part of the Kodluyoruz statistics and data preprocessing working group, using the Titanic dataset suggested by our instructor. Its goal was to apply descriptive statistics, data visualization, missing value analysis, and outlier detection to the dataset.
-
Loading and Exploring the Dataset: The Titanic dataset was first loaded and examined to understand its structure and content.
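The loading step can be sketched with the Python standard library alone. This is an illustrative sketch, not the study's own code. It assumes a CSV in the Kaggle `train.csv` column layout (columns such as `Survived`, `Pclass`, `Sex`, `Age`, `Fare`, `Embarked`), and the three sample rows are made up. Empty cells are converted to `None` so that missing values stay visible in later steps. With pandas, the same first look is commonly done with `pd.read_csv(...)` followed by `df.info()` and `df.head()`.

```python
from __future__ import annotations

import csv
import io
from typing import Optional, TextIO


def load_rows(handle: TextIO) -> list[dict[str, Optional[str]]]:
    """Read CSV rows as dicts, converting empty cells to None (missing).

    Values stay strings; numeric columns must be converted before computing statistics.
    """
    reader = csv.DictReader(handle)
    return [
        {column: (value if value != "" else None) for column, value in row.items()}
        for row in reader
    ]


def describe_structure(rows: list[dict[str, Optional[str]]]) -> tuple[int, list[str]]:
    """Return the number of rows and the column names."""
    columns = list(rows[0].keys()) if rows else []
    return len(rows), columns


# Made-up sample in the Kaggle column layout (assumption). For a real file,
# pass open("train.csv", newline="") instead; "train.csv" is a hypothetical path.
SAMPLE = """Survived,Pclass,Sex,Age,Fare,Embarked
0,3,male,22,7.25,S
1,1,female,38,71.28,C
1,3,female,,7.93,S
"""

rows = load_rows(io.StringIO(SAMPLE))
n_rows, columns = describe_structure(rows)
print(n_rows, columns)  # 3 ['Survived', 'Pclass', 'Sex', 'Age', 'Fare', 'Embarked']
print(rows[2])          # the empty Age cell is now None
```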
-
Descriptive Statistics: Measures such as the mean, median, and standard deviation were calculated, and the distributions of the variables were examined, to summarize the dataset.
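As a minimal sketch of these measures, the hypothetical helper `describe` below computes the count, mean, median, sample standard deviation, minimum, and maximum of one numeric column while skipping missing values, loosely mirroring what pandas' `DataFrame.describe()` reports for a numeric column. A `Counter` gives the frequency distribution of a categorical column. The ages and class values are made up for illustration.

```python
from __future__ import annotations

import statistics
from collections import Counter
from typing import Optional, Sequence


def describe(values: Sequence[Optional[float]]) -> dict[str, float]:
    """Summary measures for one numeric column, skipping missing (None) values.

    Needs at least two non-missing values, because the sample standard
    deviation divides by n - 1.
    """
    present = [v for v in values if v is not None]
    return {
        "count": len(present),
        "mean": statistics.mean(present),
        "median": statistics.median(present),
        "std": statistics.stdev(present),  # sample standard deviation (n - 1)
        "min": min(present),
        "max": max(present),
    }


# Hypothetical ages; None marks a missing value.
ages = [22.0, 38.0, None, 26.0, 35.0, 35.0]
stats = describe(ages)
print(stats["count"], stats["mean"], stats["median"])  # 5 31.2 35.0

# Frequency distribution of a categorical column (hypothetical Pclass values).
classes = [3, 1, 3, 1, 2, 3]
print(sorted(Counter(classes).items()))  # [(1, 2), (2, 1), (3, 3)]
```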
-
Data Visualization: Various visualization techniques were used to explore the relationships between variables such as survival rate, passenger class, gender, and age.
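Drawing real charts needs Matplotlib or Seaborn, which are not used here. The sketch below instead computes the quantity a survival bar chart typically displays, the share of survivors within each group, and prints it as a rough text bar chart. The passenger records and the helper names `survival_rate_by` and `text_bars` are hypothetical. In Seaborn, a call such as `sns.barplot(data=df, x="Sex", y="Survived")` plots the mean of `Survived` per group, which is the same survival rate.

```python
from collections import defaultdict


def survival_rate_by(rows, key):
    """Fraction of passengers with Survived == 1 within each value of `key`."""
    totals = defaultdict(int)
    survivors = defaultdict(int)
    for row in rows:
        group = row[key]
        totals[group] += 1
        survivors[group] += row["Survived"]  # Survived is assumed to be 0 or 1
    return {group: survivors[group] / totals[group] for group in sorted(totals)}


def text_bars(rates, width=20):
    """Render each rate as a row of '#' characters (a plain-text bar chart)."""
    return [f"{group:<8} {'#' * round(rate * width)} {rate:.2f}" for group, rate in rates.items()]


# Hypothetical passengers (made-up values).
passengers = [
    {"Survived": 0, "Sex": "male", "Pclass": 3},
    {"Survived": 1, "Sex": "female", "Pclass": 1},
    {"Survived": 1, "Sex": "female", "Pclass": 3},
    {"Survived": 0, "Sex": "male", "Pclass": 1},
    {"Survived": 1, "Sex": "male", "Pclass": 1},
]

rates = survival_rate_by(passengers, "Sex")
for line in text_bars(rates):
    print(line)
```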
-
Missing Value Analysis: Columns with missing values were identified, and a strategy for handling the missing data in each column was chosen.
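One way to carry out this analysis is sketched below: count the missing cells per column, then fill gaps with a simple rule. Filling the numeric `Age` column with its median and the categorical `Embarked` column with its most frequent value is an assumption chosen for illustration, not necessarily what the study did, and the rows are made up. In pandas, the per-column count is commonly obtained with `df.isnull().sum()`.

```python
import statistics
from collections import Counter


def missing_report(rows):
    """Return {column: (missing count, missing share)} for every column."""
    columns = rows[0].keys() if rows else []
    return {
        col: (sum(1 for row in rows if row[col] is None),
              sum(1 for row in rows if row[col] is None) / len(rows))
        for col in columns
    }


def fill_median(rows, col):
    """Return new rows with missing numeric values in `col` replaced by the median."""
    median = statistics.median([row[col] for row in rows if row[col] is not None])
    return [{**row, col: median if row[col] is None else row[col]} for row in rows]


def fill_mode(rows, col):
    """Return new rows with missing categorical values in `col` replaced by the mode."""
    mode = Counter(row[col] for row in rows if row[col] is not None).most_common(1)[0][0]
    return [{**row, col: mode if row[col] is None else row[col]} for row in rows]


# Hypothetical rows; None marks a missing cell.
rows = [
    {"Age": 22.0, "Embarked": "S", "Cabin": None},
    {"Age": None, "Embarked": "C", "Cabin": "C85"},
    {"Age": 38.0, "Embarked": None, "Cabin": None},
    {"Age": 26.0, "Embarked": "S", "Cabin": None},
]

report = missing_report(rows)
print(report)
filled = fill_mode(fill_median(rows, "Age"), "Embarked")
print(filled[1]["Age"], filled[2]["Embarked"])  # 26.0 S
```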
-
Outlier Detection: Outlier detection methods were used to identify and handle extreme values that could distort the analysis.
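A common outlier rule is Tukey's interquartile range (IQR) fence: values below Q1 - 1.5 * IQR or above Q3 + 1.5 * IQR are flagged. The sketch below assumes this rule (the study's exact method is not stated), uses made-up fare values, and shows capping at the fences as one possible way to handle flagged values instead of dropping them. `statistics.quantiles` is called with `method="inclusive"`, which interpolates linearly between the sorted data points.

```python
from __future__ import annotations

import statistics
from typing import Sequence


def iqr_bounds(values: Sequence[float], k: float = 1.5) -> tuple[float, float]:
    """Tukey fences: (Q1 - k * IQR, Q3 + k * IQR)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def find_outliers(values: Sequence[float], k: float = 1.5) -> list[float]:
    """Values lying outside the Tukey fences."""
    low, high = iqr_bounds(values, k)
    return [v for v in values if v < low or v > high]


def clip(values: Sequence[float], k: float = 1.5) -> list[float]:
    """Cap values at the fences (one way to handle outliers without removing rows)."""
    low, high = iqr_bounds(values, k)
    return [min(max(v, low), high) for v in values]


# Hypothetical fares (made-up values).
fares = [7.25, 7.93, 8.05, 8.46, 13.0, 26.0, 71.28, 263.0]
print(find_outliers(fares))  # [263.0]
print(clip(fares)[-1])       # 263.0 capped at the upper fence
```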
In summary, the study covered:
- Descriptive Statistics to summarize the dataset.
- Data Visualization with libraries such as Matplotlib and Seaborn to better understand the data.
- Missing Value Analysis to identify and handle missing data.
- Outlier Detection to find and manage data points lying far outside the normal range.
This project helped build an understanding of the fundamentals of data preprocessing and visualization before moving on to building machine learning models.