In this task, I worked with the Titanic dataset to practice data cleaning and preprocessing with Pandas and NumPy. The main goal was to handle missing values and convert categorical data into numeric form so that the dataset is ready for further analysis or modeling. Preprocessing of this kind is an essential step in preparing real-world datasets, ensuring they are complete, structured, and machine-learning friendly.
- Missing Value Handling: Filled missing values in Age and Fare with their median values, and dropped the Cabin column due to excessive nulls.
- Categorical Encoding: Converted categorical columns (Sex and Embarked) into numeric form using One-Hot Encoding.
- Final Clean Dataset: Produced a structured DataFrame with no missing values and all categorical features transformed into numeric columns.
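
The steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual script: the four-row DataFrame is made-up sample data standing in for the Titanic file, and `drop_first=True` is an assumption inferred from the encoded column names (Sex_male, Embarked_Q, Embarked_S), since one category per feature appears to have been dropped.

```python
import numpy as np
import pandas as pd

# Made-up sample rows standing in for the Titanic data (not real records).
df = pd.DataFrame({
    "Age": [22.0, np.nan, 38.0, 26.0],
    "Fare": [7.25, 71.28, np.nan, 8.05],
    "Cabin": [np.nan, "C85", np.nan, np.nan],
    "Sex": ["male", "female", "female", "male"],
    "Embarked": ["S", "C", "Q", "S"],
})

# Fill numeric gaps with each column's median.
for col in ["Age", "Fare"]:
    df[col] = df[col].fillna(df[col].median())

# Drop Cabin, which is mostly null.
df = df.drop(columns=["Cabin"])

# One-hot encode the categorical columns.
# drop_first=True is an assumption; it drops the first category of each
# feature (female, C), leaving Sex_male, Embarked_Q, Embarked_S.
clean = pd.get_dummies(
    df, columns=["Sex", "Embarked"], drop_first=True, dtype=int
)

print(clean)
```

Median filling is less sensitive to outliers than mean filling, which matters for a skewed column such as Fare.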
- Dataset Info: Displays rows, columns, data types, and missing values.
- Encoded Features: New numeric columns created: Sex_male, Embarked_Q, Embarked_S.
- Cleaned Data: Ready-to-use dataset free from nulls and categorical text.
- Quick Preview: Printed the first 5 rows to confirm transformations.
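
Checks like the ones listed above might look like this. It is a sketch only: the `clean` frame here is a hypothetical, hand-built stand-in for the cleaned dataset, and the variable names are illustrative.

```python
import pandas as pd

# Hypothetical stand-in for the cleaned dataset (made-up values).
clean = pd.DataFrame({
    "Age": [22.0, 26.0, 38.0],
    "Fare": [7.25, 71.28, 8.05],
    "Sex_male": [1, 0, 0],
    "Embarked_Q": [0, 0, 1],
    "Embarked_S": [1, 0, 0],
})

# Rows, columns, data types, and non-null counts in one summary.
clean.info()

# Per-column count of missing values; all zeros means no nulls remain.
null_counts = clean.isnull().sum()
print(null_counts)

# Any remaining text columns would show up here; an empty list is expected.
text_cols = clean.select_dtypes(include="object").columns.tolist()
print(text_cols)

# First 5 rows (or fewer if the frame is shorter) to eyeball the result.
print(clean.head())
```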
- Python (Jupyter Notebook / Script): data preprocessing
- Pandas: handling missing data and encoding categorical variables
- NumPy: numerical operations during preprocessing
- Strengthened skills in real-world data cleaning and preprocessing.
- Learned strategies for handling missing values (median filling, dropping).
- Practiced categorical variable encoding for machine learning readiness.
- Built a reusable preprocessing script for structured datasets.