This project used MySQL to clean and prepare a Nashville housing dataset of over 56,000 rows for analysis. The primary focus was resolving data quality issues to make the dataset easier to work with.
Key tasks included:
- Standardizing Date Format: Ensured consistency across the dataset.
- Populating Null Property Addresses: Filled missing data in the PropertyAddress column.
- Breaking Down Address Information: Separated City, State, and House Address into individual columns for both Property and Owner addresses.
- Standardizing Categorical Values: Converted 'Y' and 'N' values to 'Yes' and 'No' in the "Sold as Vacant" field.
- Removing Duplicates: Cleared duplicate entries to ensure data accuracy.
- Deleting Unnecessary Columns: Removed unnecessary columns to streamline the dataset.
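
As a rough illustration of four of the tasks above (filling null addresses, splitting addresses, standardizing 'Y'/'N' values, and removing duplicates), the sketch below runs comparable SQL against an in-memory SQLite database from Python, so it needs no MySQL server. It is not the project's actual code. The table name `housing`, the columns `UniqueID`, `ParcelID`, `SaleDate`, `SoldAsVacant`, `PropertyStreet`, and `PropertyCity`, and the sample rows are all assumptions made for the example; only `PropertyAddress` is named in the description. It also assumes that a missing address can be copied from another row with the same parcel, and that a duplicate is a row repeating the same parcel, address, and sale date. SQLite syntax differs from MySQL in places, so the statements would need adapting.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical miniature table; every name except PropertyAddress is assumed.
conn.executescript("""
CREATE TABLE housing (
    UniqueID        INTEGER PRIMARY KEY,
    ParcelID        TEXT,
    PropertyAddress TEXT,
    SaleDate        TEXT,
    SoldAsVacant    TEXT
);
INSERT INTO housing VALUES
    (1, 'P-1', '100 MAIN ST, NASHVILLE',      '2013-04-09', 'N'),
    (2, 'P-1', NULL,                          '2014-06-10', 'Y'),
    (3, 'P-2', '200 OAK AVE, GOODLETTSVILLE', '2015-01-02', 'No'),
    (4, 'P-2', '200 OAK AVE, GOODLETTSVILLE', '2015-01-02', 'No');
""")

conn.executescript("""
-- 1. Fill null addresses from another row with the same ParcelID (assumed rule).
UPDATE housing
SET PropertyAddress = (
    SELECT b.PropertyAddress
    FROM housing AS b
    WHERE b.ParcelID = housing.ParcelID
      AND b.PropertyAddress IS NOT NULL
    LIMIT 1
)
WHERE PropertyAddress IS NULL;

-- 2. Split 'street, city' into two new columns at the first comma.
ALTER TABLE housing ADD COLUMN PropertyStreet TEXT;
ALTER TABLE housing ADD COLUMN PropertyCity TEXT;
UPDATE housing
SET PropertyStreet = TRIM(SUBSTR(PropertyAddress, 1, INSTR(PropertyAddress, ',') - 1)),
    PropertyCity   = TRIM(SUBSTR(PropertyAddress, INSTR(PropertyAddress, ',') + 1));

-- 3. Map 'Y'/'N' to 'Yes'/'No'; leave any other value unchanged.
UPDATE housing
SET SoldAsVacant = CASE SoldAsVacant
                       WHEN 'Y' THEN 'Yes'
                       WHEN 'N' THEN 'No'
                       ELSE SoldAsVacant
                   END;

-- 4. Remove duplicates, keeping the lowest UniqueID in each group.
DELETE FROM housing
WHERE UniqueID NOT IN (
    SELECT MIN(UniqueID)
    FROM housing
    GROUP BY ParcelID, PropertyAddress, SaleDate
);
""")

rows = conn.execute(
    "SELECT UniqueID, PropertyStreet, PropertyCity, SoldAsVacant "
    "FROM housing ORDER BY UniqueID"
).fetchall()
for row in rows:
    print(row)
```

In MySQL, the address split would more likely use `SUBSTRING_INDEX`, and duplicate removal is often written with `ROW_NUMBER()` over a partition. The version above avoids both so that it runs on SQLite.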
The cleaned and well-structured dataset is now better suited for accurate analysis, supporting informed decision-making.