A simple but powerful Jupyter Notebook built to clean CSV files by identifying and removing duplicate rows. This is a common and essential first step in any data cleaning or analysis pipeline.
This notebook provides a clear, step-by-step process to:
- Load a CSV file into a Pandas DataFrame.
- Analyze the data to find the total number of duplicate rows.
- Remove all duplicate rows efficiently.
- Save the clean, deduplicated data back to a new CSV file.
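The four steps above can be sketched in a few lines of pandas. This is a minimal, self-contained illustration (it builds a tiny sample CSV first so it can run on its own; the file names are placeholders, not the notebook's actual ones):

```python
import pandas as pd

# Create a small sample CSV containing one duplicate row (for demonstration only)
pd.DataFrame({"id": [1, 2, 2, 3], "name": ["a", "b", "b", "c"]}).to_csv("sample.csv", index=False)

# 1. Load the CSV file into a DataFrame
df = pd.read_csv("sample.csv")

# 2. Count duplicate rows (rows identical to an earlier row)
num_duplicates = df.duplicated().sum()
print(f"Found {num_duplicates} duplicate rows")

# 3. Remove all duplicate rows, keeping the first occurrence of each
df_clean = df.drop_duplicates()

# 4. Save the deduplicated data to a new CSV file
df_clean.to_csv("sample_deduplicated.csv", index=False)
```

By default, `duplicated()` and `drop_duplicates()` compare entire rows; pass a `subset=` of column names to deduplicate on specific columns instead.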
- Python: The language used for all data-cleaning logic.
- Pandas: The core library used for data loading, manipulation, and analysis.
- Jupyter Notebook: For interactive code execution and clear documentation.
- Add Your File: Place the CSV file you want to clean into the same folder as this notebook.
- Open the Notebook: Launch the `.ipynb` file (e.g., in VS Code, Jupyter Lab, or Google Colab).
- Update the Filename: In the first code cell, change the `filename` variable to match the name of your CSV file.

  ```python
  # Change 'your-file-name.csv' to the name of your file
  filename = 'your-file-name.csv'
  ```
- Run All Cells: Run all the cells in the notebook from top to bottom.
- Get Your Clean File: The notebook will save a new file in the same folder, named `your-original-filename_deduplicated.csv`. This new file contains your clean, duplicate-free dataset!
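The output name is derived from the input name. One way to build it, assuming the notebook simply appends `_deduplicated` before the `.csv` extension (`data.csv` below is a placeholder):

```python
from pathlib import Path

filename = "data.csv"  # placeholder input name

# Take the file name without its extension and append the suffix
output_name = f"{Path(filename).stem}_deduplicated.csv"
print(output_name)  # data_deduplicated.csv
```

Using `Path.stem` keeps this robust for any input file name, including ones containing dots before the extension.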