
A Jupyter Notebook that reads a CSV, removes duplicate entries using Pandas, and saves the clean data. (Skills: Python, Pandas, Data Cleaning)


mickeywmoore/csv-deduplicator-notebook


CSV Deduplicator Notebook

Python Pandas Jupyter

A Jupyter Notebook that cleans CSV files by identifying and removing duplicate rows. Deduplication is a common first step in a data cleaning or analysis pipeline.

📝 Overview

This notebook provides a clear, step-by-step process to:

  1. Load a CSV file into a Pandas DataFrame.
  2. Analyze the data to find the total number of duplicate rows.
  3. Remove all duplicate rows efficiently.
  4. Save the clean, deduplicated data back to a new CSV file.
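The four steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the function name `deduplicate_csv` and its parameters are hypothetical, and it assumes that a "duplicate" is a row identical to an earlier row in every column, which is the default behavior of `DataFrame.duplicated` and `DataFrame.drop_duplicates`.

```python
import pandas as pd


def deduplicate_csv(input_path, output_path):
    """Load a CSV, drop fully duplicated rows, and save the result.

    Hypothetical helper for illustration; returns the number of
    duplicate rows that were removed.
    """
    # 1. Load the CSV file into a DataFrame.
    df = pd.read_csv(input_path)

    # 2. Count rows that repeat an earlier row in every column.
    duplicate_count = int(df.duplicated().sum())

    # 3. Remove the repeats, keeping the first occurrence of each row.
    clean_df = df.drop_duplicates()

    # 4. Save without the index so the output keeps the input's columns.
    clean_df.to_csv(output_path, index=False)

    return duplicate_count
```

If only some columns should decide whether two rows count as duplicates, `drop_duplicates` accepts a `subset` argument listing those columns.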

🚀 Technologies Used

  • Python
  • Pandas: The core library used for data loading, manipulation, and analysis.
  • Jupyter Notebook: For interactive code execution and clear documentation.

💡 How to Use

  1. Add Your File: Place the CSV file you want to clean into the same folder as this notebook.
  2. Open the Notebook: Launch the .ipynb file (e.g., in VS Code, Jupyter Lab, or Google Colab).
  3. Update the Filename: In the first code cell, change the filename variable to match the name of your CSV file.
    # Change 'your-file-name.csv' to the name of your file
    filename = 'your-file-name.csv'
  4. Run All Cells: Run all the cells in the notebook from top to bottom.
  5. Get Your Clean File: The notebook will save a new file in the same folder, named your-original-filename_deduplicated.csv. This new file contains your clean, duplicate-free dataset!
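One way the output name described in step 5 could be derived from the `filename` variable is sketched below. The helper `deduplicated_name` is hypothetical and the notebook may build the name differently; the sketch only assumes the `_deduplicated` suffix is inserted before the file extension.

```python
from pathlib import Path


def deduplicated_name(filename):
    """Return the input path with '_deduplicated' added before the extension."""
    path = Path(filename)
    # with_name swaps only the final path component, so the folder is preserved.
    return str(path.with_name(f"{path.stem}_deduplicated{path.suffix}"))


# Same variable as in the first code cell of the notebook.
filename = 'your-file-name.csv'
output_filename = deduplicated_name(filename)
```

Because the suffix is added to a new name, the original CSV is never overwritten.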
