Ironhack-Data-Madrid-Octubre-2022/w2-pandas-project


W2 Project - Data cleaning & wrangling

The goal of this project is to combine everything you have learned about data wrangling, cleaning, and manipulation with Pandas so you can see how it all works together. You will start with the messy Shark Attack data set: download it, import it, use your data wrangling skills to clean it up, prepare it for analysis, and then export it as a clean CSV file. Some graphs to better understand the data will surely be useful!
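The import, clean, export round trip described above can be sketched as follows. This is a minimal illustration, not the expected solution: the in-memory CSV and its three column names are made-up stand-ins for the downloaded file, and in the real project you would pass your own file path to `pd.read_csv` and `to_csv` instead.

```python
import io

import pandas as pd

# Hypothetical stand-in for the downloaded data set. In the project you would
# call pd.read_csv with the path to your local copy of the file instead.
raw_csv = io.StringIO(
    "Case Number,Country,Fatal (Y/N)\n"
    "2018.06.25,USA,N\n"
    "2018.06.25,USA,N\n"
    ",,\n"
)

df = pd.read_csv(raw_csv)

# Minimal clean-up: drop rows that are completely empty, then exact duplicates.
clean = df.dropna(how="all").drop_duplicates()

# Export without the index so the file reloads with the same columns.
out = io.StringIO()
clean.to_csv(out, index=False)
csv_text = out.getvalue()
```

Writing to a buffer keeps the sketch self-contained; replacing `out` with a file name such as `"clean.csv"` (an assumed name) writes to disk instead.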

TO DO's

  1. Explore the data and write down what you have found
    • you can use: df.describe(), df["column"], etc.
  2. Use at least 5 data cleaning techniques inside a file named clean.ipynb
    • null values, columns drop, duplicated data, string manipulation, apply fn, categorize, regex, etc.
  3. Show data that validates the conclusions based on your hypotheses in a file named analysis.ipynb
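As a rough illustration of the exploration step and of several techniques from the list above, here is a sketch on a tiny made-up DataFrame. The column names, the values, and the helper `to_fatal_flag` are all assumptions for the example; the real data set will need its own decisions.

```python
import numpy as np
import pandas as pd

# Tiny made-up sample; names and values are illustrative only.
df = pd.DataFrame({
    "Country": [" usa", "USA ", "Australia", None, "USA "],
    "Age": ["23", "30s", "teen", "41", "30s"],
    "Fatal": ["N", "y", "UNKNOWN", "Y", "y"],
    "Unnamed: 22": [np.nan] * 5,
})

# Exploration: summary statistics and missing values per column.
summary = df.describe(include="all")
missing = df.isna().sum()

# 1. Column drop: remove columns that contain no data at all.
df = df.dropna(axis="columns", how="all")

# 2. String manipulation: trim whitespace and unify the case.
df["Country"] = df["Country"].str.strip().str.upper()

# 3. Duplicated data: drop exact duplicates once the text is normalised.
df = df.drop_duplicates().reset_index(drop=True)

# 4. Regex: keep the first run of digits in Age, then convert to numbers.
df["Age"] = pd.to_numeric(df["Age"].str.extract(r"(\d+)", expand=False))


# 5. Apply fn + categorize: map free-text flags to a small set of labels.
def to_fatal_flag(value):
    """Hypothetical helper: normalise a fatality flag to Y, N or UNKNOWN."""
    text = str(value).strip().upper()
    return text if text in {"Y", "N"} else "UNKNOWN"


df["Fatal"] = df["Fatal"].apply(to_fatal_flag).astype("category")

# 6. Null values: make missing countries explicit instead of leaving NaN.
df["Country"] = df["Country"].fillna("UNKNOWN")
```

The order matters here: normalising the strings before `drop_duplicates` lets rows that differ only in whitespace or case collapse into one.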

Suggested Ways to Get Started

  • Examine the data and try to understand what the fields mean before diving into data cleaning and manipulation methods.
  • Break the project down into steps: use the topics covered in the lessons to form a checklist, add anything else you can think of that may be wrong with your data set, and then work through the checklist.
  • Use the tools in your tool kit - your knowledge of Python, data structures, Pandas, and data wrangling. Work through the lessons in class & ask questions when you need to! Think about adding relevant code to your project each night, instead of, you know... procrastinating.
  • Commit early and commit often. Don’t be afraid of doing something incorrectly, because you can always roll back to a previous version.
  • Consult documentation and resources provided to better understand the tools you are using and how to accomplish what you want.

How to deliver the project

  1. Create a new repo with the name data-cleaning-pandas on your GitHub account.
    • Create a README.md file on repo root with project documentation. Make sure to include as much useful information as possible. Someone that finds the README.md should be able to fully get a gist of the project without browsing your files.
    • Include a .gitignore
    • At least 1 Jupyter notebook is required
    • Including your functions in a src.py is very, very highly recommended (maybe even mandatory, check with your instructors)
    • DO NOT UPLOAD THE SHARK ATTACK DATASET TO GITHUB
  2. Open an Issue on this repo and paste your own repo's link.
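One possible shape for the suggested src.py is a small module of reusable functions that the notebooks import. The function below, `standardize_column_names`, is a hypothetical example and not something the project prescribes; the sample column names are made up as well.

```python
# src.py (sketch): reusable cleaning helpers for the notebooks.
import re

import pandas as pd


def standardize_column_names(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df whose column names are snake_case."""

    def _snake(name):
        # Replace every run of non-alphanumeric characters with "_".
        name = re.sub(r"[^0-9a-zA-Z]+", "_", str(name).strip())
        return name.strip("_").lower()

    return df.rename(columns=_snake)


# Quick demonstration with made-up column names.
demo = pd.DataFrame(columns=["Case Number", "Fatal (Y/N)", "Sex "])
renamed = standardize_column_names(demo)
column_names = list(renamed.columns)
```

In a notebook that sits next to the file, `from src import standardize_column_names` would make the helper available without copying the code into each notebook.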

Links & Resources
