A comprehensive data science project covering data collection, cleaning, exploratory analysis (EDA), SQL queries, interactive visualizations (Folium/Plotly Dash), and predictive modeling. Developed as the final capstone for the IBM/Coursera Applied Data Science specialization.

jeffthedeveloper/Applied-Data-Science-Capstone-End-to-End-Analysis-with-Python-SQL-and-Machine-Learning

Key Steps:
✔ Data Wrangling & Cleaning
✔ Exploratory Data Analysis (EDA) with Python & SQL
✔ Interactive Maps (Folium) & Dashboards (Plotly Dash)
✔ Machine Learning (Classification/Regression)
✔ Presentation & Insights


📄 README.md Template

# Applied Data Science Capstone Project  
**Course:** IBM Data Science Professional Certificate (Coursera)  
**Author:** Jefferson Firmino Mendes
**GitHub:** www.github.com/jeffthedeveloper  

## 🎯 Project Overview  
This project demonstrates an end-to-end data science workflow, from **data collection** to **predictive modeling**, as part of the IBM/Coursera Applied Data Science Capstone. It includes:  
- **Data wrangling** (cleaning, APIs, web scraping)  
- **Exploratory Data Analysis (EDA)** with Python & SQL  
- **Interactive visualizations** (Folium maps, Plotly Dash)  
- **Machine learning** (classification/regression) for predictions  

## 📂 Repository Structure

├── data/                    # Raw & processed datasets  
├── notebooks/               # Jupyter notebooks (EDA, ML, etc.)  
│   ├── 1_Data_Collection.ipynb  
│   ├── 2_Data_Wrangling.ipynb  
│   ├── 3_EDA_SQL_Analysis.ipynb  
│   └── 4_Predictive_Modeling.ipynb  
├── scripts/                 # Python helper scripts  
├── docs/                    # Reports, presentations (PDF)  
├── app/                     # Plotly Dash/Flask app (if applicable)  
└── README.md  
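
When adapting this template to a new project, the top-level layout above could be created with a short script. The following is only a sketch: the `scaffold` helper is hypothetical (it is not part of this repository), and it creates just the directories and an empty `README.md`, leaving the notebooks to be created from Jupyter.

```python
import tempfile
from pathlib import Path

# Directory names copied from the tree above.
DIRECTORIES = ["data", "notebooks", "scripts", "docs", "app"]


def scaffold(root):
    """Create the top-level project folders under `root` (hypothetical helper).

    Existing folders are left untouched. Returns the list of folder paths.
    """
    root = Path(root)
    created = []
    for name in DIRECTORIES:
        folder = root / name
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    # Create an empty README.md only if one is not already present.
    (root / "README.md").touch(exist_ok=True)
    return created


# Demonstration in a throwaway directory so nothing is written to the project.
with tempfile.TemporaryDirectory() as tmp:
    for folder in scaffold(tmp):
        print(folder.name)
```

To use it for real, call `scaffold(".")` from the intended project root.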

## 🛠️ Tools & Technologies

  • Python (Pandas, NumPy, Matplotlib, Seaborn)
  • SQL (SQLite, PostgreSQL, or IBM Db2)
  • Interactive Maps: Folium
  • Dashboarding: Plotly Dash
  • Machine Learning: Scikit-learn, XGBoost
  • Version Control: Git/GitHub

## 🔍 Key Findings

  1. Exploratory Analysis: [Brief insight, e.g., "70% of SpaceX launches reuse the booster"]
  2. Predictive Model: [e.g., "Random Forest achieved 85% accuracy in classifying accident severity"]
  3. Interactive Tools: [e.g., "Folium maps revealed regional trends in accidents"]

## 🚀 How to Run

  1. Clone the repo:
    git clone https://github.com/jeffthedeveloper/Applied-Data-Science-Capstone-End-to-End-Analysis-with-Python-SQL-and-Machine-Learning.git
  2. Install dependencies:
    pip install -r requirements.txt
  3. Run Jupyter notebooks:
    jupyter lab
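
After installing, it can be useful to confirm that the main libraries are importable before opening the notebooks. The sketch below assumes the import names that correspond to the Tools & Technologies list (for example `sklearn` for Scikit-learn and `dash` for Plotly Dash); the authoritative list is whatever `requirements.txt` actually pins, and `missing_modules` is a hypothetical helper.

```python
from importlib.util import find_spec

# Assumed import names for the libraries listed under Tools & Technologies.
EXPECTED_MODULES = [
    "pandas", "numpy", "matplotlib", "seaborn",
    "folium", "dash", "sklearn", "xgboost",
]


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


missing = missing_modules(EXPECTED_MODULES)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All expected modules are importable.")
```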

## 📜 License

This project was developed as educational coursework and is released under the MIT License.

## 📊 Presentation

View the presentation slides: https://drive.google.com/file/d/1lgArDDKVNuzi1ucUSftZAgvOwFCHdOkG/view?usp=sharing


### **Customization Tips:**  

- For **SpaceX projects**, highlight:  
  - Falcon 9 landing predictions  
  - Folium launch site analysis  
- For **Accident Severity projects**, focus on:  
  - SQL queries for crash hotspots  
  - Classification model (e.g., Logistic Regression)  
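
As an illustration of the "SQL queries for crash hotspots" tip, here is a minimal sketch using Python's built-in `sqlite3` module. The `accidents` table, its `location` and `severity` columns, and the sample rows are all hypothetical and exist only to make the example runnable; a real project would query its own dataset and schema.

```python
import sqlite3

# Hypothetical sample rows (location, severity), for illustration only.
ROWS = [
    ("Main St & 1st Ave", 3),
    ("Main St & 1st Ave", 2),
    ("Oak Rd & 5th Ave", 1),
    ("Main St & 1st Ave", 4),
    ("Oak Rd & 5th Ave", 2),
    ("Pine St & 9th Ave", 1),
]

# Rank locations by number of crashes, breaking ties alphabetically.
HOTSPOT_QUERY = """
SELECT location,
       COUNT(*) AS crashes,
       ROUND(AVG(severity), 2) AS avg_severity
FROM accidents
GROUP BY location
ORDER BY crashes DESC, location ASC
LIMIT ?
"""


def top_hotspots(conn, limit=2):
    """Return (location, crashes, avg_severity) for the busiest locations."""
    return conn.execute(HOTSPOT_QUERY, (limit,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accidents (location TEXT, severity INTEGER)")
conn.executemany("INSERT INTO accidents VALUES (?, ?)", ROWS)

for location, crashes, avg_severity in top_hotspots(conn):
    print(f"{location}: {crashes} crashes, mean severity {avg_severity}")
```

The same query text should carry over to PostgreSQL or IBM Db2 with minor changes, mainly the parameter placeholder style used by the database driver.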
