📊 Basic Data Science for Machine Learning

A modular roadmap to master NumPy, Pandas, and SQL



🧭 Navigation

📋 Table of Contents
  1. 🎯 Overview
  2. 📂 Project Structure
  3. 🔧 Technology Stack
  4. 📅 Study Timeline
  5. 💡 Best Practices
  6. 🚀 Getting Started
  7. 📚 Modules & Details
  8. 📄 License
  9. 👨‍💻 Credits

🎯 Overview

Transform your data science journey with structured, hands-on learning

This repository provides a modular roadmap for mastering core data science skills with NumPy, Pandas, and SQL, designed as preparation for machine learning.

🌟 Why This Repository?

| 🎯 Goal | 📖 What You Get |
|---|---|
| Structured Learning | Step-by-step notebooks with clear progression |
| Hands-on Practice | Real datasets and practical exercises |
| Modular Design | Learn at your own pace, mix and match topics |
| ML Ready | Skills directly applicable to machine learning |

📂 Project Structure

📦 Basic_DataScience_4ML/
├── 🗂️ assets/
│   └── 📊 data/
│       └── 📁 static/
├── 🔬 data prepare/
│   ├── 📈 EDA/                    # Exploratory Data Analysis
│   │   ├── 📋 report.html
│   │   ├── 📓 phase_1.ipynb
│   │   ├── 📓 phase_2.ipynb
│   │   ├── 📓 phase_3.ipynb
│   │   └── 📊 boxplot.png
│   ├── ⚙️ Feature engineering/     # Transform & Select Features
│   │   ├── 📓 phase_1.ipynb
│   │   ├── 📓 phase_2.ipynb
│   │   └── 📓 phase_3.ipynb
│   ├── 🧹 Preprocessing/           # Clean & Prepare Data
│   │   └── 📓 phase_1.ipynb
│   └── 📊 Visualization/           # Data Visualization
│       ├── 📓 phase_1.ipynb
│       ├── 📓 phase_2.ipynb
│       └── 📓 phase_3.ipynb
├── 🔢 numpy/                       # Numerical Computing
│   ├── 📓 Phase_1.ipynb
│   ├── 📓 Phase_2.ipynb
│   └── 📓 Phase_3.ipynb
├── 🐼 pandas/                      # Data Manipulation
│   ├── 📓 Phase_1.ipynb
│   ├── 📓 phase_2.ipynb
│   └── 📓 phase_3.ipynb
├── 🗄️ sql/                        # Database Queries
│   ├── 📄 Phase_1.sql
│   ├── 📄 Phase_2.sql
│   ├── 📄 Phase_3.sql
│   └── 📓 rough.ipynb
├── 📚 External Libraries/
├── 📋 requirements.txt
└── 📖 README.md

🔧 Technology Stack & Requirements

🛠️ Core Technologies

| Technology | Version | Purpose |
|---|---|---|
| 🐍 Python | ≥ 3.8 | Core programming language |
| 🔢 NumPy | Latest | Numerical computing & arrays |
| 🐼 Pandas | Latest | Data manipulation & analysis |
| 🗄️ SQLite/SQL | Latest | Database operations |
| 📓 Jupyter | Latest | Interactive notebooks |

⚡ Quick Setup

# Clone the repository
git clone https://github.com/Obiwankenobi699/Basic_DataScience-4ML.git

# Navigate to project
cd Basic_DataScience-4ML

# Install dependencies
pip install -r requirements.txt

# Launch Jupyter
jupyter notebook

📅 Study Timeline

🗓️ 6-Week Intensive Roadmap

| 📅 Week | 🎯 Focus Area | 📚 Key Concepts | 🎪 Mini Project |
|---|---|---|---|
| Week 1 | 🔢 NumPy Foundations | Arrays, Shapes, Indexing, Slicing | 📐 Vector Math Calculator |
| Week 2 | ⚡ NumPy Advanced | Broadcasting, Dot Products, Linear Algebra | 🧮 Matrix Operations Demo |
| Week 3 | 🐼 Pandas Basics | Series, DataFrames, Basic Operations | 📊 CSV Data Explorer |
| Week 4 | 🚀 Pandas Pro | GroupBy, Merging, Advanced Aggregation | 💰 Sales Analytics Dashboard |
| Week 5 | 🗄️ SQL Fundamentals | SELECT, WHERE, ORDER BY, Basic Joins | 🔍 Query Practice Lab |
| Week 6 | 💎 SQL Mastery | Complex Joins, Subqueries, Window Functions | 🏢 Business Intelligence Case Study |

💡 Tip: The modules are flexible, so learn at your own pace!
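
As a taste of the Week 4 topics (GroupBy, merging, aggregation), here is a minimal sketch. The `sales` and `regions` tables and their column names are made up for illustration and are not taken from the repository's notebooks.

```python
import pandas as pd

# Hypothetical sales records; the column names are illustrative only.
sales = pd.DataFrame({
    "region_id": [1, 1, 2, 2, 2],
    "amount": [100.0, 150.0, 200.0, 50.0, 25.0],
})

# Hypothetical lookup table mapping region ids to names.
regions = pd.DataFrame({
    "region_id": [1, 2],
    "region": ["North", "South"],
})

# GroupBy with named aggregation: total and average sale per region id.
summary = (
    sales.groupby("region_id", as_index=False)
         .agg(total=("amount", "sum"), average=("amount", "mean"))
)

# Merge: attach the readable region name to each summary row.
report = summary.merge(regions, on="region_id", how="left")
print(report)
```

The same pattern (aggregate first, then join a lookup table) maps closely onto `GROUP BY` plus `JOIN` in SQL, which is covered in Weeks 5 and 6.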


💡 Best Practices

🌟 Golden Rules for Data Science Success

| 🛠️ Library | 💡 Best Practice | 🚨 Avoid This |
|---|---|---|
| 🔢 NumPy | Use vectorized operations for speed | Python loops over array elements |
| 🐼 Pandas | Start with df.info() & df.describe() | Skipping data exploration |
| 🗄️ SQL | Begin with simple queries, then build complexity | Writing complex joins immediately |
| 📊 General | Document your analysis process | Leaving notebooks without comments |
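
To make the NumPy rule concrete, the sketch below computes the same sum of squared differences twice: once with an explicit Python loop and once with whole-array operations. The function names are hypothetical and exist only for this comparison.

```python
import numpy as np

def squared_distance_loop(a, b):
    """Sum of squared differences using an explicit Python loop."""
    total = 0.0
    for x, y in zip(a, b):
        total += (x - y) ** 2
    return total

def squared_distance_vectorized(a, b):
    """Same computation expressed as whole-array operations."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return float(np.dot(diff, diff))

a = np.arange(5, dtype=float)  # [0, 1, 2, 3, 4]
b = np.full(5, 2.0)            # [2, 2, 2, 2, 2]

loop_result = squared_distance_loop(a, b)
vec_result = squared_distance_vectorized(a, b)
print(loop_result, vec_result)
```

Both calls return the same value. On large arrays the vectorized version is usually much faster because the element-wise work runs in compiled code instead of the Python interpreter.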

🎯 Learning Strategy

graph LR
    A[📖 Theory] --> B[💻 Practice]
    B --> C[🎯 Project]
    C --> D[📝 Document]
    D --> A

🚀 Getting Started

🎬 Quick Start Guide

🔧 Environment Setup
  1. Prerequisites Check

    python --version  # Should report 3.8 or newer
    pip --version     # Any recent version works
  2. Clone Repository

    git clone https://github.com/Obiwankenobi699/Basic_DataScience-4ML.git
    cd Basic_DataScience-4ML
  3. Setup Virtual Environment (Recommended)

    python -m venv venv
    source venv/bin/activate  # Linux/Mac
    # OR
    venv\Scripts\activate     # Windows
  4. Install Dependencies

    pip install -r requirements.txt
📓 Launch Jupyter
# Start Jupyter Notebook
jupyter notebook

# OR start Jupyter Lab (recommended)
jupyter lab

In the Jupyter file browser, open numpy/Phase_1.ipynb to begin.

🗄️ SQL Setup
  • Option 1: Use SQLite (built-in with Python)
  • Option 2: Install DB Browser for SQLite (GUI)
  • Option 3: Use your preferred SQL client
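
For Option 1, no installation is needed because the `sqlite3` module ships with Python's standard library. A minimal sketch, assuming an in-memory database and a made-up `students` table (not one of the repository's datasets):

```python
import sqlite3

# An in-memory database avoids creating any files on disk.
conn = sqlite3.connect(":memory:")

# Hypothetical table used only for this illustration.
conn.execute("CREATE TABLE students (name TEXT, score INTEGER)")
conn.executemany(
    "INSERT INTO students (name, score) VALUES (?, ?)",
    [("Ada", 91), ("Grace", 85), ("Linus", 72)],
)

# SELECT with WHERE and ORDER BY, using a bound parameter for the threshold.
rows = conn.execute(
    "SELECT name, score FROM students WHERE score >= ? ORDER BY score DESC",
    (80,),
).fetchall()
conn.close()

print(rows)
```

The `?` placeholders let SQLite bind the values safely, which is a good habit to adopt before moving on to larger queries.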

📚 Modules & Details

🎓 Learning Modules Overview

| 📂 Module | 🎯 Learning Goals | 🛠️ Tools Used | ⏱️ Time Investment |
|---|---|---|---|
| 🔬 EDA | Master exploratory analysis, statistical summaries | Pandas, Matplotlib, Seaborn | 1-2 weeks |
| ⚙️ Feature Engineering | Transform data, create meaningful features | Pandas, NumPy, Scikit-learn | 1-2 weeks |
| 🧹 Preprocessing | Clean data, handle missing values, scaling | Pandas, NumPy | 1 week |
| 📊 Visualization | Create compelling charts and plots | Matplotlib, Seaborn, Plotly | 1-2 weeks |
| 🔢 NumPy | Array operations, mathematical computations | NumPy | 2 weeks |
| 🐼 Pandas | Data manipulation, analysis workflows | Pandas | 2-3 weeks |
| 🗄️ SQL | Database queries, data retrieval | SQLite, SQL | 2 weeks |

🎯 Module Deep Dive

🔬 EDA (Exploratory Data Analysis)
  • Phase 1: Basic statistics and data overview
  • Phase 2: Distribution analysis and correlation
  • Phase 3: Advanced patterns and outlier detection
  • Deliverable: Interactive HTML reports
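
The EDA phases above start with summary statistics and end with outlier detection. One common approach, shown here as a sketch rather than the notebooks' actual method, is the 1.5 × IQR rule. The `measurement` values are invented for illustration.

```python
import pandas as pd

# Hypothetical measurements with one obviously extreme value.
values = pd.Series([10, 12, 11, 13, 12, 11, 95], name="measurement")

# Phase 1 style overview: count, mean, std, quartiles, min and max.
print(values.describe())

# Phase 3 style outlier check using the interquartile range (IQR).
q1 = values.quantile(0.25)
q3 = values.quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Keep only the points that fall outside the [lower, upper] fences.
outliers = values[(values < lower) | (values > upper)]
print(outliers.tolist())
```

The 1.5 multiplier is a convention, not a law. Whether a flagged point is an error or a genuine extreme still needs a judgment call based on the data.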
⚙️ Feature Engineering
  • Phase 1: Feature creation and transformation
  • Phase 2: Feature selection techniques
  • Phase 3: Advanced feature engineering
  • Deliverable: Optimized feature sets
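
A small sketch of the first two phases, feature creation and feature selection, assuming a made-up table of body measurements. The column names and the zero-variance filter are illustrative choices, not the repository's prescribed workflow.

```python
import pandas as pd

# Hypothetical dataset; "constant" carries no information on purpose.
df = pd.DataFrame({
    "height_cm": [150.0, 160.0, 170.0, 180.0],
    "weight_kg": [50.0, 60.0, 70.0, 80.0],
    "constant": [1, 1, 1, 1],
})

# Feature creation: derive body mass index from two existing columns.
df["bmi"] = df["weight_kg"] / (df["height_cm"] / 100) ** 2

# Feature selection: drop columns whose variance is zero,
# since they cannot help a model distinguish between rows.
variances = df.var()
selected = df.loc[:, variances > 0]

print(selected.columns.tolist())
```

Libraries such as Scikit-learn offer ready-made selectors for this, but doing it by hand once makes it clearer what those tools automate.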
🔢 NumPy Mastery
  • Phase 1: Array fundamentals and indexing
  • Phase 2: Mathematical operations and broadcasting
  • Phase 3: Linear algebra and advanced operations
  • Deliverable: High-performance numerical solutions
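
Phases 2 and 3 of the NumPy module can be previewed with the sketch below: broadcasting is used to center the columns of a matrix, and `np.linalg.solve` solves a small linear system. The arrays are arbitrary example values.

```python
import numpy as np

# Broadcasting: the column means have shape (2,), the matrix has shape (3, 2).
# NumPy stretches the means across all three rows without copying them.
X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
centered = X - X.mean(axis=0)

# Linear algebra: solve A @ x = b for x.
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([3.0, 5.0])
x = np.linalg.solve(A, b)

print(centered)
print(x)
```

Using `np.linalg.solve` is generally preferable to computing an explicit inverse, since it is both faster and more numerically stable.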

📄 License

📜 MIT License

This project is open source and available under the MIT License.

Feel free to fork, modify, and use in academic or commercial projects!


👨‍💻 Credits

🌟 Created with ❤️ by

Obiwankenobi699


🤝 Contributing

We welcome contributions! Here's how you can help:

| 🎯 Type | 📝 Description |
|---|---|
| 🐛 Bug Reports | Found an issue? Open a GitHub issue |
| 💡 Feature Requests | Have an idea? We'd love to hear it! |
| 📖 Documentation | Help improve our docs |
| 💻 Code | Submit pull requests with improvements |

📞 Get in Touch

GitHub Issues | Pull Requests


🎉 Ready to Start Your Data Science Journey?

⚡ Get Started Now | 📚 View Notebooks | 💬 Join Discussion


Happy Learning & Coding! 🚀📊✨

"Data is the new oil, but without the right tools, it's just crude."
