Transform your data science journey with structured, hands-on learning
This repository provides a modular, step-by-step roadmap for learning core data science skills with NumPy, Pandas, and SQL, designed as preparation for machine learning.
| Goal | What You Get |
|---|---|
| Structured Learning | Step-by-step notebooks with clear progression |
| Hands-on Practice | Real datasets and practical exercises |
| Modular Design | Learn at your own pace, mix and match topics |
| ML Ready | Skills directly applicable to machine learning |
Basic_DataScience_4ML/
├── assets/
│   ├── data/
│   └── static/
├── data prepare/
│   ├── EDA/                     # Exploratory Data Analysis
│   │   ├── report.html
│   │   ├── phase_1.ipynb
│   │   ├── phase_2.ipynb
│   │   ├── phase_3.ipynb
│   │   └── boxplot.png
│   ├── Feature engineering/     # Transform & Select Features
│   │   ├── phase_1.ipynb
│   │   ├── phase_2.ipynb
│   │   └── phase_3.ipynb
│   ├── Preprocessing/           # Clean & Prepare Data
│   │   └── phase_1.ipynb
│   └── Visualization/           # Data Visualization
│       ├── phase_1.ipynb
│       ├── phase_2.ipynb
│       └── phase_3.ipynb
├── numpy/                       # Numerical Computing
│   ├── Phase_1.ipynb
│   ├── Phase_2.ipynb
│   └── Phase_3.ipynb
├── pandas/                      # Data Manipulation
│   ├── Phase_1.ipynb
│   ├── phase_2.ipynb
│   └── phase_3.ipynb
├── sql/                         # Database Queries
│   ├── Phase_1.sql
│   ├── Phase_2.sql
│   ├── Phase_3.sql
│   └── rough.ipynb
├── External Libraries/
├── requirements.txt
└── README.md
| Technology | Version | Purpose |
|---|---|---|
| Python | ≥ 3.8 | Core programming language |
| NumPy | Latest | Numerical computing & arrays |
| Pandas | Latest | Data manipulation & analysis |
| SQLite/SQL | Latest | Database operations |
| Jupyter | Latest | Interactive notebooks |
# Clone the repository
git clone https://github.com/Obiwankenobi699/Basic_DataScience-4ML.git
# Navigate to project
cd Basic_DataScience-4ML
# Install dependencies
pip install -r requirements.txt
# Launch Jupyter
jupyter notebook

| Week | Focus Area | Key Concepts | Mini Project |
|---|---|---|---|
| Week 1 | NumPy Foundations | Arrays, Shapes, Indexing, Slicing | Vector Math Calculator |
| Week 2 | NumPy Advanced | Broadcasting, Dot Products, Linear Algebra | Matrix Operations Demo |
| Week 3 | Pandas Basics | Series, DataFrames, Basic Operations | CSV Data Explorer |
| Week 4 | Pandas Pro | GroupBy, Merging, Advanced Aggregation | Sales Analytics Dashboard |
| Week 5 | SQL Fundamentals | SELECT, WHERE, ORDER BY, Basic Joins | Query Practice Lab |
| Week 6 | SQL Mastery | Complex Joins, Subqueries, Window Functions | Business Intelligence Case Study |
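
As a taste of the Week 4 topics (GroupBy, merging, aggregation), here is a minimal sketch. The `sales` and `stores` tables and their column names are made-up illustration data, not datasets from this repository.

```python
import pandas as pd

# Hypothetical toy data; not taken from the repository's datasets.
sales = pd.DataFrame({
    "store_id": [1, 1, 2, 2, 3],
    "amount": [100.0, 150.0, 200.0, 50.0, 75.0],
})
stores = pd.DataFrame({
    "store_id": [1, 2, 3],
    "region": ["North", "South", "North"],
})

# Merge: attach each sale's region through the shared store_id key.
merged = sales.merge(stores, on="store_id", how="left")

# GroupBy with named aggregation: total and average sale amount per region.
summary = (
    merged.groupby("region")["amount"]
    .agg(total="sum", average="mean")
    .reset_index()
)
print(summary)
```

A left merge keeps every sale even if a store were missing from `stores`, which is usually the safer default when exploring data.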
Tip: Modules are designed to be flexible; learn at your own pace!
| Library | Best Practice | Avoid This |
|---|---|---|
| NumPy | Use vectorized operations for speed | Python loops over array elements |
| Pandas | Start with df.info() & df.describe() | Skipping data exploration |
| SQL | Begin with simple queries, then build complexity | Writing complex joins immediately |
| General | Document your analysis process | Leaving notebooks without comments |
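
To make the NumPy row concrete, the sketch below compares an element-by-element Python loop with the equivalent vectorized expression. The function names and array size are illustrative choices, and actual timings depend on your machine.

```python
import time

import numpy as np

values = np.arange(100_000, dtype=np.float64)


def scale_loop(arr, factor):
    """Element-by-element Python loop (the pattern to avoid)."""
    out = np.empty_like(arr)
    for i in range(arr.size):
        out[i] = arr[i] * factor + 1.0
    return out


def scale_vectorized(arr, factor):
    """Same computation as one whole-array expression."""
    return arr * factor + 1.0


start = time.perf_counter()
loop_result = scale_loop(values, 2.0)
loop_seconds = time.perf_counter() - start

start = time.perf_counter()
vec_result = scale_vectorized(values, 2.0)
vec_seconds = time.perf_counter() - start

# Both approaches give the same numbers; only the speed differs.
print(f"loop: {loop_seconds:.4f}s, vectorized: {vec_seconds:.4f}s")
print("results match:", np.allclose(loop_result, vec_result))
```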
graph LR
A[Theory] --> B[Practice]
B --> C[Project]
C --> D[Document]
D --> A
Environment Setup
- Prerequisites Check
  python --version   # Should be 3.8+
  pip --version      # Should be latest
- Clone Repository
  git clone https://github.com/Obiwankenobi699/Basic_DataScience-4ML.git
  cd Basic_DataScience-4ML
- Setup Virtual Environment (Recommended)
  python -m venv venv
  source venv/bin/activate   # Linux/Mac
  # OR
  venv\Scripts\activate      # Windows
- Install Dependencies
  pip install -r requirements.txt
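
If you prefer to verify the setup from Python itself, the following sketch checks the interpreter version and looks for a few packages. The package list is an assumption for illustration; `requirements.txt` remains the authoritative list.

```python
import importlib.util
import sys

MIN_PYTHON = (3, 8)  # minimum version from the technology table
# Assumed import names for illustration; see requirements.txt for the real list.
PACKAGES = ["numpy", "pandas", "notebook"]


def python_ok(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if the (major, minor) version meets the minimum."""
    return tuple(version_info[:2]) >= minimum


def missing_packages(names=PACKAGES):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


print("Python version OK:", python_ok())
print("Missing packages:", missing_packages() or "none")
```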
Launch Jupyter
# Start Jupyter Notebook
jupyter notebook
# OR start Jupyter Lab (recommended)
jupyter lab

Navigate to the folder structure and start with numpy/Phase_1.ipynb.
SQL Setup
- Option 1: Use SQLite (built-in with Python)
- Option 2: Install DB Browser for SQLite (GUI)
- Option 3: Use your preferred SQL client
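
For Option 1, Python's standard `sqlite3` module is enough to practice the Week 5 basics (SELECT, WHERE, ORDER BY, a simple join). The sketch below uses an in-memory database; the `customers` and `orders` tables are hypothetical and not part of the repository's SQL files.

```python
import sqlite3

# An in-memory database: nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Hypothetical schema and rows, for illustration only.
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id),
        total REAL
    );
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO orders VALUES (1, 1, 40.0), (2, 1, 15.5), (3, 2, 99.0);
    """
)

# SELECT with a join, a parameterized WHERE filter, and ORDER BY.
rows = conn.execute(
    """
    SELECT c.name, o.total
    FROM orders AS o
    JOIN customers AS c ON c.id = o.customer_id
    WHERE o.total > ?
    ORDER BY o.total DESC
    """,
    (20.0,),
).fetchall()

print(rows)
conn.close()
```

Passing the threshold as a `?` parameter rather than formatting it into the query string is a good habit to build early.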
| Module | Learning Goals | Tools Used | Time Investment |
|---|---|---|---|
| EDA | Master exploratory analysis, statistical summaries | Pandas, Matplotlib, Seaborn | 1-2 weeks |
| Feature Engineering | Transform data, create meaningful features | Pandas, NumPy, Scikit-learn | 1-2 weeks |
| Preprocessing | Clean data, handle missing values, scaling | Pandas, NumPy | 1 week |
| Visualization | Create compelling charts and plots | Matplotlib, Seaborn, Plotly | 1-2 weeks |
| NumPy | Array operations, mathematical computations | NumPy | 2 weeks |
| Pandas | Data manipulation, analysis workflows | Pandas | 2-3 weeks |
| SQL | Database queries, data retrieval | SQLite, SQL | 2 weeks |
EDA (Exploratory Data Analysis)
- Phase 1: Basic statistics and data overview
- Phase 2: Distribution analysis and correlation
- Phase 3: Advanced patterns and outlier detection
- Deliverable: Interactive HTML reports
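
The three EDA phases above can be sketched on a tiny, made-up DataFrame. The column names and values are assumptions, and the 1.5 × IQR rule is just one common outlier heuristic, not necessarily the one used in the notebooks.

```python
import pandas as pd

# Hypothetical measurements; the last weight is deliberately extreme.
df = pd.DataFrame({
    "height_cm": [150, 160, 165, 170, 175, 180, 185],
    "weight_kg": [50, 58, 63, 68, 74, 80, 150],
})

# Phase 1: data overview and basic statistics.
df.info()
summary = df.describe()
print(summary)

# Phase 2: pairwise (Pearson) correlation between numeric columns.
corr = df.corr()
print(corr)

# Phase 3: flag outliers with the 1.5 * IQR rule.
q1 = df["weight_kg"].quantile(0.25)
q3 = df["weight_kg"].quantile(0.75)
iqr = q3 - q1
is_outlier = (df["weight_kg"] < q1 - 1.5 * iqr) | (df["weight_kg"] > q3 + 1.5 * iqr)
outliers = df[is_outlier]
print(outliers)
```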
Feature Engineering
- Phase 1: Feature creation and transformation
- Phase 2: Feature selection techniques
- Phase 3: Advanced feature engineering
- Deliverable: Optimized feature sets
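
A minimal sketch of feature creation, encoding, and a very simple selection step follows. The data and column names are hypothetical, and dropping zero-variance columns stands in for the richer selection techniques (for example, those in Scikit-learn) that the notebooks may cover.

```python
import numpy as np
import pandas as pd

# Hypothetical housing-style data, for illustration only.
df = pd.DataFrame({
    "price": [100.0, 250.0, 80.0, 400.0],
    "area_sqm": [50.0, 100.0, 40.0, 160.0],
    "city": ["A", "B", "A", "C"],
    "constant": [1, 1, 1, 1],  # carries no information
})

# Feature creation: a ratio feature and a log transform.
df["price_per_sqm"] = df["price"] / df["area_sqm"]
df["log_area"] = np.log1p(df["area_sqm"])

# Transformation: one-hot encode the categorical column as integers.
encoded = pd.get_dummies(df, columns=["city"], dtype=int)

# Selection: keep only columns whose variance is greater than zero.
variances = encoded.var()
selected = encoded[variances[variances > 0].index]
print(selected.columns.tolist())
```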
NumPy Mastery
- Phase 1: Array fundamentals and indexing
- Phase 2: Mathematical operations and broadcasting
- Phase 3: Linear algebra and advanced operations
- Deliverable: High-performance numerical solutions
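
Each NumPy phase can be illustrated in a few lines. The arrays below are arbitrary examples chosen for this sketch rather than material from the notebooks.

```python
import numpy as np

matrix = np.arange(12, dtype=float).reshape(3, 4)

# Phase 1: indexing, slicing, and boolean masks.
second_row = matrix[1]        # one full row
even_cols = matrix[:, ::2]    # every other column
large = matrix[matrix > 8]    # elements selected by a condition

# Phase 2: broadcasting subtracts the per-column mean from every row.
col_means = matrix.mean(axis=0)   # shape (4,)
centered = matrix - col_means     # (3, 4) minus (4,) broadcasts over rows

# Phase 3: linear algebra, solving the system A @ x = b.
A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([3.0, 5.0])
x = np.linalg.solve(A, b)

print(second_row, even_cols.shape, large)
print(centered.mean(axis=0))
print(x)
```

`np.linalg.solve` is generally preferred over computing an explicit inverse, since it is both faster and numerically more stable.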
MIT License
This project is open source and available under the MIT License.
Feel free to fork, modify, and use in academic or commercial projects!
We welcome contributions! Here's how you can help:
| Type | Description |
|---|---|
| Bug Reports | Found an issue? Open a GitHub issue |
| Feature Requests | Have an idea? We'd love to hear it! |
| Documentation | Help improve our docs |
| Code | Submit pull requests with improvements |
Happy Learning & Coding!
"Data is the new oil, but without the right tools, it's just crude."