Comprehensive data science coursework from 3rd semester B.Tech covering Python programming, data analysis with pandas, visualization using matplotlib and seaborn, statistical methods, machine learning fundamentals, and hands-on projects with real datasets.

Mausam5055/Data-Science

📚 Data Science Lab Assignments

📑 Table of Contents

  1. Overview
  2. Assignments
  3. Experiments
  4. Installation
  5. Repository Structure
  6. Author

🎯 Overview

This repository contains five data science assignments and seventeen numbered experiments (Experiment-7 and Experiment-17 are still pending implementation) covering fundamental statistical analysis, machine learning, and data visualization techniques.

📊 Assignments

| Assignment | Topic | Key Concepts | Tools Used |
| --- | --- | --- | --- |
| Assignment-1 | Exploratory Data Analysis | Data Loading, Feature Analysis, Visualization | Pandas, Seaborn |
| Assignment-2 | Confusion Matrix | Classification, Model Evaluation, Metrics | Scikit-learn, Matplotlib |
| Assignment-3 | Z-Test Analysis | Hypothesis Testing, Statistical Significance | Statsmodels |
| Assignment-4 | T-Test Analysis | Independent & Paired Tests, P-value Analysis | SciPy |
| Assignment-5 | Linear Regression | Simple & Multiple Regression, Model Evaluation | Scikit-learn |

🔬 Experiments

| Experiment | Topic | Key Concepts | Tools Used |
| --- | --- | --- | --- |
| Experiment-1 | Linear Regression with Scikit-learn | Model Training, Evaluation, Visualization | Scikit-learn, Matplotlib |
| Experiment-2 | Singular Value Decomposition (SVD) | Matrix Decomposition, Reconstruction | NumPy |
| Experiment-3 | Logistic Regression Visualization | Decision Boundary, Classification | Scikit-learn, Matplotlib |
| Experiment-4 | T-Test Analysis with SciPy | Statistical Hypothesis Testing | SciPy |
| Experiment-5 | Confusion Matrix Visualization | Model Evaluation, Metrics | Scikit-learn, Seaborn |
| Experiment-6 | Decision Tree Classifier | Decision Trees, Visualization | Scikit-learn, Matplotlib |
| Experiment-7 | [Pending Implementation] | - | - |
| Experiment-8 | K-Means Clustering with Elbow Method | Clustering, Optimal Cluster Selection | Scikit-learn, Matplotlib |
| Experiment-9 | DBSCAN Clustering on Customer Data | Density-based Clustering, Outlier Detection | Scikit-learn, Seaborn |
| Experiment-10 | Gradient Boosting Classifier | Ensemble Learning, Boosting | Scikit-learn |
| Experiment-11 | Ensemble Voting Classifier | Ensemble Learning, Voting | Scikit-learn |
| Experiment-12 | Naive Bayes Classifier | Probabilistic Classification | Scikit-learn |
| Experiment-13 | Linear Discriminant Analysis (LDA) | Dimensionality Reduction, Classification | Scikit-learn |
| Experiment-14 | Hierarchical Clustering with Dendrogram | Hierarchical Clustering, Visualization | SciPy, Matplotlib |
| Experiment-15 | Logistic Regression with Decision Boundary | Classification, Visualization | Scikit-learn, Matplotlib |
| Experiment-16 | Hierarchical Clustering with Dendrogram | Hierarchical Clustering, Visualization | SciPy, Matplotlib |
| Experiment-17 | [Pending Implementation] | - | - |

For a detailed overview of all experiments, see Data Science Lab Experiments.

🛠️ Installation

# Clone the repository
git clone https://github.com/Mausam5055/Data-Science.git

# Navigate to the directory
cd Data-Science

# Install required packages
pip install -r Assignment-1/requirements.txt

📁 Repository Structure

.
├── Assignment-1/
│   ├── run_titanic_eda.py
│   ├── titanic.csv
│   ├── README.md
│   └── requirements.txt
├── Assignment-2/
│   ├── confusion_matrix_iris.py
│   └── README.md
├── Assignment-3/
│   ├── ztest_demo.py
│   └── README.md
├── Assignment-4/
│   ├── ttest_demo.py
│   └── README.md
├── Assignment-5/
│   ├── linear_regression_demo.py
│   └── README.md
├── Experiment-1/
│   └── main.py
├── Experiment-2/
│   └── main.py
├── Experiment-3/
│   └── main.py
├── Experiment-4/
│   └── main.py
├── Experiment-5/
│   └── main.py
├── Experiment-6/
│   └── main.py
├── Experiment-7/
│   └── main.py
├── Experiment-8/
│   └── main.py
├── Experiment-9/
│   └── main.py
├── Experiment-10/
│   └── main.py
├── Experiment-11/
│   └── main.py
├── Experiment-12/
│   └── main.py
├── Experiment-13/
│   └── main.py
├── Experiment-14/
│   └── main.py
├── Experiment-15/
│   └── main.py
├── Experiment-16/
│   └── main.py
├── Experiment-17/
│   └── main.py
└── FDS_LAB MANUAL.odt

📌 Assignment Details

1. Exploratory Data Analysis

  • Dataset: Titanic Dataset
  • Key Features: Passenger Demographics, Survival Analysis
  • Visualizations: Count plots, Histograms, Correlation matrices

2. Confusion Matrix

  • Dataset: Iris Dataset
  • Model: Logistic Regression
  • Metrics: Accuracy, Precision, Recall, F1-Score
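
As a rough illustration of this assignment's workflow, the sketch below fits a logistic regression on Iris and prints the confusion matrix with the listed metrics. It is not taken from confusion_matrix_iris.py; the 70/30 split, the random seed, and `max_iter=1000` are assumptions chosen for the example.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

# Load Iris: 150 samples, 4 features, 3 classes.
iris = load_iris()
X, y = iris.data, iris.target

# Hold out 30% for evaluation (split ratio and seed are assumptions).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# max_iter is raised so the solver converges on unscaled features.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_test, y_pred)
print(cm)
print("accuracy:", accuracy_score(y_test, y_pred))
# Per-class precision, recall, and F1-score.
print(classification_report(y_test, y_pred, target_names=iris.target_names))
```

The diagonal of `cm` counts correct predictions, so accuracy equals the trace divided by the total.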

3. Z-Test Analysis

  • Implementation: One-sample Z-test
  • Tools: Statsmodels
  • Analysis: Z-score, P-value interpretation
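
To make the z-score and p-value concrete, here is a minimal sketch of a two-sided one-sample z-test using only the standard library. The assignment itself uses Statsmodels; this version instead writes out the arithmetic z = (mean - mu0) / (sigma / sqrt(n)). The helper name `one_sample_ztest`, the sample values, and the assumed known `sigma` are all hypothetical.

```python
import math
from statistics import NormalDist, mean


def one_sample_ztest(sample, mu0, sigma):
    """Two-sided one-sample z-test, assuming the population std `sigma` is known."""
    n = len(sample)
    # Standardize the gap between the sample mean and the hypothesized mean.
    z = (mean(sample) - mu0) / (sigma / math.sqrt(n))
    # Two-sided p-value from the standard normal distribution.
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p_value


# Illustrative measurements (made up for this example).
sample = [52.1, 49.8, 51.5, 50.9, 53.2, 48.7, 51.8, 50.4, 52.6, 51.0]
z, p = one_sample_ztest(sample, mu0=50.0, sigma=2.0)
print(f"z = {z:.3f}, p = {p:.4f}")

# Reject the null hypothesis only if p falls below the chosen significance level.
alpha = 0.05
print("reject H0" if p < alpha else "fail to reject H0")
```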

4. T-Test Analysis

  • Types: Independent and Paired T-tests
  • Tools: SciPy
  • Focus: Statistical significance testing
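
The two test types can be compared side by side with SciPy's `ttest_ind` and `ttest_rel`. The following is a sketch on synthetic data, not the contents of ttest_demo.py; group sizes, means, and the seed are assumptions.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Independent samples: two unrelated groups (synthetic data).
group_a = rng.normal(loc=50.0, scale=5.0, size=30)
group_b = rng.normal(loc=53.0, scale=5.0, size=30)
t_ind, p_ind = stats.ttest_ind(group_a, group_b)

# Paired samples: the same subjects measured before and after.
before = rng.normal(loc=70.0, scale=8.0, size=25)
after = before + rng.normal(loc=2.0, scale=3.0, size=25)
t_rel, p_rel = stats.ttest_rel(before, after)

print(f"independent: t = {t_ind:.3f}, p = {p_ind:.4f}")
print(f"paired:      t = {t_rel:.3f}, p = {p_rel:.4f}")
```

A paired test is equivalent to a one-sample t-test on the per-subject differences, which is why it is appropriate only when the two measurements belong to the same subjects.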

5. Linear Regression

  • Types: Simple and Multiple Linear Regression
  • Metrics: R² Score, MSE
  • Features: Data generation, Model training, Prediction
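
The generate, train, predict, and evaluate steps listed above might look like the sketch below. The true coefficients, noise level, and sample sizes are assumptions for illustration and are not drawn from linear_regression_demo.py.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

rng = np.random.default_rng(0)

# Simple regression: one feature, y = 3x + 4 plus Gaussian noise.
x = rng.uniform(0.0, 10.0, size=(100, 1))
y = 3.0 * x[:, 0] + 4.0 + rng.normal(scale=1.0, size=100)
simple = LinearRegression().fit(x, y)
simple_r2 = r2_score(y, simple.predict(x))
simple_mse = mean_squared_error(y, simple.predict(x))

# Multiple regression: two features, y = 2*x1 - 1.5*x2 + 0.5 plus noise.
X = rng.uniform(0.0, 10.0, size=(100, 2))
y_multi = 2.0 * X[:, 0] - 1.5 * X[:, 1] + 0.5 + rng.normal(scale=1.0, size=100)
multi = LinearRegression().fit(X, y_multi)
multi_r2 = r2_score(y_multi, multi.predict(X))
multi_mse = mean_squared_error(y_multi, multi.predict(X))

print("simple:  ", simple.coef_, simple.intercept_, simple_r2, simple_mse)
print("multiple:", multi.coef_, multi.intercept_, multi_r2, multi_mse)
```

Because the data are generated from known coefficients, the fitted `coef_` values can be checked against them directly.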

🔬 Experiment Details

Experiment-1: Linear Regression with Scikit-learn

  • Concept: Implementation of linear regression using scikit-learn
  • Key Features: Data generation, model training, evaluation, and visualization

Experiment-2: Singular Value Decomposition (SVD)

  • Concept: Matrix decomposition technique
  • Key Features: Decomposition of a matrix and reconstruction from components
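
Decomposition and reconstruction can be sketched with `numpy.linalg.svd`. The example matrix is an arbitrary choice for illustration, not the one used in the experiment.

```python
import numpy as np

# A small 2x3 example matrix (arbitrary values).
A = np.array([[3.0, 1.0, 1.0],
              [-1.0, 3.0, 1.0]])

# Reduced SVD: A = U @ diag(s) @ Vt, with singular values sorted descending.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Reconstruct the matrix from its three components.
A_rebuilt = U @ np.diag(s) @ Vt

# Rank-1 approximation: keep only the largest singular value.
A_rank1 = s[0] * np.outer(U[:, 0], Vt[0])

print("singular values:", s)
print("reconstruction error:", np.linalg.norm(A - A_rebuilt))
print("rank-1 error:", np.linalg.norm(A - A_rank1))
```

The rank-1 error in the Frobenius norm equals the discarded singular value, which is the basis for using truncated SVD as a compression step.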

Experiment-3: Logistic Regression Visualization

  • Concept: Visualization of logistic regression results
  • Key Features: Decision boundary plotting (incomplete implementation)

Experiment-4: T-Test Analysis with SciPy

  • Concept: Statistical hypothesis testing
  • Key Features: Two-sample t-test implementation and interpretation

Experiment-5: Confusion Matrix Visualization

  • Concept: Model evaluation technique
  • Key Features: Confusion matrix creation and visualization using seaborn

Experiment-6: Decision Tree Classifier

  • Concept: Decision tree algorithm for classification
  • Key Features: Tree visualization, model evaluation with accuracy metrics
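
A decision tree can be trained, inspected, and scored in a few lines. This sketch assumes the Iris dataset and a depth limit of 3, neither of which is stated for the experiment, and it prints the tree as text with `export_text` rather than drawing it with Matplotlib.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42, stratify=iris.target
)

# A shallow tree keeps the printed rules readable (depth is an assumption).
tree = DecisionTreeClassifier(max_depth=3, random_state=42)
tree.fit(X_train, y_train)

# Text rendering of the learned split rules.
print(export_text(tree, feature_names=list(iris.feature_names)))

# Accuracy on the held-out samples.
accuracy = tree.score(X_test, y_test)
print("test accuracy:", accuracy)
```

For a graphical version, `sklearn.tree.plot_tree` draws the same structure onto a Matplotlib axis.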

Experiment-7: [Pending Implementation]

  • Status: Empty experiment requiring implementation

Experiment-8: K-Means Clustering with Elbow Method

  • Concept: Unsupervised learning clustering technique
  • Key Features: Optimal cluster selection using the elbow method
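
The elbow method fits K-Means for a range of cluster counts and records the inertia (within-cluster sum of squared distances) for each. Below is a minimal sketch on synthetic blobs; the data, the range of `k`, and the seed are assumptions.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data with four well-separated groups (illustrative only).
X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.8, random_state=42)

ks = list(range(1, 9))
inertias = []
for k in ks:
    km = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    # inertia_ is the sum of squared distances to the nearest centroid.
    inertias.append(km.inertia_)

for k, inertia in zip(ks, inertias):
    print(f"k={k}: inertia={inertia:.1f}")
```

Plotting `inertias` against `ks` (for example with `matplotlib.pyplot.plot`) shows a sharp drop that flattens out; the bend in that curve is the suggested number of clusters.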

Experiment-9: DBSCAN Clustering on Customer Data

  • Concept: Density-based clustering algorithm
  • Key Features: Outlier detection and cluster visualization
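
DBSCAN labels points in low-density regions as noise (label `-1`), which is how it doubles as an outlier detector. The sketch below uses synthetic two-dimensional points as a stand-in for the customer data, which is not described here; `eps` and `min_samples` are assumed values.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs

# Two dense groups standing in for customer segments (synthetic).
X, _ = make_blobs(
    n_samples=200, centers=[[0.0, 0.0], [10.0, 10.0]], cluster_std=0.5, random_state=0
)
# Three isolated points placed far from both groups.
outliers = np.array([[5.0, 5.0], [-6.0, 8.0], [15.0, -3.0]])
X = np.vstack([X, outliers])

# eps is the neighborhood radius; min_samples is the density threshold.
labels = DBSCAN(eps=1.0, min_samples=5).fit_predict(X)

n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
n_noise = int(np.sum(labels == -1))
print("clusters:", n_clusters, "noise points:", n_noise)
```

Unlike K-Means, the number of clusters is not specified in advance; it follows from the density parameters.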

Experiment-10: Gradient Boosting Classifier

  • Concept: Ensemble learning technique
  • Key Features: Boosting algorithm for classification

Experiment-11: Ensemble Voting Classifier

  • Concept: Ensemble learning through voting
  • Key Features: Combining multiple classifiers for improved performance
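
One way to combine classifiers is scikit-learn's `VotingClassifier`. The sketch below uses hard (majority) voting over three base models on Iris; the choice of dataset, base estimators, and voting mode are assumptions, since the experiment's configuration is not listed.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Hard voting: each model casts one vote and the majority class wins.
voter = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("tree", DecisionTreeClassifier(random_state=42)),
        ("nb", GaussianNB()),
    ],
    voting="hard",
)
voter.fit(X_train, y_train)

# Compare each fitted base model with the ensemble on held-out data.
for name, est in voter.named_estimators_.items():
    print(name, est.score(X_test, y_test))
ensemble_accuracy = voter.score(X_test, y_test)
print("ensemble", ensemble_accuracy)
```

Setting `voting="soft"` averages predicted probabilities instead, which requires every base model to implement `predict_proba`.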

Experiment-12: Naive Bayes Classifier

  • Concept: Probabilistic classification algorithm
  • Key Features: Gaussian Naive Bayes with detailed evaluation metrics

Experiment-13: Linear Discriminant Analysis (LDA)

  • Concept: Dimensionality reduction technique
  • Key Features: LDA for feature reduction followed by classification
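
Feature reduction followed by classification can be sketched as a two-step process: project with LDA, then fit a classifier on the projected features. Iris and logistic regression are assumed here for illustration; the experiment's own dataset and classifier are not specified.

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# LDA can keep at most (n_classes - 1) components: 2 for three classes.
lda = LinearDiscriminantAnalysis(n_components=2)
X_train_lda = lda.fit_transform(X_train, y_train)
X_test_lda = lda.transform(X_test)

# Classify in the reduced 2-D space.
clf = LogisticRegression(max_iter=1000).fit(X_train_lda, y_train)
lda_accuracy = clf.score(X_test_lda, y_test)

print("reduced shape:", X_train_lda.shape)
print("explained variance ratio:", lda.explained_variance_ratio_)
print("test accuracy:", lda_accuracy)
```

The projection is fitted on the training split only, so the test data do not influence the learned directions.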

Experiment-14: Hierarchical Clustering with Dendrogram

  • Concept: Hierarchical clustering algorithm
  • Key Features: Dendrogram visualization of clustering results
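
The clustering and dendrogram steps can be sketched with `scipy.cluster.hierarchy`. The six points and the Ward linkage below are assumptions for illustration; `no_plot=True` returns the dendrogram layout without requiring a display.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage

# Two obvious groups of three points each (illustrative data).
points = np.array([
    [1.0, 1.0], [1.2, 0.9], [0.8, 1.1],
    [5.0, 5.0], [5.1, 4.8], [4.9, 5.2],
])

# Each row of Z records one merge: the two clusters joined, their distance, and the new size.
Z = linkage(points, method="ward")

# Cut the tree so that at most two flat clusters remain.
labels = fcluster(Z, t=2, criterion="maxclust")

# Compute the dendrogram structure without drawing it.
tree = dendrogram(Z, no_plot=True)

print("merge table:\n", Z)
print("flat labels:", labels)
print("leaf order:", tree["leaves"])
```

Dropping `no_plot=True` (with Matplotlib installed) draws the tree, where the height of each join is the merge distance from `Z`.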

Experiment-15: Logistic Regression with Decision Boundary

  • Concept: Binary classification with visualization
  • Key Features: Decision boundary plotting for logistic regression
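
For a two-feature binary logistic regression, the decision boundary is the line where the predicted probability is 0.5, that is, where w0*x1 + w1*x2 + b = 0. The sketch below computes that line on synthetic blobs; the data and the helper name `boundary_x2` are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression

# Two classes centered at (-2, -2) and (2, 2) (synthetic data).
X, y = make_blobs(
    n_samples=200, centers=[[-2.0, -2.0], [2.0, 2.0]], cluster_std=1.0, random_state=0
)

clf = LogisticRegression().fit(X, y)
w0, w1 = clf.coef_[0]
b = clf.intercept_[0]


def boundary_x2(x1):
    """Solve w0*x1 + w1*x2 + b = 0 for x2 (hypothetical helper)."""
    return -(w0 * x1 + b) / w1


# Sample the boundary across the observed range of the first feature.
xs = np.linspace(X[:, 0].min(), X[:, 0].max(), 50)
ys = boundary_x2(xs)

print("weights:", w0, w1, "intercept:", b)
print("training accuracy:", clf.score(X, y))
```

Plotting `xs` against `ys` over a scatter of `X` colored by `y` gives the usual decision-boundary figure.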

Experiment-16: Hierarchical Clustering with Dendrogram

  • Concept: Hierarchical clustering algorithm
  • Key Features: Dendrogram visualization of clustering results

Experiment-17: [Pending Implementation]

  • Status: Empty experiment requiring implementation

🔧 Requirements

| Package | Purpose |
| --- | --- |
| NumPy | Numerical computations |
| Pandas | Data manipulation |
| Matplotlib | Visualization |
| Scikit-learn | Machine learning |
| SciPy | Statistical analysis |
| Statsmodels | Statistical models |
| Seaborn | Advanced visualization |

👤 Author

Mausam Kar

Made with ❤️
