This repository contains five data science assignments and seventeen lab experiments (Experiment-7 and Experiment-17 are still pending implementation) covering fundamental statistical analysis, machine learning, and data visualization techniques.
| Assignment | Topic | Key Concepts | Tools Used |
|---|---|---|---|
| Assignment-1 | Exploratory Data Analysis | Data Loading, Feature Analysis, Visualization | Pandas, Seaborn |
| Assignment-2 | Confusion Matrix | Classification, Model Evaluation, Metrics | Scikit-learn, Matplotlib |
| Assignment-3 | Z-Test Analysis | Hypothesis Testing, Statistical Significance | Statsmodels |
| Assignment-4 | T-Test Analysis | Independent & Paired Tests, P-value Analysis | SciPy |
| Assignment-5 | Linear Regression | Simple & Multiple Regression, Model Evaluation | Scikit-learn |
| Experiment | Topic | Key Concepts | Tools Used |
|---|---|---|---|
| Experiment-1 | Linear Regression with Scikit-learn | Model Training, Evaluation, Visualization | Scikit-learn, Matplotlib |
| Experiment-2 | Singular Value Decomposition (SVD) | Matrix Decomposition, Reconstruction | NumPy |
| Experiment-3 | Logistic Regression Visualization | Decision Boundary, Classification | Scikit-learn, Matplotlib |
| Experiment-4 | T-Test Analysis with SciPy | Statistical Hypothesis Testing | SciPy |
| Experiment-5 | Confusion Matrix Visualization | Model Evaluation, Metrics | Scikit-learn, Seaborn |
| Experiment-6 | Decision Tree Classifier | Decision Trees, Visualization | Scikit-learn, Matplotlib |
| Experiment-7 | [Pending Implementation] | - | - |
| Experiment-8 | K-Means Clustering with Elbow Method | Clustering, Optimal Cluster Selection | Scikit-learn, Matplotlib |
| Experiment-9 | DBSCAN Clustering on Customer Data | Density-based Clustering, Outlier Detection | Scikit-learn, Seaborn |
| Experiment-10 | Gradient Boosting Classifier | Ensemble Learning, Boosting | Scikit-learn |
| Experiment-11 | Ensemble Voting Classifier | Ensemble Learning, Voting | Scikit-learn |
| Experiment-12 | Naive Bayes Classifier | Probabilistic Classification | Scikit-learn |
| Experiment-13 | Linear Discriminant Analysis (LDA) | Dimensionality Reduction, Classification | Scikit-learn |
| Experiment-14 | Hierarchical Clustering with Dendrogram | Hierarchical Clustering, Visualization | SciPy, Matplotlib |
| Experiment-15 | Logistic Regression with Decision Boundary | Classification, Visualization | Scikit-learn, Matplotlib |
| Experiment-16 | Hierarchical Clustering with Dendrogram | Hierarchical Clustering, Visualization | SciPy, Matplotlib |
| Experiment-17 | [Pending Implementation] | - | - |
For a detailed overview of all experiments, see Data Science Lab Experiments.
# Clone the repository
git clone https://github.com/Mausam5055/Data-Science.git
# Navigate to the directory
cd Data-Science
# Install required packages
pip install -r Assignment-1/requirements.txt
├── Assignment-1/
│   ├── run_titanic_eda.py
│   ├── titanic.csv
│   ├── README.md
│   └── requirements.txt
├── Assignment-2/
│   ├── confusion_matrix_iris.py
│   └── README.md
├── Assignment-3/
│   ├── ztest_demo.py
│   └── README.md
├── Assignment-4/
│   ├── ttest_demo.py
│   └── README.md
├── Assignment-5/
│   ├── linear_regression_demo.py
│   └── README.md
├── Experiment-1/
│   └── main.py
├── Experiment-2/
│   └── main.py
├── Experiment-3/
│   └── main.py
├── Experiment-4/
│   └── main.py
├── Experiment-5/
│   └── main.py
├── Experiment-6/
│   └── main.py
├── Experiment-7/
│   └── main.py
├── Experiment-8/
│   └── main.py
├── Experiment-9/
│   └── main.py
├── Experiment-10/
│   └── main.py
├── Experiment-11/
│   └── main.py
├── Experiment-12/
│   └── main.py
├── Experiment-13/
│   └── main.py
├── Experiment-14/
│   └── main.py
├── Experiment-15/
│   └── main.py
├── Experiment-16/
│   └── main.py
├── Experiment-17/
│   └── main.py
└── FDS_LAB MANUAL.odt
- Dataset: Titanic Dataset
- Key Features: Passenger Demographics, Survival Analysis
- Visualizations: Count plots, Histograms, Correlation matrices
- Dataset: Iris Dataset
- Model: Logistic Regression
- Metrics: Accuracy, Precision, Recall, F1-Score
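
The confusion-matrix workflow of Assignment-2 (Iris data, logistic regression, standard metrics) might look roughly like the sketch below. It is an illustration written for this README, not the contents of `confusion_matrix_iris.py`; the split ratio, random seed, and `max_iter` value are assumptions.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

# Load the Iris data and hold out 30% for testing (assumed split and seed).
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Fit a logistic regression classifier and predict on the held-out set.
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_pred = model.predict(X_test)

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_test, y_pred)
accuracy = accuracy_score(y_test, y_pred)

print(cm)
print(f"Accuracy: {accuracy:.3f}")
# Per-class precision, recall, and F1-score.
print(classification_report(y_test, y_pred))
```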
- Implementation: One-sample Z-test
- Tools: Statsmodels
- Analysis: Z-score, P-value interpretation
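
To make the Z-score and P-value interpretation of Assignment-3 concrete, here is a minimal hand-rolled one-sample Z-test using only the Python standard library. The assignment itself uses Statsmodels; this sketch instead assumes a known population standard deviation `sigma`, and the helper name `one_sample_ztest` and the sample values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean


def one_sample_ztest(sample, mu0, sigma):
    """Return (z, two-sided p-value) for H0: population mean == mu0.

    Assumes the population standard deviation `sigma` is known.
    """
    n = len(sample)
    z = (mean(sample) - mu0) / (sigma / sqrt(n))
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Illustrative data only; not taken from the repository.
sample = [52, 48, 55, 51, 49, 53, 54, 50, 52, 56]
z, p_value = one_sample_ztest(sample, mu0=50, sigma=3)

alpha = 0.05
verdict = "reject H0" if p_value < alpha else "fail to reject H0"
print(f"z = {z:.3f}, p = {p_value:.4f} -> {verdict}")
```

A small P-value (below the chosen `alpha`) indicates that the sample mean is unlikely under the null hypothesis.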
- Types: Independent and Paired T-tests
- Tools: SciPy
- Focus: Statistical significance testing
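
The two T-test variants named for Assignment-4 can be sketched with SciPy as follows. The measurements are made up for illustration, and the 0.05 significance level is an assumption rather than something the assignment specifies.

```python
from scipy import stats

# Independent T-test: two unrelated groups (illustrative values).
group_a = [23.1, 25.4, 22.8, 26.0, 24.3, 25.1]
group_b = [27.2, 28.9, 26.5, 29.3, 27.8, 28.1]
t_ind, p_ind = stats.ttest_ind(group_a, group_b)

# Paired T-test: the same subjects measured twice (illustrative values).
before = [72, 75, 68, 80, 77, 74]
after = [70, 72, 67, 76, 75, 71]
t_rel, p_rel = stats.ttest_rel(before, after)

alpha = 0.05  # assumed significance level
for name, t, p in [("independent", t_ind, p_ind), ("paired", t_rel, p_rel)]:
    verdict = "reject H0" if p < alpha else "fail to reject H0"
    print(f"{name}: t = {t:.3f}, p = {p:.4f} -> {verdict}")
```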
- Types: Simple and Multiple Linear Regression
- Metrics: R² Score, MSE
- Features: Data generation, Model training, Prediction
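
The data generation, training, and evaluation steps listed for Assignment-5 could be sketched as below for the simple (one-feature) case. The true slope and intercept, noise level, and seed are assumptions chosen for illustration and are not taken from `linear_regression_demo.py`.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Generate synthetic data: y = 3x + 2 plus Gaussian noise (assumed parameters).
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
y = 3.0 * X[:, 0] + 2.0 + rng.normal(0, 1.0, size=100)

# Train the model and predict on the same data.
model = LinearRegression().fit(X, y)
y_pred = model.predict(X)

# Evaluate with the two metrics named above.
r2 = r2_score(y, y_pred)
mse = mean_squared_error(y, y_pred)

print(f"slope = {model.coef_[0]:.3f}, intercept = {model.intercept_:.3f}")
print(f"R^2 = {r2:.3f}, MSE = {mse:.3f}")
```

Multiple regression follows the same pattern with more than one column in `X`.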
- Concept: Implementation of linear regression using scikit-learn
- Key Features: Data generation, model training, evaluation, and visualization
- Concept: Matrix decomposition technique
- Key Features: Decomposition of a matrix and reconstruction from components
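
As a minimal sketch of the decomposition and reconstruction described for Experiment-2, the snippet below factors a small matrix with NumPy and rebuilds it from its components. The matrix `A` is an arbitrary example, not the one used in the experiment.

```python
import numpy as np

# Arbitrary 2x3 example matrix.
A = np.array([[3.0, 1.0, 1.0],
              [-1.0, 3.0, 1.0]])

# Thin SVD: A = U @ diag(s) @ Vt, with singular values in descending order.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Reconstruct the matrix from its three components.
A_rebuilt = U @ np.diag(s) @ Vt
print("Exact reconstruction:", np.allclose(A, A_rebuilt))

# Rank-1 approximation using only the largest singular value.
A_rank1 = s[0] * np.outer(U[:, 0], Vt[0])
print("Rank-1 approximation error:", np.linalg.norm(A - A_rank1))
```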
- Concept: Visualization of logistic regression results
- Key Features: Decision boundary plotting (incomplete implementation)
- Concept: Statistical hypothesis testing
- Key Features: Two-sample t-test implementation and interpretation
- Concept: Model evaluation technique
- Key Features: Confusion matrix creation and visualization using seaborn
- Concept: Decision tree algorithm for classification
- Key Features: Tree visualization, model evaluation with accuracy metrics
- Status: Empty experiment requiring implementation
- Concept: Unsupervised learning clustering technique
- Key Features: Optimal cluster selection using the elbow method
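
The elbow method from Experiment-8 can be sketched as follows: fit K-Means for a range of `k` and look for the point where the within-cluster sum of squares (inertia) stops dropping sharply. The synthetic blobs and the range of `k` are assumptions, and the sketch prints the values rather than plotting them.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data with four well-separated groups (assumed for illustration).
X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.8, random_state=0)

# Record the inertia for each candidate number of clusters.
ks = list(range(1, 9))
inertias = []
for k in ks:
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias.append(km.inertia_)

# The "elbow" is where adding another cluster yields little further reduction.
for k, inertia in zip(ks, inertias):
    print(f"k = {k}: inertia = {inertia:.1f}")
```

Plotting `ks` against `inertias` with Matplotlib gives the usual elbow curve.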
- Concept: Density-based clustering algorithm
- Key Features: Outlier detection and cluster visualization
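
To illustrate how DBSCAN flags outliers (Experiment-9), the sketch below clusters synthetic points and checks which ones receive the noise label `-1`. The experiment uses customer data; the blobs, the added outlier, and the `eps` and `min_samples` values here are assumptions for illustration.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs

# Three dense groups plus one isolated point far from all of them.
X, _ = make_blobs(
    n_samples=200,
    centers=[[0, 0], [5, 5], [0, 5]],
    cluster_std=0.5,
    random_state=0,
)
X = np.vstack([X, [[10.0, -10.0]]])

# Points without enough neighbours within `eps` are labelled -1 (noise).
labels = DBSCAN(eps=0.6, min_samples=5).fit_predict(X)

n_clusters = len(set(labels) - {-1})
n_noise = int(np.sum(labels == -1))
print(f"clusters found: {n_clusters}, noise points: {n_noise}")
```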
- Concept: Ensemble learning technique
- Key Features: Boosting algorithm for classification
- Concept: Ensemble learning through voting
- Key Features: Combining multiple classifiers for improved performance
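
A voting ensemble like the one in Experiment-11 might be assembled as in the sketch below. The choice of base classifiers, the Iris dataset, and hard (majority) voting are assumptions; the experiment may combine different models.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Each base model casts one vote; the majority class wins ("hard" voting).
voter = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("nb", GaussianNB()),
        ("dt", DecisionTreeClassifier(random_state=0)),
    ],
    voting="hard",
)
voter.fit(X_train, y_train)

accuracy = voter.score(X_test, y_test)
print(f"Voting ensemble accuracy: {accuracy:.3f}")
```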
- Concept: Probabilistic classification algorithm
- Key Features: Gaussian Naive Bayes with detailed evaluation metrics
- Concept: Dimensionality reduction technique
- Key Features: LDA for feature reduction followed by classification
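
The "reduce, then classify" pattern described for Experiment-13 can be sketched as follows. The Iris dataset and the logistic regression classifier applied after the projection are assumptions made for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Project the 4 features onto 2 discriminant axes (at most n_classes - 1).
lda = LinearDiscriminantAnalysis(n_components=2)
X_train_lda = lda.fit_transform(X_train, y_train)
X_test_lda = lda.transform(X_test)

# Train a classifier on the reduced features.
clf = LogisticRegression(max_iter=1000).fit(X_train_lda, y_train)
accuracy = clf.score(X_test_lda, y_test)
print(f"Reduced shape: {X_train_lda.shape}, accuracy: {accuracy:.3f}")
```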
- Concept: Hierarchical clustering algorithm
- Key Features: Dendrogram visualization of clustering results
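
A minimal sketch of hierarchical clustering with SciPy, as in Experiment-14: build the linkage matrix, derive the dendrogram structure, and cut the tree into flat clusters. The synthetic points and Ward linkage are assumptions; `no_plot=True` is used so the sketch runs without opening a figure.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage

# Two well-separated groups of 10 points each (illustrative data).
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0, 0.5, size=(10, 2)),
    rng.normal(5, 0.5, size=(10, 2)),
])

# Each row of Z records one merge: the two clusters joined, their distance,
# and the size of the new cluster.
Z = linkage(X, method="ward")

# Compute the dendrogram layout without drawing it.
tree = dendrogram(Z, no_plot=True)
print("Leaf order:", tree["leaves"])

# Cut the tree into at most two flat clusters.
labels = fcluster(Z, t=2, criterion="maxclust")
print("Cluster labels:", labels)
```

Dropping `no_plot=True` and calling `matplotlib.pyplot.show()` afterwards would render the dendrogram.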
- Concept: Binary classification with visualization
- Key Features: Decision boundary plotting for logistic regression
- Concept: Hierarchical clustering algorithm
- Key Features: Dendrogram visualization of clustering results
- Status: Empty experiment requiring implementation
| Package | Purpose |
|---|---|
| NumPy | Numerical computations |
| Pandas | Data manipulation |
| Matplotlib | Visualization |
| Scikit-learn | Machine learning |
| SciPy | Statistical analysis |
| Statsmodels | Statistical models |
| Seaborn | Advanced visualization |
Mausam Kar