This repository contains all problem sets (psets) for the Data Science course held in Spring 2025 at the Computer Science Department of Shahid Beheshti University, taught by Dr. Saeid Reza Kherad Pisheh. Each problem set includes two main components:
- Question Set (document.pdf): A PDF containing the problem statements.
- Answer Set: Solutions in Jupyter Notebook format, accompanied by a PDF report of the notebook's analysis. In some cases (e.g., pset0), the solution is a standalone PDF file.
Below is a high-level summary of the repository structure, followed by details for each problem set, including a brief summary and direct links to the important files.
├── pset0
│   ├── document.pdf
│   └── pset0_solution.pdf
│
├── pset1
│   ├── document.pdf
│   ├── Amazon Sales Analysis
│   │   ├── amazon_sales_analysis.ipynb
│   │   └── amazon_sales_analysis.pdf
│   └── Customer Personality Analysis
│       ├── customer_personality_analysis.ipynb
│       └── customer_personality_analysis.pdf
│
├── pset2
│   ├── document.pdf
│   ├── youtube_tranding_videos_analysis.ipynb
│   ├── youtube_tranding_videos_analysis.pdf
│   └── Theoretical
│       └── theoretical.pdf
│
├── pset3
│   ├── document.pdf
│   ├── user_segmentation_brazillian_ecommerce.ipynb
│   ├── user_segmentation_brazillian_ecommerce.pdf
│   └── Theoretical
│       └── theoretical.pdf
│
├── pset4
│   ├── document.pdf
│   ├── disease_detection.ipynb
│   ├── disease_detection.pdf
│   └── Theoretical
│       └── theoretical.pdf
│
└── pset5
    ├── document.pdf
    ├── insurance_policy_cost_prediction.ipynb
    ├── insurance_policy_cost_prediction.pdf
    └── Theoretical
        └── theoretical.pdf
pset0
Summary:
Introductory exercises focusing on data loading, cleaning, summary statistics, and simple visualizations using pandas and matplotlib.
Techniques Applied:
- Writing a formal data analysis report (including partitioning the report into sections such as Abstract, Introduction, Data, Methodology, and Conclusion)
Question Set:
Answer Set:
pset1
Summary:
This set includes two independent analyses:
- Amazon Sales Analysis: Time-series and categorical analysis of Amazon sales data, covering revenue trends, product and category comparisons, and regression-based forecasting.
- Customer Personality Analysis: Customer segmentation using survey and spending data, with RFM features, K-means clustering, and PCA-based visualization.
Techniques Applied:
- Exploratory Data Analysis
- Data Preprocessing
- Hypothesis Testing
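The RFM (recency, frequency, monetary) features mentioned in the summary above can be illustrated with a short pandas sketch. The column names (`customer_id`, `order_date`, `amount`), the toy data, and the helper `rfm_table` are hypothetical and chosen only for this example; the notebooks may derive these features differently.

```python
import pandas as pd

# Hypothetical transaction log; column names and values are assumptions.
orders = pd.DataFrame(
    {
        "customer_id": ["a", "a", "b", "c", "c", "c"],
        "order_date": pd.to_datetime(
            ["2025-01-05", "2025-02-20", "2025-01-15",
             "2025-02-01", "2025-02-10", "2025-02-25"]
        ),
        "amount": [20.0, 35.0, 50.0, 10.0, 15.0, 5.0],
    }
)


def rfm_table(df: pd.DataFrame, snapshot: pd.Timestamp) -> pd.DataFrame:
    """Return one row per customer: recency (days), frequency, monetary value."""
    grouped = df.groupby("customer_id").agg(
        last_order=("order_date", "max"),   # most recent purchase date
        frequency=("order_date", "count"),  # number of purchases
        monetary=("amount", "sum"),         # total amount spent
    )
    # Recency is measured in whole days between the snapshot and the last order.
    grouped["recency"] = (snapshot - grouped["last_order"]).dt.days
    return grouped[["recency", "frequency", "monetary"]]


rfm = rfm_table(orders, snapshot=pd.Timestamp("2025-03-01"))
print(rfm)
```

A table like this is typically scaled before being passed to a clustering step, since the three columns live on very different ranges.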
Question Set:
Answer Set:
Answer Set:
pset2
Summary:
Analysis of trending YouTube video data, covering feature extraction, correlation between engagement metrics, and linear regression for view prediction.
Techniques Applied:
- Exploratory Data Analysis
- Data Preprocessing
- Hypothesis Testing
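To make the "linear regression for view prediction" step in the summary above concrete, here is a minimal sketch using ordinary least squares in NumPy. The feature names (`likes`, `comments`) and the synthetic numbers are assumptions for illustration only and are not taken from the notebook or its dataset.

```python
import numpy as np

# Synthetic engagement data (hypothetical): likes and comment counts per video.
likes = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
comments = np.array([1.0, 4.0, 2.0, 8.0, 5.0])
# Views are generated from a known linear rule so the fit is easy to check.
views = 100.0 + 12.0 * likes + 30.0 * comments

# Design matrix with a leading column of ones for the intercept.
X = np.column_stack([np.ones_like(likes), likes, comments])

# Ordinary least squares: minimize ||X @ beta - views||^2.
beta, *_ = np.linalg.lstsq(X, views, rcond=None)
intercept, w_likes, w_comments = beta

# Predict views for a new video with 25 likes and 3 comments.
predicted = np.array([1.0, 25.0, 3.0]) @ beta
print(intercept, w_likes, w_comments, predicted)
```

Because the synthetic target is noise-free, the fitted coefficients recover the generating rule; on real trending data the fit would of course be approximate.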
Question Set:
Answer Set:
pset3
Summary:
User segmentation via RFM analysis on a Brazilian e-commerce dataset, including K-means clustering, dendrogram-based validation, and customer lifetime insights.
Techniques Applied:
- Exploratory Data Analysis
- Data Preprocessing
- Feature Engineering
- Clustering and dimensionality reduction (K-means, DBSCAN, PCA, t-SNE, elbow curve, etc.)
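The elbow-curve idea listed above can be sketched as follows: run K-means for several values of k and record the within-cluster sum of squares (inertia). This is a plain NumPy version of Lloyd's algorithm written for illustration, with synthetic 2-D points standing in for scaled RFM features; the notebook may well rely on a library implementation instead, and the helper name `kmeans_inertia` is hypothetical.

```python
import numpy as np


def kmeans_inertia(X: np.ndarray, k: int, n_iter: int = 50, seed: int = 0) -> float:
    """Run Lloyd's algorithm and return the within-cluster sum of squares."""
    rng = np.random.default_rng(seed)
    # Initialize centers with k distinct data points.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each point to its nearest center.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute centers; keep the old center if a cluster is empty.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return float((dists.min(axis=1) ** 2).sum())


# Three synthetic blobs (assumed data, not the e-commerce dataset).
rng = np.random.default_rng(42)
blob_centers = [(0.0, 0.0), (5.0, 5.0), (0.0, 5.0)]
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(30, 2)) for c in blob_centers])

# Inertia for k = 1..6; the "elbow" is where the decrease levels off.
inertias = {k: kmeans_inertia(X, k) for k in range(1, 7)}
for k, value in inertias.items():
    print(k, round(value, 2))
```

Plotting `inertias` against k gives the elbow curve; the choice of k is then a judgment call that is usually cross-checked with other validation tools.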
Question Set:
Answer Set:
pset4
Summary:
Classification model for disease prediction using medical data, covering preprocessing, a logistic regression or CNN model, metrics evaluation, and an ethical discussion.
Techniques Applied:
- Exploratory Data Analysis
- Data Preprocessing
- Classification
  - Basic Methods:
    - LDA
    - Logistic Regression
    - Naive Bayes
    - SVM
  - Ensemble Methods:
    - Random Forests
    - AdaBoost
    - XGBoost
    - Voting
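As a small illustration of the metrics-evaluation step mentioned in the summary above, the sketch below computes accuracy, precision, recall, and F1 for a binary classifier from scratch. The labels are made up for the example (1 = disease present), and the helper `binary_metrics` is hypothetical; the notebook may report a different set of metrics.

```python
def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives

    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    accuracy = (tp + tn) / len(pairs)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up labels: 4 diseased and 6 healthy patients.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

In a disease-detection setting, recall is often watched closely because a false negative (a missed case) tends to be costlier than a false positive.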
Question Set:
Answer Set:
pset5
Summary:
Regression models for insurance policy cost prediction.
Techniques Applied:
- Exploratory Data Analysis
- Data Preprocessing
- Regression
  - Random Forests
  - AdaBoost
  - XGBoost
  - LightGBM
  - CatBoost
  - Polynomial Regression with Ridge Regularization
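The last item above, polynomial regression with a ridge penalty, can be sketched in closed form with NumPy. The single feature `x`, the synthetic target, and the helper names `fit_poly_ridge` and `predict_poly` are assumptions made for illustration; they do not reflect the insurance dataset or the notebook's actual pipeline.

```python
import numpy as np


def fit_poly_ridge(x: np.ndarray, y: np.ndarray, degree: int = 2, alpha: float = 1.0) -> np.ndarray:
    """Fit polynomial coefficients (lowest power first) with an L2 penalty."""
    # Columns are 1, x, x^2, ..., x^degree.
    X = np.vander(x, N=degree + 1, increasing=True)
    penalty = alpha * np.eye(degree + 1)
    penalty[0, 0] = 0.0  # leave the intercept unpenalized
    # Solve the regularized normal equations (X^T X + penalty) beta = X^T y.
    return np.linalg.solve(X.T @ X + penalty, X.T @ y)


def predict_poly(x: np.ndarray, coef: np.ndarray) -> np.ndarray:
    """Evaluate the fitted polynomial at the given points."""
    return np.vander(x, N=len(coef), increasing=True) @ coef


# Synthetic, noise-free data from a known quadratic (hypothetical feature).
x = np.linspace(-2.0, 2.0, 21)
y = 1.0 + 2.0 * x + 3.0 * x**2

coef_ols = fit_poly_ridge(x, y, degree=2, alpha=0.0)     # no shrinkage
coef_ridge = fit_poly_ridge(x, y, degree=2, alpha=10.0)  # shrunken slopes
print(coef_ols, coef_ridge)
print(predict_poly(np.array([0.5]), coef_ols))
```

With `alpha=0.0` the fit reduces to ordinary least squares and recovers the generating coefficients; a larger `alpha` shrinks the non-intercept coefficients toward zero, trading some bias for lower variance.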
Question Set:
Answer Set: