
This repository includes problem set questions for the Data Science course held in Spring 2025 at the CS Department of Shahid Beheshti University.


MMDPROJECT/datascience-assignments


Data Science Assignments (Spring 2025)

This repository contains all problem sets (psets) for the Data Science course held in Spring 2025 at the Computer Science Department of Shahid Beheshti University, taught by Dr. Saeid Reza Kherad Pisheh. Each problem set includes two main components:

  1. Question Set (document.pdf): A PDF containing the problem statements.
  2. Answer Set: Solutions in Jupyter Notebook format, each accompanied by a PDF report on the notebook's analysis. In some cases (e.g., pset0), the solution is a standalone PDF file.

Below is a high-level summary of the repository structure, followed by details for each problem set, including a brief summary and pointers to its key files.


Repository Structure

├── pset0
│   ├── document.pdf
│   └── pset0_solution.pdf
│
├── pset1
│   ├── document.pdf
│   ├── Amazon Sales Analysis
│   │   ├── amazon_sales_analysis.ipynb
│   │   └── amazon_sales_analysis.pdf
│   └── Customer Personality Analysis
│       ├── customer_personality_analysis.ipynb
│       └── customer_personality_analysis.pdf
│
├── pset2
│   ├── document.pdf
│   ├── youtube_tranding_videos_analysis.ipynb
│   ├── youtube_tranding_videos_analysis.pdf
│   └── Theoretical
│       └── theoretical.pdf
│
├── pset3
│   ├── document.pdf
│   ├── user_segmentation_brazillian_ecommerce.ipynb
│   ├── user_segmentation_brazillian_ecommerce.pdf
│   └── Theoretical
│       └── theoretical.pdf
│
├── pset4
│   ├── document.pdf
│   ├── disease_detection.ipynb
│   ├── disease_detection.pdf
│   └── Theoretical
│       └── theoretical.pdf
│
└── pset5
    ├── document.pdf
    ├── insurance_policy_cost_prediction.ipynb
    ├── insurance_policy_cost_prediction.pdf
    └── Theoretical
        └── theoretical.pdf

pset0

Summary:
Introductory exercises focusing on data loading, cleaning, summary statistics, and simple visualizations using pandas and matplotlib.
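As a rough illustration of the load, clean, and summarize workflow described above, here is a minimal pandas sketch. The toy table, its column names, and the `clean` helper are hypothetical stand-ins for a real CSV; the actual pset0 data and cleaning rules may differ.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for a real file (e.g. pd.read_csv("data.csv")).
raw = pd.DataFrame({
    "city": ["Tehran", "Tabriz", None, "Shiraz", "Tehran", "Tehran"],
    "price": [120.0, np.nan, 95.0, 110.0, 120.0, 120.0],
    "units": [3, 5, 2, 4, 3, 3],
})

def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical cleaning rules: drop exact duplicates, drop rows missing a city,
    then fill remaining missing prices with the median price."""
    out = df.drop_duplicates()
    out = out.dropna(subset=["city"]).copy()
    out["price"] = out["price"].fillna(out["price"].median())
    return out.reset_index(drop=True)

cleaned = clean(raw)

# Summary statistics: count, mean, std, min, quartiles, and max of numeric columns.
summary = cleaned.describe()

# A simple grouped aggregate, the kind of table that usually feeds a bar chart.
units_by_city = cleaned.groupby("city")["units"].sum()
print(summary)
print(units_by_city)
```

A plot such as `units_by_city.plot(kind="bar")` with matplotlib would typically follow; it is left out here to keep the sketch minimal.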

Techniques Applied:

  • Writing a Formal Data Analysis Report (structured into sections such as Abstract, Introduction, Data, Methodology, and Conclusion)

Question Set: pset0/document.pdf

Answer Set: pset0/pset0_solution.pdf


pset1

Summary:
This set includes two independent analyses:

  1. Amazon Sales Analysis: Time-series and categorical analysis of Amazon sales data, covering revenue trends, product/category comparisons, and regression-based forecasting.
  2. Customer Personality Analysis: Clustering and personality segmentation using survey and spending data, with RFM features, K-means clustering, and PCA-based visualization.
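The K-means plus PCA workflow mentioned for the Customer Personality Analysis can be sketched as below, assuming scikit-learn. The data are synthetic, and the four feature columns (income, total spend, web purchases, store purchases) are hypothetical; this is not the notebook's actual code.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for customer features: three groups of 50 customers in 4 dimensions
# (hypothetical columns: income, total spend, web purchases, store purchases).
centers = np.array([[20, 100, 2, 3], [60, 800, 8, 6], [100, 1500, 4, 12]], dtype=float)
X = np.vstack([c + rng.normal(scale=[3, 50, 1, 1], size=(50, 4)) for c in centers])

# Standardize so large-scale features (e.g. spend) do not dominate Euclidean distances.
X_scaled = StandardScaler().fit_transform(X)

# K-means with k = 3; n_init is the number of random restarts, keeping the lowest inertia.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X_scaled)

# Project onto the first two principal components, used only to visualize the segments.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_scaled)
explained = pca.explained_variance_ratio_.sum()
print(np.bincount(labels), round(explained, 3))
```

Clustering runs in the full standardized space; PCA is applied afterwards purely for a 2D scatter plot colored by `labels`.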

Techniques Applied:

  • Exploratory Data Analysis
  • Data Preprocessing
  • Hypothesis Testing
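For the hypothesis-testing step, one common pattern is a two-sample test comparing a metric across two groups. The sketch below assumes SciPy and uses Welch's t-test on synthetic "order value" data for two hypothetical product categories; the tests actually used in the assignment may differ.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Hypothetical example: order values for two product categories (synthetic data).
category_a = rng.normal(loc=50.0, scale=10.0, size=200)
category_b = rng.normal(loc=60.0, scale=15.0, size=200)

# H0: both categories have the same mean order value.
# Welch's t-test (equal_var=False) does not assume equal variances in the two groups.
result = stats.ttest_ind(category_a, category_b, equal_var=False)

alpha = 0.05  # significance level chosen before looking at the data
reject_h0 = result.pvalue < alpha
print(f"t = {result.statistic:.2f}, p = {result.pvalue:.2e}, reject H0: {reject_h0}")
```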

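Question Set: pset1/document.pdf

Amazon Sales Analysis

Answer Set: pset1/Amazon Sales Analysis/amazon_sales_analysis.ipynb and amazon_sales_analysis.pdf

Customer Personality Analysis

Answer Set: pset1/Customer Personality Analysis/customer_personality_analysis.ipynb and customer_personality_analysis.pdf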


pset2

Summary:
Analysis of trending YouTube video data: feature extraction, correlation between engagement metrics, and linear regression for view prediction.
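A minimal sketch of the correlation and view-prediction steps, assuming pandas and scikit-learn. The engagement data are synthetic and the column names (`views`, `likes`, `comment_count`) are assumptions, not necessarily the dataset's actual schema. Counts like these are heavily right-skewed, so the sketch works on a log scale.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n = 500

# Synthetic engagement data with hypothetical column names.
likes = rng.lognormal(mean=8, sigma=1.5, size=n)
comments = likes * rng.uniform(0.02, 0.08, size=n)
views = likes * rng.uniform(20, 60, size=n)
df = pd.DataFrame({"views": views, "likes": likes, "comment_count": comments})

# log1p compresses the long right tail of count data.
log_df = np.log1p(df)

# Pairwise Pearson correlation between engagement metrics (on the log scale).
corr = log_df.corr(method="pearson")

# Linear regression of log views on log likes and log comment count.
X = log_df[["likes", "comment_count"]]
y = log_df["views"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
model = LinearRegression().fit(X_train, y_train)
r2 = r2_score(y_test, model.predict(X_test))
print(corr.round(2))
print(f"test R^2 = {r2:.3f}")
```

Predictions from this model are in log space; `np.expm1` maps them back to view counts.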

Techniques Applied:

  • Exploratory Data Analysis
  • Data Preprocessing
  • Hypothesis Testing

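Question Set: pset2/document.pdf

Answer Set: pset2/youtube_tranding_videos_analysis.ipynb and youtube_tranding_videos_analysis.pdf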


pset3

Summary:
User segmentation via RFM analysis on a Brazilian e-commerce dataset: K-means clustering, dendrogram-based validation, and customer lifetime insights.
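RFM (Recency, Frequency, Monetary) features are computed per customer from an order table and then clustered. Below is a minimal sketch assuming pandas and scikit-learn; the tiny order table and its column names (`customer_id`, `order_date`, `payment`) are hypothetical and do not reflect the real dataset's schema.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical order-level table (one row per order).
orders = pd.DataFrame({
    "customer_id": ["a", "a", "a", "b", "c", "c", "d", "e", "e", "e", "e"],
    "order_date": pd.to_datetime([
        "2018-01-05", "2018-03-10", "2018-06-01", "2017-02-01", "2017-05-01",
        "2017-06-15", "2017-03-01", "2018-05-20", "2018-06-10", "2018-06-25", "2018-06-30",
    ]),
    "payment": [100.0, 150.0, 120.0, 40.0, 60.0, 30.0, 25.0, 200.0, 180.0, 220.0, 250.0],
})

# Reference date: one day after the last order in the data.
snapshot = orders["order_date"].max() + pd.Timedelta(days=1)

rfm = orders.groupby("customer_id").agg(
    recency=("order_date", lambda d: (snapshot - d.max()).days),  # days since last order
    frequency=("order_date", "count"),                             # number of orders
    monetary=("payment", "sum"),                                   # total spend
)

# RFM values are skewed, so log-transform, then standardize before K-means.
scaled = StandardScaler().fit_transform(np.log1p(rfm[["recency", "frequency", "monetary"]]))
rfm["segment"] = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(scaled)
print(rfm)
```

In practice the number of clusters would be chosen with tools like an elbow curve or a dendrogram rather than fixed at 2 as in this toy example.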

Techniques Applied:

  • Exploratory Data Analysis
  • Data Preprocessing
  • Feature Engineering
  • Clustering (K-means, DBSCAN, Elbow Curve, etc.) and Dimensionality Reduction (PCA, t-SNE)

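Question Set: pset3/document.pdf

Answer Set: pset3/user_segmentation_brazillian_ecommerce.ipynb and user_segmentation_brazillian_ecommerce.pdf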


pset4

Summary:
Classification models for disease prediction using medical data: preprocessing, logistic regression or CNN models, metric evaluation, and ethical discussion.

Techniques Applied:

  • Exploratory Data Analysis
  • Data Preprocessing
  • Classification
    • Basic Methods:
      • LDA
      • Logistic Regression
      • Naive Bayes
      • SVM
    • Ensemble Methods:
      • Random Forests
      • AdaBoost
      • XGBoost
      • Voting
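Several of the basic classifiers listed above can be combined with a soft-voting ensemble. The sketch below assumes scikit-learn and uses synthetic, mildly imbalanced data from `make_classification` in place of the medical dataset. XGBoost and AdaBoost are left out to keep it dependency-light; this is an illustration, not the assignment's solution.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic binary data with a 70/30 class split (stand-in for patient records).
X, y = make_classification(n_samples=600, n_features=10, n_informative=5,
                           weights=[0.7, 0.3], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Soft voting averages the predicted class probabilities of all members,
# so every estimator must implement predict_proba (hence SVC(probability=True)).
voter = VotingClassifier(
    estimators=[
        ("lda", LinearDiscriminantAnalysis()),
        ("logreg", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
        ("nb", GaussianNB()),
        ("svm", make_pipeline(StandardScaler(), SVC(probability=True, random_state=0))),
        ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
    ],
    voting="soft",
)
voter.fit(X_train, y_train)

proba = voter.predict_proba(X_test)[:, 1]  # probability of the positive class
auc = roc_auc_score(y_test, proba)
report = classification_report(y_test, voter.predict(X_test), output_dict=True)
print(f"ROC-AUC = {auc:.3f}, accuracy = {report['accuracy']:.3f}")
```

With imbalanced medical labels, per-class recall and ROC-AUC are usually more informative than accuracy alone.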

Question Set: pset4/document.pdf

Answer Set: pset4/disease_detection.ipynb and disease_detection.pdf

pset5

Summary:
Regression models for insurance policy cost prediction.

Techniques Applied:

  • Exploratory Data Analysis
  • Data Preprocessing
  • Regression
    • Random Forests
    • AdaBoost
    • XGBoost
    • LightGBM
    • CatBoost
    • Polynomial Regression with Ridge (L2) Regularization
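Polynomial regression with a ridge penalty expands the features into polynomial terms and then fits a linear model that minimizes squared error plus alpha times the squared L2 norm of the coefficients. A minimal scikit-learn sketch follows. The features (age, BMI, smoker flag) and the cost formula are synthetic assumptions, and the degree and alpha grids are illustrative choices tuned by cross-validation.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

rng = np.random.default_rng(7)
n = 400

# Synthetic stand-ins for policy features (hypothetical): age, BMI, smoker flag.
age = rng.uniform(18, 65, n)
bmi = rng.uniform(18, 40, n)
smoker = rng.integers(0, 2, n)
X = np.column_stack([age, bmi, smoker])

# Nonlinear synthetic cost: quadratic in age plus a smoker x BMI interaction, plus noise.
y = 250 * age + 0.5 * age**2 + 300 * smoker * bmi + rng.normal(0, 500, n)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Expand to polynomial terms, standardize them so the L2 penalty treats all terms alike,
# then fit Ridge regression (squared error + alpha * ||w||^2).
pipe = make_pipeline(PolynomialFeatures(include_bias=False), StandardScaler(), Ridge())
search = GridSearchCV(
    pipe,
    param_grid={
        "polynomialfeatures__degree": [1, 2, 3],
        "ridge__alpha": [0.01, 0.1, 1.0, 10.0],
    },
    cv=5,
    scoring="neg_mean_absolute_error",
)
search.fit(X_train, y_train)
mae = mean_absolute_error(y_test, search.predict(X_test))
print(search.best_params_, f"test MAE = {mae:.1f}")
```

Degree 1 cannot represent the interaction term, so cross-validation should prefer a higher degree on this synthetic data.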

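Question Set: pset5/document.pdf

Answer Set: pset5/insurance_policy_cost_prediction.ipynb and insurance_policy_cost_prediction.pdf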

