Course repository for 95-885 Data Science & Big Data, Fall 2025. Contains Python implementations, covering multiple classes in Data Science, Big Data, Machine Learning, and related topics. Includes notebooks, code, and practice exercises across probability, optimization, algorithms, and applied computing.

Irene-Busah/Big-Data-Science

📊 Data Science & Big Data

95-885, Fall 2025 – Carnegie Mellon University

This repository contains coursework and hands-on implementations for 95-885 Data Science & Big Data. It includes Python notebooks, code files, assignments, and practice exercises covering the following areas:

  • Probability & statistical modeling

  • Algorithms & optimization

  • Applied Machine Learning

  • Big Data processing & distributed systems

  • Applied computing & data engineering

🎯 Key Learning Outcomes

  1. Designing end-to-end data science and machine learning solutions, from data ingestion and preprocessing to modeling, evaluation, and deployment. The projects are based on real-world use cases and address both industry problems and academic research challenges.

  2. Hands-on practice with Big Data tools, including:

    • Apache Spark for distributed data processing

    • Hadoop ecosystem tools

    • Cloud data handling

  3. Building production-ready pipelines with tools such as Pandas, Scikit-learn, PySpark, and Hadoop Streaming, and integrating them with machine learning models.
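Hadoop Streaming lets any executable that reads records from stdin and writes tab-separated `key<TAB>value` lines to stdout act as a mapper or reducer; the framework sorts mapper output by key before the reduce phase. The sketch below is an illustrative word count under that model, not code taken from this repository. The file name `wordcount.py` and the `map`/`reduce` command-line arguments are hypothetical conventions chosen for this example, and `run_locally` simulates the shuffle/sort step in a single process so the logic can be tested without a cluster.

```python
import sys
from itertools import groupby
from typing import Iterable, Iterator


def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Emit one 'word<TAB>1' record per token, as a streaming mapper writes to stdout."""
    for line in lines:
        for word in line.strip().lower().split():
            yield f"{word}\t1"


def reducer(sorted_lines: Iterable[str]) -> Iterator[str]:
    """Sum counts per key. Assumes input is sorted by key, as Hadoop guarantees between phases."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines if line.strip())
    # groupby only merges adjacent records, which is why the sorted input matters
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


def run_locally(lines: Iterable[str]) -> list[str]:
    """Simulate map -> shuffle/sort -> reduce in one process (for testing only)."""
    return list(reducer(sorted(mapper(lines))))


if __name__ == "__main__":
    # Hypothetical convention: the same script serves both phases, e.g.
    # `python wordcount.py map` as the mapper and `python wordcount.py reduce`
    # as the reducer; with no argument it runs a small local demo.
    stage = sys.argv[1] if len(sys.argv) > 1 else "local"
    if stage == "map":
        for record in mapper(sys.stdin):
            print(record)
    elif stage == "reduce":
        for record in reducer(sys.stdin):
            print(record)
    else:
        for record in run_locally(["the quick brown fox", "the lazy dog"]):
            print(record)
```

Keeping the mapper and reducer as plain generator functions means the same logic can be unit-tested locally and then reused unchanged behind stdin/stdout when submitted to a streaming job.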

🧠 Contents

  • Class-Labs/: Jupyter notebooks used in class labs and practical projects

  • Assignments/: Clean, tested Python scripts and reports

  • Documents/: Sample research papers

  • projects/: Capstone or course mini-projects on real datasets

🚀 Skills Gained

By the end of this course, learners will be able to:

  1. Handle massive datasets and perform distributed computation

  2. Apply statistical methods and ML models to large-scale problems

  3. Understand performance bottlenecks in data pipelines

  4. Translate academic theory into practical, scalable solutions
