A comprehensive, self-paced bootcamp delivered as interactive Marimo notebooks covering modern Python fundamentals, local data stack, and distributed computing.
This bootcamp provides a structured pathway for staff and developers at the University of Idaho OIT to acquire modern data science skills. The content spans local development (Polars and DuckDB) through distributed computing (PySpark and Databricks).
- Module 0: Environment & Tooling - Set up your development environment with Python 3.14+, UV, and modern tooling
- Module 1: Modern Python - Learn Python 3.14+ patterns, type hints, and professional coding practices
- Module 2: Local Data Stack - Master Polars and DuckDB for efficient local data analysis
- Module 3: Data Acquisition - Collect data from APIs, web scraping, and various sources
- Module 4: Data Cleaning - Clean and validate messy data with Pydantic and Pandera
- Module 5: Feature Engineering - Build reproducible data pipelines and engineer features
- Module 6: Visualization - Create effective visualizations to communicate insights
- Module 7: Machine Learning - Build classification models with scikit-learn
- Module 8: Databricks - Scale up to distributed computing with PySpark and Databricks
- Python 3.14 or later
- UV package manager installed
- Git
- Clone the repository:

  ```bash
  git clone https://github.com/ncolesummers/data-engineering-bootcamp.git
  cd data-engineering-bootcamp
  ```

- Set up the environment:

  ```bash
  uv sync
  ```

- Verify the installation:

  ```bash
  python -c "from bootcamp import __version__; print(__version__)"
  ```

  You should see `0.1.0` printed to the console.
Notebooks are delivered using Marimo, a reactive Python notebook platform.
To run a notebook:
```bash
marimo edit notebooks/module_00_environment/00_01_welcome.py
```

Repository layout:

```
data-engineering-bootcamp/
├── pyproject.toml        # Project configuration and dependencies
├── uv.lock               # Locked dependencies for reproducibility
├── README.md             # This file
├── docs/                 # Documentation and guides
│   ├── 01-curriculum-architecture.md
│   ├── 02-prd.md
│   ├── 03-premortem.md
│   └── 04-backlog-structure.md
├── src/
│   └── bootcamp/         # Python package for shared code
│       ├── datasets/     # Sample and synthetic datasets
│       ├── solutions/    # Exercise solutions
│       └── utils/        # Shared utilities
├── notebooks/            # Interactive Marimo notebooks
│   ├── module_00_environment/
│   ├── module_01_modern_python/
│   ├── module_02_local_data_stack/
│   ├── module_03_data_acquisition/
│   ├── module_04_data_cleaning/
│   ├── module_05_feature_engineering/
│   ├── module_06_visualization/
│   ├── module_07_machine_learning/
│   └── module_08_databricks/
└── tests/                # Test files
```
Start with Module 0: Environment & Tooling to set up your development environment and get familiar with the tools used throughout the bootcamp.
Each module builds on previous modules, so it's recommended to progress sequentially. However, experienced learners can skip ahead using the prerequisite information in each notebook's metadata.
- Product Requirements Document - Detailed specifications and requirements
- Curriculum Architecture - Learning objectives and structure
- Backlog Structure - GitHub issues and development tracking
This bootcamp is under active development. Contributions are welcome!
- Use the Epic template for major deliverable proposals
- Use the User Story template for individual tasks or features
See the GitHub Issues for current development status.
- Runtime: Python 3.14+
- Package Manager: UV
- Notebooks: Marimo
- DataFrames: Polars
- SQL Engine: DuckDB
- Validation: Pydantic, Pandera
- ML: scikit-learn
- Distributed: PySpark (Databricks)
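To illustrate the validation layer of the stack, here is a minimal Pydantic sketch in the spirit of Module 4's data cleaning; the `Enrollment` model and its fields are hypothetical examples, not part of the bootcamp package:

```python
from pydantic import BaseModel, ValidationError, field_validator


class Enrollment(BaseModel):
    """A hypothetical record illustrating schema validation."""

    student_id: int
    credits: int

    @field_validator("credits")
    @classmethod
    def credits_in_range(cls, v: int) -> int:
        # Reject obviously bad values before they enter a pipeline
        if not 0 < v <= 21:
            raise ValueError("credits must be between 1 and 21")
        return v


# A valid record parses cleanly; numeric strings are coerced to ints
ok = Enrollment(student_id="1001", credits="15")

# An invalid record raises ValidationError with a readable report
try:
    Enrollment(student_id=1002, credits=40)
except ValidationError as e:
    print(e)
```

Pandera plays a similar role at the DataFrame level, validating whole columns rather than individual records.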
MIT License - See LICENSE file for details.
Nathan Summers - nsummers72@gmail.com
Project Link: https://github.com/ncolesummers/data-engineering-bootcamp