codingalzi/ps4ds

 
 

Code for the book Probability and Statistics for Data Science. A free preprint, videos, code, slides and solutions to exercises are available at https://www.ps4ds.net

Probability

Discrete Variables

Continuous Variables

  • Height: Cumulative distribution function, quantiles, probability density function, histogram, kernel density estimation, box plot, Gaussian distribution, maximum likelihood estimation, parametric and nonparametric models
  • Gross domestic product: Cumulative distribution function, quantiles, probability density function, histogram, kernel density estimation, box plot
  • Temperatures in Oxford: Box plot, quartiles
  • Interarrival times of phone calls: Kernel density estimation, nonparametric and parametric models, exponential distribution, maximum likelihood
  • Simulating an exponential: Inverse transform sampling
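
As a rough illustration of the inverse transform sampling topic listed above, the following minimal sketch draws exponential samples using only the Python standard library. It is not taken from the repository's notebooks: the function name `sample_exponential`, the rate of 2.0, the sample size, and the seed are all assumptions chosen for the example. The last lines recover the rate by maximum likelihood, which for an exponential is the reciprocal of the sample mean.

```python
import math
import random


def sample_exponential(rate, n, seed=0):
    """Draw n exponential samples with the given rate via inverse transform sampling.

    Hypothetical helper for illustration; not part of the repository.
    """
    rng = random.Random(seed)
    # The exponential CDF is F(x) = 1 - exp(-rate * x), so its inverse is
    # F^{-1}(u) = -log(1 - u) / rate for u in [0, 1).
    return [-math.log(1.0 - rng.random()) / rate for _ in range(n)]


# Assumed parameters for the example.
samples = sample_exponential(rate=2.0, n=100_000)

# Maximum likelihood estimate of the rate: 1 / (sample mean).
rate_mle = len(samples) / sum(samples)
print(f"estimated rate: {rate_mle:.3f}")
```

With this many samples the estimate should land close to the assumed rate of 2.0, which gives a quick sanity check that the sampler follows the intended distribution.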

Multiple Discrete Variables

Multiple Continuous Variables

Discrete and Continuous Variables

  • Temperature and precipitation in Mauna Loa: Joint distribution of discrete and continuous variables, marginal distributions, conditional distributions, kernel density estimation
  • Height and sex: Mixture model, Gaussian parametric model, joint distribution of discrete and continuous variables, marginal distributions, conditional distributions
  • Height and handedness: Joint distribution of discrete and continuous variables, independence, kernel density estimation
  • Alzheimer's diagnostics: Classification, Gaussian random vectors, Gaussian discriminant analysis, quadratic discriminant analysis, linear discriminant analysis, maximum likelihood, parametric models
  • Clustering according to height: Gaussian mixture model, expectation maximization algorithm, clustering, unsupervised learning
  • Clustering NBA players: Gaussian mixture model, expectation maximization algorithm, clustering, unsupervised learning
  • Election poll: Bayesian parametric modeling, beta distribution, prior and posterior distributions, conjugate prior
  • How not to predict an election: Bayesian parametric modeling, independence, conditional independence, Monte Carlo method
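
To make the conjugate prior topic from the election poll entry concrete, here is a minimal sketch of a beta prior updated with binary poll responses. It is not the notebook's code: the helper names `beta_posterior` and `beta_mean`, the uniform Beta(1, 1) prior, and the poll counts are hypothetical and chosen only for illustration. Because the beta distribution is conjugate to the Bernoulli likelihood, the posterior is again a beta distribution whose parameters are the prior parameters plus the observed counts.

```python
def beta_posterior(alpha, beta, successes, failures):
    """Return the parameters of the beta posterior after observing binary data.

    Hypothetical helper: a Beta(alpha, beta) prior combined with a Bernoulli
    likelihood gives a Beta(alpha + successes, beta + failures) posterior.
    """
    return alpha + successes, beta + failures


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


# Made-up poll for illustration: 1000 respondents, 540 favor the candidate.
prior = (1.0, 1.0)  # uniform prior on the candidate's vote share
posterior = beta_posterior(*prior, successes=540, failures=460)

print("posterior parameters:", posterior)
print(f"posterior mean: {beta_mean(*posterior):.4f}")
```

The posterior mean sits slightly closer to 0.5 than the raw poll fraction of 0.54, reflecting the small pull of the uniform prior.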

Averaging

Correlation

Estimation of Population Parameters

Hypothesis Testing

Principal Component Analysis and Low-Rank Models

  • Gaussian random vector: Mean of a random vector, covariance matrix, directional variance, principal component analysis, spectral theorem
  • Canadian cities: Sample mean of a vector, sample covariance matrix, principal component analysis, spectral theorem
  • Faces: Principal component analysis, dimensionality reduction, sample mean of a vector
  • Wheat seeds: Sample covariance matrix, principal component analysis, dimensionality reduction
  • Face classification: Principal component analysis, dimensionality reduction, nearest neighbor
  • Temperatures in the United States: Sample covariance matrix, singular value decomposition, principal component analysis, low-rank model
  • Prediction of movie ratings (cartoon example): Low-rank model, singular value decomposition, matrix completion, collaborative filtering, singular-value thresholding, imputation
  • Prediction of movie ratings (real data): Low-rank model, singular value decomposition, matrix completion, collaborative filtering, singular-value thresholding, imputation
  • Topic modeling: Low-rank model, singular value decomposition, nonnegative matrix factorization

Regression and Classification

Languages

  • Jupyter Notebook 100.0%