The goal of this page is to gather resources and learning materials across a broad range of popular data science topics and arrange them thematically. Resources have been selected because they are
- High quality
- Free of charge
- Available without requiring readers to sign up
Remember that material that is offered freely on the web is paid for by the author's time - if you find a resource particularly useful, consider supporting the author in whatever way they prefer. If you find this page useful, please share it and spread the word! If you find a mistake or broken link, please file an issue or submit a pull request.
Key to resource types
- ๐ = Course
- ๐ = Tutorial or blog post
- ๐ = Book or book chapter
- ▶️ = Video or webinar
- 🎧 = Podcast or audio recording
- 👥 = Community or user forum
- ๐ = Journal or technical article
- 💡 = Cheat sheet
- โ = List
- ๐ Modern Dive: Getting Started by Chester Ismay and Albert Y.
Kim.
- The very first of first steps. Install R & RStudio and what to do after that.
- ๐ RYouWithMe: Basic Basics by Lisa Williams, R-Ladies Sydney.
- Tour of RStudio, installing and using packages and getting data into RStudio.
- ๐ Teacups, Statistics and Giraffes by Hasse Walum and Desirée de Leon.
- Accessible introduction to R and statistics with interactive coding exercises.
- ▶️ A Gentle Introduction to Tidy Statistics in R by Thomas Mock, RStudio.
- Webinar covering exploratory data analysis, tidyverse, statistical testing and plotting.
- ๐ The R Bootcamp by Ted Laderas and Jessica
Minnier.
- A tidyverse-centric interactive course for data manipulation, graphics, data reshaping, and statistical modelling.
- ๐ RStudio Primers by
RStudio.
- Interactive tutorials from RStudio covering data manipulation, visualisation and programming with R.
- ๐ Swirl: Learn R, in R by Ismael Fernández, Nick Carchedi and Sean Kross.
- Learn R with interactive courses in the console.
- ๐ Using R for Data Journalism by Andrew Ba
Tran.
- Video supported intro course with emphasis on wrangling and visualisation.
- ๐ R for Data Science by Garrett Grolemund and Hadley
Wickham.
- Comprehensive guide to using R programming for data science workflows.
- ๐ Introduction to Data Science: Data Analysis and Prediction
Algorithms with R by Rafael A.
Irizarry.
- Introduction to data science focused topics in R: visualisation, wrangling, prediction and workflow.
- 💡 Base R Cheat Sheet by Mhairi McNeill.
- Quick overview of basic R functionality.
- ๐ Tidynomicon - A Brief Introduction to R for People Who Count
From Zero by Greg Wilson.
- An introduction to R for Python users.
- ๐ Hands-on Programming with R by Garrett
Grolemund.
- A friendly introduction to the R language for non-programmers.
- ๐ R Cookbook: Proven Recipes for Data Analysis, Statistics, and
Graphics by James (JD) Long, Paul Teetor.
- Recipes and worked examples for performing core tasks in R.
- ๐ R package primer: a minimal tutorial by Karl
Broman.
- Overview of R packages development.
- ๐ R Packages by Hadley Wickham and Jennifer
Bryan.
- Comprehensive guide to how R packages work and how to write your own.
- ๐ Efficient R programming by Colin Gillespie and Robin
Lovelace.
- Comprehensive introduction to writing faster and more efficient R code.
- ๐ Advanced R by Hadley Wickham.
- Get deeper into R programming fundamentals, object-oriented and functional programming concepts and a lot more. A must-read for experienced R users!
- ▶️ RStudio Webinars by RStudio.
- Recordings of past RStudio webinars covering a variety of R and data science content.
- ๐ An Introduction to R by W. N. Venables, D. M. Smith and the
R Core Team.
- Introduction to R written by the R-Core team.
- ๐ / ๐ Data science for economists by Grant
McDermott.
- Slides and code examples providing a wide-ranging introduction to data science in R.
- ๐ / ๐ Big Data in Economics by Grant
McDermott.
- Notes cover the use of R with shell, GitHub, web scraping, docker and cloud compute.
- ๐ Handling Strings with R by Gaston Sanchez and Chitra
Venkatesh.
- Detailed introduction to strings, manipulation, regex and text wrangling.
- ▶️ R Package Development by John Muschelli.
- 6-part video series on the basics of R package development, testing and building a pkgdown site.
- ๐ Install Python and Anaconda by
Anaconda.
- The most commonly used package and environment manager for Python and how to install it.
- ๐ Free interactive introduction to Python and pandas by
?.
- Beginner's introduction to Python, pandas and data analysis via an interactive course.
- ๐ Quick reference to Python in a single script and notebook by
Kevin Markham.
- Comprehensive reference guides for Python programming via notebooks and script examples.
- ๐ / ▶️ An Introduction to Python and Programming by Alexander Hess.
- Python course for aspiring data scientists via notebooks, videos and exercises.
- ๐ A Whirlwind Tour of Python by Jake VanderPlas.
- A fast-paced introduction to essential features of the Python language for those already familiar with another language.
- ๐ Learn Python by Ron Reiter.
- Interactive online courses and tutorials for a wide range of Python topics.
- 💡 Pandas Cheat Sheet by the Pandas development team.
- 2-page quick reference to the most commonly used pandas functions.
- ๐ Getting Started in pandas by the Pandas development team.
- Tutorials and quick start guides from the pandas development team.
- ๐ Python Data Science Handbook by Jake
VanderPlas.
- Online book with comprehensive coverage of IPython, numpy, pandas, matplotlib and machine learning with scikit-learn.
- ๐ Python for Everybody: Exploring Data Using Python 3 by
Charles R. Severance.
- Python ebook with a focus on programming fundamentals. Translations available in several languages.
- ๐ Python Packaging User Guide by the Python Packaging
Authority (PyPA).
- A collection of tutorials and references to help you distribute and install Python packages with modern tools.
- ๐ Learn Shell by Ron Reiter.
- A browser-based interactive Shell tutorial covering basics through to advanced topics.
- ๐ The Unix Shell by Software
Carpentry.
- Tutorials and examples of how to use the Unix shell.
- ๐ Beginners/BashScripting by Ubuntu
Documentation.
- Introduction to using the shell for OS navigation and scripting.
- ▶️ How to Write a Shell Script using Bash Shell in Ubuntu by FS Tutorial
- Short video showing how to write a first shell script using vim.
- ๐ / ▶️ The Missing Semester of Your CS Education by Anish Athalye, Jon Gjengset and Jose Javier Gonzalez Ortiz
- Videos and notes on using shell and version control.
- ๐ The Art of the Command Line by Joshua
Levy
- Useful list of bash commands and explanations, all laid out on a single page!
- ๐ ExplainShell.com by Idan Kamara
- Handy utility - type in a shell command and get an explanation of what it does.
- ๐ RegexOne: Learn Regular Expressions with simple, interactive
exercises. by RegexOne
- Simple, browser based course with interactive exercises.
- ๐ Regular Expressions 101: Online Regular Expression Tester and
Debugger by Firas Dib
- Very handy tool to test regular expressions against test strings.
- 💡 Data Science Cheat Sheet: Python Regular Expressions by Dataquest
- PDF cheat-sheet for standard regular expression syntax.
- 💡 Regular Expressions Cheat Sheet by Dave Child
- PDF cheat-sheet for standard regular expression syntax.
- ๐ Happy Git and GitHub for the useR by Jenny Bryan, the STAT
545 TAs and Jim Hester
- If you are an R user and new to git, this is currently the best place to start.
- ๐ An introduction to Git and how to use it with RStudio by François Michonneau
- Conceptual overview of what git is and how to use it, with particular emphasis on GitHub and its use with RStudio.
- 💡 Git Cheat Sheet by GitHub
- A list of the main git shell commands.
- ๐ Pro Git by Scott Chacon and Ben
Straub
- Free ebook covering more advanced usage of git - good once you're confident with the basics.
- ๐ Oh Shit Git! by Katie Sylor-Miller
- Light-hearted troubleshooting guide for when things inevitably go wrong!
- ๐ Step-by-step guide to contributing on GitHub by Kevin
Markham
- Detailed guide on how to contribute to open source software projects using git and GitHub.
- 💡 PySpark Cheat Sheet by Kevin Schaich
- ๐ Mastering Spark with R by Javier Luraschi, Kevin Kuo and Edgar Ruiz
- ▶️ R & Spark: How to Analyze Data Using RStudio's Sparklyr by Nathan Stephens
- ๐ A Gentle Introduction to Spark by Databricks
- ๐ / ๐ The SQL Tutorial for Data Analysis by mode.com. Tutorials and interactive exercises teaching fundamentals of SQL.
- ๐ SQLBolt: Learn SQL with simple, interactive exercises.
- ๐ / ๐ SQLZoo: SQL Tutorial. Wikibook with interactive exercises.
- ๐ Intro to SQL: Querying and managing data by Khan Academy
- ๐ LearnSQLOnline by Ron Reiter
- ๐ An Introduction to Docker for R Users by Colin Fay
- ๐ R Docker tutorial by Jemma Stachelek
- ▶️ Docker and Python: making them play nicely and securely for Data Science and ML by Tania Allard at PyCon 2020
- ๐ R Markdown: The Definitive Guide by Yihui Xie, J. J. Allaire, Garrett Grolemund
- ๐ bookdown: Authoring Books and Technical Documents with R Markdown by Yihui Xie
- ๐ The Not So Short Introduction to LaTeX 2ε by Tobias Oetiker
- ๐ LaTeX for Beginners by UoE IS Services
- ๐ The Elements of Statistical Learning: Data Mining, Inference, and Prediction by Trevor Hastie, Robert Tibshirani and Jerome Friedman (2017)
- ๐ Computer Age Statistical Inference: Algorithms, Evidence and
Data Science by Bradley Efron and Trevor Hastie
(2017).
- A statistical approach to data science and machine learning.
- ๐ Mathematics for Machine Learning by Marc Peter Deisenroth, A. Aldo Faisal, Cheng Soon Ong
- Covers the underpinning theory to many ML algorithms, a useful reference for practitioners.
- ๐ distill.pub by multiple contributors, edited by Shan Carter
and Chris Olah
- Online scientific journal publishing very high-quality, interactive articles on ML. On hiatus as of 2021.
- ๐ Mining of Massive Datasets by Jure Leskovec, Anand
Rajaraman, Jeff Ullman
- Book based on Stanford Computer Science course CS246: Mining Massive Datasets.
- ๐ Introduction to Statistical Learning by Gareth James,
Daniela Witten, Trevor Hastie and Robert
Tibshirani
- ISLR is still one of the most important books for getting started in practical ML.
- ๐ Interpretable Machine Learning: A Guide for Making Black Box
Models Explainable by Christoph Molnar
(2022)
- A highly practical introduction to IML, required reading if you are new to the topic.
- โ
Awesome: Machine Learning Interpretability by Patrick
Hall
- A big list of MLI resources with >2.5k GitHub stars.
- ๐ Machine Learning Crash Course with TensorFlow APIs by
Google
- Fast-paced, practical introduction to machine learning, with video lectures, real-world case studies, and hands-on practice exercises.
- ๐ Tidymodels Tutorials by
RStudio
- A variety of beginner's guides to solving common ML tasks with R's tidymodels.
- ๐ Supervised Machine Learning Case Studies in R by Julia
Silge.
- Easy-to-follow in-browser beginner's guide to using R's tidymodels for practical ML.
- ๐ / ๐ฎ Introduction to machine learning with scikit-learn by Kevin Markham
- Bite-size study videos and Python notebooks by Kevin Markham's Data School.
- ๐ scikit-learn User Guide by
scikit-learn
- scikit-learn's documentation is very thorough and a great standalone learning resource!
- ๐ Introduction to Machine Learning for Coders by Jeremy
Howard.
- 24 hours of videos and supporting notes from a Kaggle superstar.
- ๐ Software development skills for data scientists by Trey Causey
- ๐ Hidden Technical Debt in Machine Learning Systems
- ๐ How rOpenSci uses Code Review to Promote Reproducible Science by Noam Ross, Scott Chamberlain, Karthik Ram and Maëlle Salmon
- ๐ Best Practices for Computational Science: Software Infrastructure and Environments for Reproducible and Extensible Research by Victoria Stodden and Sheila Miguez
- ๐ Journalism as a Professional Model for Data Science by Brian C. Keegan
- ๐ Cookiecutter Data Science by drivendata
- ๐ The Care and Feeding of Data Scientists: How to Build, Manage and Retain a Data Science Team by Michelangelo D'Agostino and Katie Malone
- 🎧 The Care and Feeding of Data Scientists: Becoming a Data Science Manager on Linear Digressions podcast by Katie Malone and Ben Jaffe
- ๐ Models for integrating data science teams within companies by Pardis Noorzad
- 🎧 Building Effective Data Science Teams with Kobi Abayomi, Gregory Berg, Elaine McVey, Jacqueline Nolis, Nasir Uddin and Julia Silge
- ๐ Building a data team at a mid-stage startup: a short story by Erik Bernhardsson
- ๐ Hiring a data scientist by Mikhail Popov, Wikimedia
- ๐ Agile Data Science with R: A workflow by Edwin Thoen
- ๐ Data Science and Agile (What works, and what doesn't) by Eugene Yan
- ๐ Data Science Best Practices: Run your data science team like an engineering team by Leonard Austin
- ๐ Organizing machine learning projects: project management guidelines by Jeremy Jordan
- ๐ Ethics of Artificial Intelligence and Robotics by Stanford Encyclopedia of Philosophy
- ๐ The Responsible Machine Learning Principles: A practical framework to develop AI responsibly by The Institute for Ethical AI & Machine Learning
- ๐ A Code of Ethics for Data Science by DJ Patil
- ๐ The Ethical Data Scientist by Cathy O'Neil
- ๐ An ethics checklist for data scientists by drivendata
- ๐ Fairness and machine learning: Limitations and Opportunities by Solon Barocas, Moritz Hardt, Arvind Narayanan
- ๐ Practical Data Ethics by fast.ai
- ๐ MLOps: Continuous delivery and automation pipelines in machine learning by Google Cloud
- ๐ Using GitHub Actions for MLOps & Data Science by Hamel Husain, The GitHub Blog
- ๐ Continuous Delivery for Machine Learning: Automating the end-to-end lifecycle of Machine Learning applications by Danilo Sato, Arif Wider and Christoph Windheuser
- ๐ Monitoring Machine Learning Models in Production: A Comprehensive Guide by Christopher Samiullah
- ๐ What are Azure Machine Learning pipelines? by Microsoft
- ๐ Getting started with Kubeflow Pipelines by Amy Unruh, Google Cloud
- ๐ Continuous Machine Learning (CML) is CI/CD for Machine Learning Projects by DVC.org
- ๐ Data Science Workflows by David Neuzerling
- ๐ The problem with AI developer tools for enterprises (and what IKEA has to do with it) by Clemens Mewald
- ๐ 5 Reasons Organizations Shouldn't Build Their Own AI Platforms by dataiku
- ๐ Udacity Git Commit Message Style Guide by Udacity
- ๐ The tidyverse style guide by Hadley Wickham
- ๐ The Google R Style Guide by Google
- ๐ The Google Python Style Guide by Google
- ๐ PEP 8 – Style Guide for Python Code by Guido van Rossum, Barry Warsaw, Nick Coghlan
- ๐ฎ / ๐ Learn Shiny by RStudio
- ๐ A gRadual intRoduction to Shiny by Ted Laderas and Jessica Minnier
- ๐ Interactive web-based data visualization with R, plotly, and shiny by Carson Sievert
- ๐ Dashboards by Yihui Xie, J. J. Allaire, Garrett Grolemund. Chapter 5 from "R Markdown: The Definitive Guide".
- ๐ Leaflet for R by RStudio
- ๐ Dash User Guide by Plotly
- ๐ Getting Started with Streamlit by streamlit
- ๐ Fundamentals of Data Visualization by Claus O. Wilke
- ๐ ggplot2: Elegant Graphics for Data Analysis by Hadley Wickham
- ๐ 3D Mapping and Visualization with R and Rayshader by Tyler Morgan-Wall
- ๐ Forecasting: Principles and Practice by Rob J Hyndman and George Athanasopoulos
- ๐ 11 Classical Time Series Forecasting Methods in Python (Cheat Sheet) by Jason Brownlee
- ๐ GAMs in R by Noam Ross. Interactive course introducing Generalised Additive Models (GAMs).
- ๐ Resources for Learning About and Using GAMs in R by Noam Ross
- ๐ Statistical Inference via Data Science: A Modern Dive into R and the tidyverse by Chester Ismay and Albert Y. Kim
- ๐ Think Stats: Exploratory Data Analysis in Python by Allen B. Downey
- ๐ Learning statistics with R: A tutorial for psychology students and other beginners by Danielle Navarro
- ๐ Probabilistic Programming & Bayesian Methods for Hackers by Cameron Davidson-Pilon
- ๐ From Algorithms to Z-Scores: Probabilistic and Statistical Modeling in Computer Science by Norm Matloff
- ๐ Theory of Statistics by James E. Gentle
- ๐ Core Statistics by Simon Wood
- ๐ Geocomputation with R by Robin Lovelace, Jakub Nowosad, Jannes Muenchow
- ๐ Spatial Data Science by Edzer Pebesma and Roger Bivand
- ๐ Geospatial Health Data: Modeling and Visualization with R-INLA and Shiny by Paula Moraga
- 👥 PyData Meetup Groups
- 👥 PyLadies by PyLadies
- 👥 Directory of R User Groups by Jumping Rivers
- 👥 Complete list of R-Ladies groups by R-Ladies Global.
- 👥 R for Data Science Online Learning Community
- The R4DS Online Learning Community is a community of R learners at all skill levels working together to improve their skills.
- 👥 Tidy Tuesday
- A weekly podcast and community activity brought to you by the R4DS Online Learning Community.
- 👥 SatRdays
- R-focused conferences that are held on Saturdays.
- ๐ Text Mining with R: A Tidy Approach by Julia Silge and David Robinson
- ๐ Advanced NLP with spaCy by Ines Montani
- ๐ 100 Must read papers in NLP by Masato Hagiwara
- ๐ Stanford CS 124: From Languages to Information by Dan Jurafsky
- ๐ Natural Language Processing with Python: Analyzing Text with the Natural Language Toolkit by Steven Bird, Ewan Klein, and Edward Loper.
- ๐ A Code-First Intro to Natural Language Processing by fast.ai
- The course is taught in Python with Jupyter Notebooks, using libraries such as sklearn, nltk, pytorch, and fastai.
- ๐ Speech and Language Processing by Dan Jurafsky and James H. Martin
- ▶️ BERT Research Series by Chris McCormick