
Learning Resources

mike edited this page Aug 28, 2023 · 11 revisions

If you're new to the field, there are some excellent resources available to help you get started! Learning data engineering, data science, software engineering, and general technical skills can be an enjoyable experience, and language models like ChatGPT make it even more accessible.

I've taught both beginner and intermediate Python students, and below are my favorite resources for learning. Mastering Python takes time, so I recommend pairing your Python studies with a strong foundation in general computer science. This will better prepare you for codebases like orangutan-stem, which demands a solid grasp of several software engineering principles: object-oriented programming (OOP), functional programming, operating systems, source control, Docker, and command-line skills. Becoming comfortable with Python will also help you understand frameworks like Airflow, whose pipelines are defined in Python, and practicing with SQL tools like BigQuery and PostgreSQL will prove beneficial as well.
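To make the OOP and functional programming principles above a little more concrete, here is a small, hypothetical Python sketch (not taken from orangutan-stem) that computes the same result two ways: once with a class that holds state and exposes methods, and once by composing pure functions that never mutate shared state. The `Pipeline` class, `compose`, `double`, and `add_one` names are illustrative assumptions.

```python
from dataclasses import dataclass, field
from functools import reduce
from typing import Callable, List


# Object-oriented style: a hypothetical Pipeline object owns its steps
# and applies them in order when run() is called.
@dataclass
class Pipeline:
    steps: List[Callable[[int], int]] = field(default_factory=list)

    def add_step(self, step: Callable[[int], int]) -> "Pipeline":
        self.steps.append(step)  # mutates the object's internal state
        return self  # returning self allows method chaining

    def run(self, value: int) -> int:
        for step in self.steps:
            value = step(value)
        return value


# Functional style: build a new function from smaller pure functions
# without mutating any shared state.
def compose(*funcs: Callable[[int], int]) -> Callable[[int], int]:
    return lambda value: reduce(lambda acc, f: f(acc), funcs, value)


def double(x: int) -> int:
    return x * 2


def add_one(x: int) -> int:
    return x + 1


oop_result = Pipeline().add_step(double).add_step(add_one).run(5)
fp_result = compose(double, add_one)(5)
print(oop_result, fp_result)  # 11 11
```

Both styles produce the same answer; the difference is where state lives. The class accumulates steps inside an object, while the functional version returns a brand-new function and leaves its inputs untouched.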

This learning track will equip you with the skills you need to understand the orangutan-stem codebase and set you on a path to mastering a broad range of data-related software engineering topics. I hope you can spin off your own portfolio or practice projects from this codebase!


I. Python, Computer Science, and Software Engineering Foundations

  1. Python Beginner's Guide
  2. Real Python
  3. Microsoft Trainings
  4. Learn Python (DataCamp)
  5. How Computers Work: Crash Course Computer Science
  6. Data Structures and Algorithms in Python
  7. Operating Systems: Linux Beginner's Course
  8. Command Line
  9. APIs: APIs with Python
  10. Databases: PostgreSQL
  11. Networks: Network Engineering
  12. Testing: PyTest
  13. System Administration: Linux System Administration
  14. Servers: EC2
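For the testing item above, here is a minimal sketch of what a pytest-style test module can look like. pytest collects functions whose names start with `test_` and reports any failing plain `assert`. The `normalize_email` helper and the file name `test_cleaning.py` are hypothetical examples, not part of orangutan-stem.

```python
# Hypothetical test module, e.g. saved as test_cleaning.py.
# No pytest import is needed for plain assert-based tests.


def normalize_email(raw: str) -> str:
    """Hypothetical helper: trim whitespace and lowercase an email address."""
    return raw.strip().lower()


def test_normalize_email_strips_and_lowercases():
    # pytest rewrites failing asserts to show both sides of the comparison.
    assert normalize_email("  Alice@Example.COM ") == "alice@example.com"


def test_normalize_email_is_idempotent():
    # Normalizing an already-normalized value should not change it.
    once = normalize_email(" Bob@Example.com")
    assert normalize_email(once) == once
```

With pytest installed, running `pytest test_cleaning.py` from the command line would discover and run both tests.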

II. SQL

  1. Khan Academy
  2. W3 Schools
  3. Guru99 PostgreSQL training
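Core SQL syntax (creating tables, inserting rows, grouping, and ordering) carries over across databases. As a minimal sketch, the snippet below uses Python's built-in `sqlite3` module as a lightweight stand-in for PostgreSQL. That substitution is an assumption made so the example runs without a server; PostgreSQL and BigQuery differ in dialect details. The `orders` table and its columns are hypothetical.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for PostgreSQL.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

cur.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
)

# Parameterized inserts avoid building SQL strings by hand.
cur.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("alice", 30.0), ("bob", 12.5), ("alice", 20.0)],
)
conn.commit()

# Total spend per customer, largest first.
rows = cur.execute(
    """
    SELECT customer, SUM(amount) AS total
    FROM orders
    GROUP BY customer
    ORDER BY total DESC
    """
).fetchall()
print(rows)  # [('alice', 50.0), ('bob', 12.5)]
conn.close()
```

Placeholder syntax depends on the driver. `sqlite3` uses `?`, while the common PostgreSQL driver psycopg2 uses `%s`.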

III. Linux

  1. Linux Journey
  2. Linux Upskill Challenge GitHub

IV. Data Warehouses

  1. Kimball
  2. Pluralsight
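Kimball-style dimensional modeling organizes a warehouse into fact tables, which record measurable events, joined to dimension tables, which hold descriptive context. This arrangement is commonly called a star schema. The sketch below uses Python's built-in `sqlite3` as a stand-in for a real warehouse such as BigQuery. The `fact_sales`, `dim_product`, and `dim_date` tables and their contents are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimension tables: descriptive attributes keyed by surrogate keys.
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
    CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);

    -- Fact table: one row per sale, holding foreign keys and numeric measures.
    CREATE TABLE fact_sales (
        product_key INTEGER REFERENCES dim_product(product_key),
        date_key INTEGER REFERENCES dim_date(date_key),
        quantity INTEGER,
        revenue REAL
    );

    INSERT INTO dim_product VALUES (1, 'Widget', 'Hardware'), (2, 'Manual', 'Books');
    INSERT INTO dim_date VALUES (20230101, 2023, 1), (20230201, 2023, 2);
    INSERT INTO fact_sales VALUES
        (1, 20230101, 3, 30.0),
        (2, 20230101, 1, 15.0),
        (1, 20230201, 2, 20.0);
    """
)

# A typical star-schema query: join the fact table to its dimensions,
# then aggregate the measure by dimension attributes.
query = """
    SELECT p.category, d.month, SUM(f.revenue) AS revenue
    FROM fact_sales AS f
    JOIN dim_product AS p ON f.product_key = p.product_key
    JOIN dim_date AS d ON f.date_key = d.date_key
    GROUP BY p.category, d.month
    ORDER BY p.category, d.month
"""
result = conn.execute(query).fetchall()
print(result)  # [('Books', 1, 15.0), ('Hardware', 1, 30.0), ('Hardware', 2, 20.0)]
conn.close()
```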

V. Data Lakes & Delta Lakes

  1. Databricks
  2. AWS
  3. Talend
  4. Delta Lake

VI. Data Orchestration Frameworks

  1. Apache Airflow with Marc Lamberti
  2. Apache Airflow Documentation
  3. Mage
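Orchestrators such as Airflow model a pipeline as a directed acyclic graph (DAG) of tasks, in which each task runs only after its upstream dependencies finish. The sketch below illustrates that core idea with Python's standard-library `graphlib.TopologicalSorter` (Python 3.9+). It is not Airflow's API, and the task names are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical ETL tasks mapped to the set of tasks they depend on.
dependencies = {
    "extract": set(),
    "transform": {"extract"},
    "validate": {"transform"},
    "load": {"transform", "validate"},
}


def run_task(name: str, log: list) -> None:
    # Placeholder for real work (e.g. calling an API or running SQL).
    log.append(name)


execution_log: list = []

# static_order() yields each task only after all of its dependencies;
# a cyclic graph would raise graphlib.CycleError instead.
for task in TopologicalSorter(dependencies).static_order():
    run_task(task, execution_log)

print(execution_log)  # ['extract', 'transform', 'validate', 'load']
```

In Airflow itself, the same dependencies would be declared between tasks (for example with the `>>` operator, as in `extract >> transform`), and the scheduler would handle execution order, retries, and scheduling.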

VII. Data Processing Frameworks (Pythonic)

  1. Pandas
  2. PySpark
  3. Pola.rs

VIII. Git & Source Control

  1. GitHub

IX. General STEM Skills

  1. schoolhouse.world

X. Data Influencers to Follow

  1. Zach Wilson (Data)
  2. Sarah Floris (Machine Learning, Software Engineering)
  3. Chip Huyen (Machine Learning, Data)
  4. Marc Lamberti (Data Engineering, Airflow)
  5. Ben Rogojan (Data Engineering, Data Science, and Data Analysis)
