Comprehensive bibliography for building strong CS foundations and data engineering expertise

pippo995/data-engineer-bookshelf

Data Engineering Bookshelf

A curated collection of books and resources for learning computer science fundamentals and specializing in data engineering.

Introduction

This repository represents my personal journey transitioning into data engineering. Coming from a general programming background, I realized that becoming a proficient data engineer requires both strong computer science fundamentals and specialized knowledge of data systems.

The books in this collection have been carefully selected based on recommendations from experienced professionals, online courses, and my own research. Some I've completed, others I'm currently reading, and a few are in my queue. I've organized them in a logical progression that builds knowledge from foundational concepts to specialized data engineering skills.

My criteria for including a book are:

  1. Relevance to modern data engineering practices
  2. Timelessness of the concepts covered
  3. Depth and clarity of explanations
  4. Practical applicability of the knowledge

I'm sharing this resource to help others on a similar path and to create a reference for my own learning journey. Feel free to suggest additions or share your experiences with these materials!

Computer Science Fundamentals

How Linux Works

Author: Brian Ward
Latest Edition: 3rd Edition (2021)
A comprehensive guide to understanding Linux systems from the ground up, covering everything from boot processes to networking and system administration.

Operating Systems: Three Easy Pieces

Authors: Remzi H. Arpaci-Dusseau, Andrea C. Arpaci-Dusseau
Latest Edition: 1.00 (2018)
An accessible introduction to operating systems, breaking down complex concepts into three fundamental pieces: virtualization, concurrency, and persistence.

Code: The Hidden Language of Computer Hardware and Software

Author: Charles Petzold
Latest Edition: 2nd Edition (2022)
A fascinating exploration of how computers work at the most fundamental level, explaining the connections between human language, electrical engineering, and modern computing.

Clean Code

Author: Robert C. Martin
Latest Edition: 1st Edition (2008)
A definitive guide to writing readable, maintainable code, filled with practical examples and principles that help developers produce better software.

Clean Architecture

Author: Robert C. Martin
Latest Edition: 1st Edition (2017)
A comprehensive guide to building software systems that are easier to understand, maintain, and extend, using proven architectural principles.

Grokking Algorithms

Author: Aditya Bhargava
Latest Edition: 1st Edition (2016)
A beginner-friendly, illustrated guide to algorithms that makes complex concepts accessible through visual explanations and simple examples.

A Common-Sense Guide to Data Structures and Algorithms

Author: Jay Wengrow
Latest Edition: 2nd Edition (2020)
A practical introduction to data structures and algorithms that focuses on real-world applications and performance considerations.

The Pragmatic Programmer

Authors: Andrew Hunt, David Thomas
Latest Edition: 20th Anniversary Edition (2019)
A collection of practical advice and best practices for software development that helps programmers become more effective and productive.

Structure and Interpretation of Computer Programs

Authors: Harold Abelson, Gerald Jay Sussman, Julie Sussman
Latest Edition: 2nd Edition (1996)
A classic text that uses Scheme to teach fundamental principles of programming and abstraction, emphasizing the importance of computational processes and their roles in problem-solving.

Refactoring: Improving the Design of Existing Code

Author: Martin Fowler
Latest Edition: 2nd Edition (2018)
A comprehensive guide to refactoring techniques that help developers improve code quality and maintainability without changing external behavior.

Computer Systems: A Programmer's Perspective

Authors: Randal E. Bryant, David R. O'Hallaron
Latest Edition: 3rd Edition (2015)
An in-depth look at how computer systems execute programs, store information, and communicate, providing essential knowledge for writing efficient, high-performance code.

The C Programming Language

Authors: Brian W. Kernighan, Dennis M. Ritchie
Latest Edition: 2nd Edition (1988)
The definitive guide to C programming written by the language's creators, covering syntax, semantics, and practical programming techniques.

Design Patterns: Elements of Reusable Object-Oriented Software

Authors: Erich Gamma, Richard Helm, Ralph Johnson, John Vlissides
Latest Edition: 1st Edition (1994)
A catalog of design patterns that provide reusable solutions to common software design problems, essential for creating flexible and maintainable object-oriented systems.

Data Engineering Specific

Fundamentals of Data Engineering

Authors: Joe Reis, Matt Housley
Latest Edition: 1st Edition (2022)
A comprehensive guide to the fundamentals of data engineering, covering the entire data lifecycle from ingestion to storage, processing, and serving data products.

Designing Data-Intensive Applications

Author: Martin Kleppmann
Latest Edition: 1st Edition (2017)
An in-depth exploration of the principles, challenges, and approaches for designing systems that handle large volumes of data, emphasizing reliability, scalability, and maintainability.

Author: Joe Reis
Latest Edition: 1st Edition (2023)
A practical guide to solving common data engineering problems using proven design patterns, covering data modeling, pipelines, and architecture solutions.

Data Engineering with Python

Author: Paul Crickard
Latest Edition: 1st Edition (2020)
A hands-on guide to building effective data pipelines with Python, covering tools like Apache Airflow, SQL databases, and cloud platforms.

The Data Warehouse Toolkit

Authors: Ralph Kimball, Margy Ross
Latest Edition: 3rd Edition (2013)
The definitive guide to dimensional modeling for data warehouses, essential for designing scalable and usable data structures for analytics.

Big Data: Principles and Best Practices of Scalable Realtime Data Systems

Authors: Nathan Marz, James Warren
Latest Edition: 1st Edition (2015)
Introduces the Lambda Architecture as a scalable approach to handling big data, providing practical techniques for building systems that combine batch and real-time processing.

Project Management

The Mythical Man-Month

Author: Frederick P. Brooks Jr.
Latest Edition: Anniversary Edition (1995)
A classic collection of essays on software project management that explores why adding more people to a late project makes it later and provides timeless insights into managing complex software projects.

Accelerate

Authors: Nicole Forsgren PhD, Jez Humble, Gene Kim
Latest Edition: 1st Edition (2018)
Research-backed insights into how high-performing technology organizations deliver software quickly and reliably, which makes it essential reading for modern data engineering teams working in agile environments.
