In this chapter, we reviewed an approach to developing data engineering pipelines: identify a limited-scope project, then whiteboard a high-level architecture diagram. We looked at how to run a workshop with relevant stakeholders in the organization to discuss requirements and plan the initial architecture.
In the hands-on activity section of this chapter, we read through fictional notes from a meeting about a new project with specific data requirements. As we read through the notes, we sketched out a high-level whiteboard architecture showing data consumers, data ingestion sources, and transformations.
- Spotify blog providing an example of a data engineering pipeline: https://engineering.atspotify.com/2020/02/18/spotify-unwrapped-how-we-brought-you-a-decade-of-data/
- Link to diagrams.net, an online architecture design tool: https://www.diagrams.net/
NOTE: The files linked below are in .drawio format and can be downloaded, opened in diagrams.net, and modified. To download a source file, click its link, then right-click the Raw button and select Save link as. This saves the draw.io XML file, which you can then open with diagrams.net.
- Generic Data Architecture Whiteboard Template (drawio format): Data-Engineering-Whiteboard-Template.drawio
- Completed Data Architecture Whiteboard Diagram (drawio format): Data-Engineering-Completed-Whiteboard.drawio
- Completed Data Architecture Whiteboard Notes (drawio format): Data-Engineering-Whiteboard-Completed-Notes.drawio
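If a downloaded file does not open in diagrams.net, a common cause is that a web page was saved instead of the raw XML. The following is a minimal sketch of a sanity check, assuming the file uses the usual draw.io layout of an `mxfile` root element containing one `diagram` element per page. The function name `drawio_page_names` and the `sample` string are hypothetical and used only for illustration.

```python
import xml.etree.ElementTree as ET


def drawio_page_names(xml_text):
    """Return the page names found in draw.io XML text.

    Raises ValueError if the text is not well-formed XML or if its
    root element is not <mxfile> (assumed here to be the normal root).
    """
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        raise ValueError(f"not well-formed XML: {exc}") from exc
    if root.tag != "mxfile":
        raise ValueError(f"unexpected root element <{root.tag}>")
    # Each <diagram> child is one page; its "name" attribute is the tab label.
    return [d.get("name", "") for d in root.findall("diagram")]


# Hypothetical stand-in for the contents of a downloaded .drawio file.
sample = '<mxfile><diagram id="a1" name="Whiteboard">...</diagram></mxfile>'
print(drawio_page_names(sample))
```

To check a real download, read the file's text (for example with `pathlib.Path(...).read_text()`) and pass it to the function. A `ValueError` suggests that the saved file is not the raw draw.io XML.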