Learn to process data in real time by building fluency in modern data engineering tools, such as Apache Spark, Kafka, Spark Streaming, and Kafka Streaming.
- Understand the components of data streaming systems. Ingest data in real time using Apache Kafka and Spark, and run analysis on it.
- Use the Faust Stream Processing Python library to build a real-time stream-based application. Compile real-time data and run live analytics, as well as draw insights from reports generated by the streaming console.
- Learn about the Kafka ecosystem and the types of problems each of its tools is designed to solve. Use the Confluent Kafka Python library for simple topic management, production, and consumption.
- Explain the components of Spark Streaming (architecture and API), integrate Apache Spark Structured Streaming and Apache Kafka, manipulate data using Spark, and read DataFrames in the Spark Streaming Console.
Demonstrate knowledge of data streaming tools, including Kafka Consumers, Producers, and Topics; Kafka Connect Sources and Sinks; Kafka REST Proxy for producing data over REST; Data Schemas with JSON and Apache Avro/Schema Registry; Stream Processing with the Faust Python Library; and Stream Processing with KSQL.
- Introduction to Stream Processing
- Apache Kafka
- Data Schemas and Apache Avro
- Kafka Connect and REST Proxy
- Stream Processing Fundamentals
- Stream Processing with Faust
- KSQL
- Optimize Chicago Bus and Train Availability Using Kafka
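As a rough illustration of the "Data Schemas and Apache Avro" lesson above, an Avro record schema is itself a JSON document listing named, typed fields. The sketch below is not taken from the course: the `Arrival` record, its namespace, and its three fields are hypothetical, and the `conforms` helper is a hand-rolled stand-in for a real Avro library that only handles a few primitive types.

```python
import json

# Hypothetical Avro record schema for a transit arrival event.
# The record name, namespace, and fields are illustrative assumptions.
ARRIVAL_SCHEMA = {
    "type": "record",
    "name": "Arrival",
    "namespace": "example.transit",
    "fields": [
        {"name": "station_id", "type": "int"},
        {"name": "line", "type": "string"},
        {"name": "direction", "type": "string"},
    ],
}

# Mapping from a few Avro primitive type names to Python types.
_PYTHON_TYPES = {
    "int": int,
    "long": int,
    "string": str,
    "boolean": bool,
    "float": float,
    "double": float,
}


def conforms(record, schema):
    """Return True if record has exactly the schema's fields with matching primitive types."""
    fields = schema["fields"]
    if set(record) != {field["name"] for field in fields}:
        return False
    for field in fields:
        value = record[field["name"]]
        expected = _PYTHON_TYPES[field["type"]]
        # bool is a subclass of int in Python, so reject it for non-boolean fields.
        if isinstance(value, bool) and expected is not bool:
            return False
        if not isinstance(value, expected):
            return False
    return True


if __name__ == "__main__":
    # The schema serializes to plain JSON, the form in which Avro schemas are written.
    print(json.dumps(ARRIVAL_SCHEMA, indent=2))
    print(conforms({"station_id": 40010, "line": "blue", "direction": "a"}, ARRIVAL_SCHEMA))
```

In practice a schema like this would be handled by an Avro library and a schema registry rather than checked by hand; the helper only shows the idea that producers and consumers agree on field names and types up front.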
Grow expertise in streaming data systems by building a continuous application with Structured Streaming. Consume and process data from Apache Kafka with Spark Structured Streaming, create a DataFrame as an aggregation of source DataFrames, sink the composite DataFrame to Kafka, and visually inspect the data sink for accuracy.
- Streaming DataFrames
- Joins and JSON
- Redis, Base64 and JSON
- Evaluate Human Balance with Spark Streaming
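The "Redis, Base64 and JSON" lesson above suggests working with JSON records that arrive wrapped in a base64-encoded field. A minimal sketch of that decoding step, using only the Python standard library, might look like the following. The message shape (a `key` plus a base64 `payload` field) and the function names are assumptions made for illustration, not the format used in the course.

```python
import base64
import json


def encode_event(key, record):
    """Wrap a record as JSON whose 'payload' field is base64-encoded JSON (hypothetical shape)."""
    payload = base64.b64encode(json.dumps(record).encode("utf-8")).decode("ascii")
    return json.dumps({"key": key, "payload": payload})


def decode_event(raw_message):
    """Parse the outer JSON, base64-decode the 'payload' field, and parse the inner JSON."""
    outer = json.loads(raw_message)
    inner_bytes = base64.b64decode(outer["payload"])
    return json.loads(inner_bytes.decode("utf-8"))


if __name__ == "__main__":
    # Round-trip a made-up record to show the two layers of encoding.
    message = encode_event("customer-1", {"customer": "customer-1", "score": 7.5})
    print(message)
    print(decode_event(message))
```

In a Spark Structured Streaming job the same two steps would typically be expressed with DataFrame functions rather than plain Python, but the order is the same: parse the outer JSON, decode the base64 field, then parse the inner JSON.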