KentHsu/Udacity-Data-Streaming-Nanodegree

Udacity - Data Streaming Nanodegree Program

Build the latest skills for processing data in real time by gaining fluency in modern data engineering tools such as Apache Spark, Kafka, Spark Streaming, and Kafka Streaming.

  • Understand the components of data streaming systems. Ingest data in real time using Apache Kafka and Spark, and run analysis on it.
  • Use the Faust Stream Processing Python library to build a real-time stream-based application. Compile real-time data and run live analytics, as well as draw insights from reports generated by the streaming console.
  • Learn about the Kafka ecosystem, and the types of problems each solution is designed to solve. Use the Confluent Kafka Python library for simple topic management, production, and consumption.
  • Explain the components of Spark Streaming (architecture and API), integrate Apache Spark Structured Streaming and Apache Kafka, manipulate data using Spark, and read DataFrames in the Spark Streaming Console.

Course 1 - Data Ingestion with Apache Kafka

Demonstrate knowledge of data streaming tools, including Kafka consumers, producers, and topics; Kafka Connect sources and sinks; Kafka REST Proxy for producing data over REST; data schemas with JSON and Apache Avro/Schema Registry; stream processing with the Faust Python library; and stream processing with KSQL.
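As a rough illustration of the producer side of these tools, the sketch below serializes events to JSON and hands them to a Kafka producer. It is not taken from the course material. The topic name, the `station_id` field, and the helper names `encode_event` and `produce_events` are assumptions made for the example. The `producer` argument is expected to behave like `confluent_kafka.Producer` (its `produce()` and `flush()` methods), so the logic can be tried with a stand-in object when no broker is running.

```python
import json


def encode_event(event: dict) -> bytes:
    """Serialize an event dict to UTF-8 JSON bytes for use as a message value."""
    return json.dumps(event, sort_keys=True).encode("utf-8")


def produce_events(producer, topic: str, events: list) -> int:
    """Send each event to `topic` and return how many were handed to the producer.

    `producer` is any object exposing produce() and flush() in the style of
    confluent_kafka.Producer (an assumption of this sketch).
    """
    sent = 0
    for event in events:
        # Key by the hypothetical station_id field so that events for one
        # station are routed to the same partition.
        producer.produce(topic, key=str(event["station_id"]), value=encode_event(event))
        sent += 1
    producer.flush()  # wait for queued messages to be delivered
    return sent


# With a real broker (assumed here to listen on localhost:9092):
#   from confluent_kafka import Producer
#   produce_events(Producer({"bootstrap.servers": "localhost:9092"}),
#                  "example.arrivals", [{"station_id": 1, "line": "blue"}])
```

Passing the producer in as an argument, rather than creating it inside the function, keeps the serialization and keying logic testable without a running cluster.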

Contents

  • Introduction to Stream Processing
  • Apache Kafka
  • Data Schemas and Apache Avro
  • Kafka Connect and REST Proxy
  • Stream Processing Fundamentals
  • Stream Processing with Faust
  • KSQL

Project

  • Optimize Chicago Bus and Train Availability Using Kafka

Course 2 - Streaming API Development and Documentation

Grow expertise in streaming data systems by building a continuous application with Structured Streaming: consume and process data from Apache Kafka with Spark Structured Streaming, create a DataFrame as an aggregation of source DataFrames, sink a composite DataFrame to Kafka, and visually inspect a data sink for accuracy.

Contents

  • Streaming DataFrames
  • Joins and JSON
  • Redis, Base64 and JSON
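The Base64 and JSON topic above can be illustrated with a minimal standard-library sketch. It assumes a message whose payload is a Base64-encoded JSON object. The field names and the helper `decode_payload` are hypothetical and not taken from the course material.

```python
import base64
import json


def decode_payload(encoded: str) -> dict:
    """Decode a Base64 string into the JSON object it wraps."""
    raw = base64.b64decode(encoded)
    return json.loads(raw.decode("utf-8"))


# Build a sample the way an upstream producer might (hypothetical fields).
sample = base64.b64encode(
    json.dumps({"customer": "a@example.com", "score": 7.5}).encode("utf-8")
).decode("ascii")

decoded = decode_payload(sample)
print(decoded["score"])
```

In a Spark Structured Streaming job the same two steps would typically be expressed with the built-in `unbase64` and `from_json` column functions rather than plain Python.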

Project

  • Evaluate Human Balance with Spark Streaming