Merge pull request Tikam02#173 from KimtVak8143/Data-streaming-doc
Data streaming documentation update
Tikam02 authored Oct 10, 2022
2 parents bb1fc81 + 7eec7a5 commit c1a4929
Showing 8 changed files with 94 additions and 14 deletions.
17 changes: 17 additions & 0 deletions Data-streaming/amazon-kinesis.md
@@ -0,0 +1,17 @@
## Amazon Kinesis

Amazon Kinesis makes it easy to collect, process, and analyze real-time, streaming data so you can get timely insights and react quickly to new information. Amazon Kinesis offers key capabilities to cost-effectively process streaming data at any scale, along with the flexibility to choose the tools that best suit the requirements of your application. With Amazon Kinesis, you can ingest real-time data such as video, audio, application logs, website clickstreams, and IoT telemetry data for machine learning, analytics, and other applications. Amazon Kinesis enables you to process and analyze data as it arrives and respond instantly instead of having to wait until all your data is collected before the processing can begin.
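To make the "process data as it arrives" idea concrete, here is a minimal Python sketch. It does not use any Kinesis API; the `running_mean` function and the `sensor_feed` values are hypothetical stand-ins for a consumer and an IoT telemetry stream. It only shows that a result can be updated per record instead of after the whole data set has been collected.

```python
from typing import Iterable, Iterator


def running_mean(readings: Iterable[float]) -> Iterator[float]:
    """Yield the mean of all readings seen so far, one value per input."""
    total = 0.0
    count = 0
    for value in readings:
        total += value
        count += 1
        # An up-to-date answer is available after every single record.
        yield total / count


# Hypothetical temperature readings arriving one at a time.
sensor_feed = iter([20.0, 22.0, 21.0, 25.0])
means = list(running_mean(sensor_feed))
print(means)
```

Because `running_mean` is a generator, it would work the same way on a feed that never ends; the list here is finite only so the example terminates.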

Benefits:
- Real-time: Amazon Kinesis enables you to ingest, buffer, and process streaming data in real-time, so you can derive insights in seconds or minutes instead of hours or days.
- Fully Managed: Amazon Kinesis is fully managed and runs your streaming applications without requiring you to manage any infrastructure.
- Scalable: Amazon Kinesis can handle any amount of streaming data and process data from hundreds of thousands of sources with very low latencies.

Capabilities:
- Amazon Kinesis Video Streams makes it easy to securely stream video from connected devices to AWS for analytics, machine learning (ML), and other processing.
- Amazon Kinesis Data Streams is a scalable and durable real-time data streaming service that can continuously capture gigabytes of data per second from hundreds of thousands of sources.
- Amazon Kinesis Data Firehose is the easiest way to capture, transform, and load data streams into AWS data stores for near real-time analytics with existing business intelligence tools.
- Amazon Kinesis Data Analytics is the easiest way to process data streams in real time with SQL or Apache Flink without having to learn new programming languages or processing frameworks.


For more info: [https://aws.amazon.com/kinesis/](https://aws.amazon.com/kinesis/)
19 changes: 19 additions & 0 deletions Data-streaming/apache-kafka.md
@@ -0,0 +1,19 @@
## Apache Kafka

Apache Kafka is an open-source distributed event streaming platform used by thousands of companies for high-performance data pipelines, streaming analytics, data integration, and mission-critical applications.

Core Capabilities:
- Deliver messages at network-limited throughput using a cluster of machines with latencies as low as 2ms.
- Scale production clusters up to a thousand brokers, trillions of messages per day, petabytes of data, hundreds of thousands of partitions. Elastically expand and contract storage and processing.
- Store streams of data safely in a distributed, durable, fault-tolerant cluster.
- Stretch clusters efficiently over availability zones or connect separate clusters across geographic regions.
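The storage and partitioning ideas above can be illustrated with a toy model. The sketch below is an in-memory stand-in, not Kafka's client API: the `ToyLog` class and its methods are hypothetical, and the CRC32-based key hashing is an assumption chosen for simplicity rather than the hash Kafka itself uses. It shows two properties of a partitioned, append-only log: records with the same key land in the same partition in order, and a reader can re-read from any offset.

```python
import zlib
from typing import List, Tuple


class ToyLog:
    """In-memory stand-in for a partitioned, append-only topic."""

    def __init__(self, num_partitions: int) -> None:
        self.partitions: List[List[Tuple[str, str]]] = [
            [] for _ in range(num_partitions)
        ]

    def append(self, key: str, value: str) -> Tuple[int, int]:
        # Same key -> same partition, which preserves per-key ordering.
        partition = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        self.partitions[partition].append((key, value))
        offset = len(self.partitions[partition]) - 1
        return partition, offset

    def read(self, partition: int, offset: int) -> List[Tuple[str, str]]:
        # Readers keep their own offset and can replay from any position.
        return self.partitions[partition][offset:]


log = ToyLog(num_partitions=3)
p1, o1 = log.append("user-1", "login")
p2, o2 = log.append("user-1", "click")
print(log.read(p1, o1))
```

A real cluster adds replication across brokers for durability and fault tolerance, which this single-process sketch deliberately leaves out.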


Ecosystem:
- Process streams of events with joins, aggregations, filters, transformations, and more, using event-time and exactly-once processing.
- Kafka’s out-of-the-box Connect interface integrates with hundreds of event sources and event sinks including Postgres, JMS, Elasticsearch, AWS S3, and more.
- Read, write, and process streams of events in a vast array of programming languages.
- Large ecosystem of open source tools: Leverage a vast array of community-driven tooling.
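As a rough illustration of an event-time aggregation, the following Python sketch counts events per key in fixed (tumbling) windows. It is plain Python, not the Kafka Streams API; `tumbling_counts`, the `Event` alias, and the sample events are hypothetical, and it makes no attempt at exactly-once processing. The point is that each event is assigned to a window by the timestamp it carries, so an event that arrives late is still counted in the window it belongs to.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Event = Tuple[int, str]  # (event_time_seconds, key)


def tumbling_counts(
    events: Iterable[Event], window_seconds: int
) -> Dict[Tuple[int, str], int]:
    """Count events per (window_start, key) using each event's own timestamp."""
    counts: Dict[Tuple[int, str], int] = defaultdict(int)
    for event_time, key in events:
        # Round the event time down to the start of its window.
        window_start = (event_time // window_seconds) * window_seconds
        counts[(window_start, key)] += 1
    return dict(counts)


# The click at t=9 arrives after the click at t=12 (out of order).
events = [(1, "click"), (12, "click"), (3, "view"), (9, "click"), (15, "view")]
counts = tumbling_counts(events, window_seconds=10)
print(counts)
```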


For more info: [https://kafka.apache.org/](https://kafka.apache.org/)
7 changes: 0 additions & 7 deletions Data-streaming/apache-kafka/readme.md

This file was deleted.

11 changes: 11 additions & 0 deletions Data-streaming/apache-storm.md
@@ -0,0 +1,11 @@
## Apache Storm

Apache Storm is a free and open source distributed realtime computation system. Apache Storm makes it easy to reliably process unbounded streams of data, doing for realtime processing what Hadoop did for batch processing. Apache Storm is simple, can be used with any programming language, and is a lot of fun to use!

Apache Storm has many use cases: realtime analytics, online machine learning, continuous computation, distributed RPC, ETL, and more. Apache Storm is fast: a benchmark clocked it at over a million tuples processed per second per node. It is scalable, fault-tolerant, guarantees your data will be processed, and is easy to set up and operate.

Apache Storm integrates with the queueing and database technologies you already use. An Apache Storm topology consumes streams of data and processes those streams in arbitrarily complex ways, repartitioning the streams between each stage of the computation however needed.

Apache Storm integrates with any queueing system and any database system. Apache Storm's spout abstraction makes it easy to integrate a new queueing system. Likewise, integrating Apache Storm with database systems is easy.
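The topology idea (a source feeding processing stages, with the stream repartitioned between stages) can be sketched in plain Python. This is not Storm's API: `sentence_spout`, `split_bolt`, and `count_bolt` are hypothetical names that borrow Storm's spout and bolt vocabulary, and hashing each word with CRC32 to pick a task is an assumption used to mimic grouping a stream by a field.

```python
import zlib
from collections import Counter
from typing import Iterable, Iterator, List


def sentence_spout() -> Iterator[str]:
    """Spout: the source of the stream (finite here so the sketch ends)."""
    yield from ["the cow jumped", "the moon rose", "the cow slept"]


def split_bolt(sentences: Iterable[str]) -> Iterator[str]:
    """First stage: turn each sentence into a stream of words."""
    for sentence in sentences:
        yield from sentence.split()


def count_bolt(words: Iterable[str], num_tasks: int) -> List[Counter]:
    """Second stage: repartition by word so each task owns its words."""
    tasks = [Counter() for _ in range(num_tasks)]
    for word in words:
        task = zlib.crc32(word.encode("utf-8")) % num_tasks
        tasks[task][word] += 1
    return tasks


tasks = count_bolt(split_bolt(sentence_spout()), num_tasks=2)
totals = sum(tasks, Counter())
print(totals["the"], totals["cow"])
```

Because every occurrence of a word is routed to the same task, each task can keep a correct count without coordinating with the others.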

For more info: [https://storm.apache.org/](https://storm.apache.org/)
11 changes: 11 additions & 0 deletions Data-streaming/azure-stream-analytics.md
@@ -0,0 +1,11 @@
## Azure Stream Analytics

Azure Stream Analytics is a fully managed stream processing engine that is designed to analyze and process large volumes of streaming data with sub-millisecond latencies. Patterns and relationships can be identified in data that originates from a variety of input sources including applications, devices, sensors, clickstreams, and social media feeds. These patterns can be used to trigger actions and initiate workflows such as creating alerts, feeding information to a reporting tool, or storing transformed data for later use. Stream Analytics is also available on the Azure IoT Edge runtime, enabling data to be processed directly on IoT devices.

Features:
- End-to-end analytics pipeline that is production-ready in minutes with familiar SQL syntax and extensible with JavaScript and C# custom code
- Rapid scalability with elastic capacity to build robust streaming data pipelines and analyze millions of events at sub-second latencies
- Hybrid architectures for stream processing with the ability to run the same queries in the cloud and on the edge
- Enterprise-grade reliability with built-in recovery and built-in machine learning capabilities for advanced scenarios
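The "pattern triggers an alert" workflow described above can be sketched as follows. This is plain Python rather than a Stream Analytics query; `high_average_alerts`, the window size, the threshold, and the sample readings are all hypothetical. It raises an alert whenever the average of the most recent readings exceeds a threshold.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, Tuple


def high_average_alerts(
    readings: Iterable[float], window_size: int, threshold: float
) -> Iterator[Tuple[int, float]]:
    """Yield (index, average) whenever the sliding-window average is too high."""
    window: Deque[float] = deque(maxlen=window_size)
    for index, value in enumerate(readings):
        window.append(value)  # the oldest reading drops out automatically
        if len(window) == window_size:
            average = sum(window) / window_size
            if average > threshold:
                yield index, average


readings = [70, 72, 71, 90, 95, 99]
alerts = list(high_average_alerts(readings, window_size=3, threshold=80))
print([index for index, _ in alerts])
```

In a managed service the same logic would typically be expressed as a windowed query, with the alert routed to an output such as a notification or reporting tool.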

For more info: [https://learn.microsoft.com/en-us/azure/stream-analytics/stream-analytics-introduction](https://learn.microsoft.com/en-us/azure/stream-analytics/stream-analytics-introduction)
16 changes: 16 additions & 0 deletions Data-streaming/gcloud-dataflow.md
@@ -0,0 +1,16 @@
## Google Cloud Dataflow

Google Cloud Dataflow is a fully managed service for executing Apache Beam pipelines within the Google Cloud Platform ecosystem.

Benefits:
- Streaming data analytics with speed: Dataflow enables fast, simplified streaming data pipeline development with lower data latency.
- Simplify operations and management: Allow teams to focus on programming instead of managing server clusters as Dataflow’s serverless approach removes operational overhead from data engineering workloads.
- Reduce total cost of ownership: Resource autoscaling paired with cost-optimized batch processing capabilities means Dataflow offers virtually limitless capacity to manage your seasonal and spiky workloads without overspending.

Features:
- Autoscaling of resources and dynamic work rebalancing: Minimize pipeline latency, maximize resource utilization, and reduce processing cost per data record with data-aware resource autoscaling. Data inputs are partitioned automatically and constantly rebalanced to even out worker resource utilization and reduce the effect of “hot keys” on pipeline performance.
- Flexible scheduling and pricing for batch processing: For processing with flexibility in job scheduling time, such as overnight jobs, flexible resource scheduling (FlexRS) offers a lower price for batch processing. These flexible jobs are placed into a queue with a guarantee that they will be retrieved for execution within a six-hour window.
- Ready-to-use real-time AI patterns: Enabled through ready-to-use patterns, Dataflow’s real-time AI capabilities allow for real-time reactions with near-human intelligence to large torrents of events. Customers can build intelligent solutions ranging from predictive analytics and anomaly detection to real-time personalization and other advanced analytics use cases.
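To give a feel for what rebalancing work across workers means, here is a small Python sketch. It is not Dataflow's actual algorithm, which the text above does not describe; the `rebalance` function, the partition names, and their sizes are hypothetical. It uses a simple greedy heuristic: hand the largest remaining partition to the currently least-loaded worker.

```python
import heapq
from typing import Dict, List


def rebalance(partition_sizes: Dict[str, int], num_workers: int) -> List[List[str]]:
    """Greedily assign partitions, largest first, to the least-loaded worker."""
    heap = [(0, worker) for worker in range(num_workers)]  # (load, worker id)
    heapq.heapify(heap)
    assignment: List[List[str]] = [[] for _ in range(num_workers)]
    by_size = sorted(partition_sizes.items(), key=lambda item: -item[1])
    for name, size in by_size:
        load, worker = heapq.heappop(heap)  # least-loaded worker so far
        assignment[worker].append(name)
        heapq.heappush(heap, (load + size, worker))
    return assignment


sizes = {"a": 8, "b": 7, "c": 3, "d": 2, "e": 2}
assignment = rebalance(sizes, num_workers=2)
loads = [sum(sizes[name] for name in names) for names in assignment]
print(assignment, loads)
```

A single very large partition (a "hot key") would still dominate one worker under this heuristic, which is why the text mentions partitioning inputs and rebalancing constantly rather than once.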


For more info: [https://cloud.google.com/dataflow](https://cloud.google.com/dataflow)
11 changes: 11 additions & 0 deletions Data-streaming/ibm-stream-analytics.md
@@ -0,0 +1,11 @@
## IBM Streaming Analytics

IBM Streaming Analytics for IBM Cloud evaluates a broad range of streaming data — unstructured text, video, audio, geospatial and sensor — helping organizations spot opportunities and risks and make decisions in real time.

Features:
- Development support: Rich Eclipse-based, visual IDE lets solution architects visually build applications or use familiar programming languages like Java™, Scala or Python.
- Rich data connections: Data engineers can connect with virtually any data source — whether structured, unstructured or streaming — and integrate with Hadoop, Spark and other data infrastructures.
- Analysis and visualization: Built-in domain analytics — like machine learning, natural language, spatial-temporal, text, acoustics and more — to create adaptive streams applications.


For more info: [https://www.ibm.com/in-en/cloud/streaming-analytics](https://www.ibm.com/in-en/cloud/streaming-analytics)
16 changes: 9 additions & 7 deletions Data-streaming/readme.md
@@ -3,12 +3,14 @@

The term "streaming" describes continuous, never-ending data streams with no fixed beginning or end, which provide a constant feed of data that can be acted upon without first being downloaded in full.


****
The following are some well-known services used for data streaming:

- Amazon Kinesis
- Apache Kafka
- Apache Storm
- Spark streaming
- IBM Stream analytics
- Azure Stream Analytics
- Google Cloud DataFlow
- [Amazon Kinesis](amazon-kinesis.md)
- [Apache Kafka](apache-kafka.md)
- [Apache Storm](apache-storm.md)
- [Spark Streaming](spark-streaming.md)
- [IBM Streaming Analytics](ibm-stream-analytics.md)
- [Azure Stream Analytics](azure-stream-analytics.md)
- [Google Cloud Dataflow](gcloud-dataflow.md)
