Commit dadbcea

Update README.md
1 parent 6cac07c commit dadbcea

1 file changed: +1 -1 lines changed

README.md

Lines changed: 1 addition & 1 deletion
@@ -3,7 +3,7 @@ Repository for all files and codes related to the project: A Multilevel Streamin
There has been considerable interest in developing systems for processing continuous data streams, driven by the increasing need for real-time analytics for decision support in business, healthcare, manufacturing, security, and the Internet of Things. Some of the data processed by stream processing systems needs further processing, for which most systems currently store the data on disk and reload it into memory for the next level of processing. Storing and reloading large volumes of streaming data incurs compute and storage overhead. The goal of this project is to design and implement a multi-level architecture that can support high-speed, real-time streaming data processing and complex machine learning analytics.
## Architecture
-Extracting meaningful and timely insights from unbounded data is very challenging. There are currently many open-source and proprietary systems for data stream processing. This variety is welcome, but it makes selecting the right components or processing framework for a given use case a major challenge. Understanding the required capabilities of streaming architectures is vital to making the right design or usage choice. As a first step toward the objectives of this project, we conducted a systematic literature review, proposed a taxonomy and architecture, performed a comparative study of distributed data stream processing/analytics frameworks, and critically reviewed representative open-source (Storm, Spark Streaming, Structured Streaming, Flink, Kafka Streams, KSQL) and commercial (IBM Streams) distributed data stream and graph processing frameworks. This study identified open problems (research opportunities) and can serve as a guide for organizations and individuals planning to implement a real-time data stream processing and analytics framework. The outcome of our review has been published in IEEE Access under the title "A Survey of Distributed Data Stream Processing Frameworks". URL: https://ieeexplore.ieee.org/document/8864052
+Extracting meaningful and timely insights from unbounded data is very challenging. There are currently many open-source and proprietary systems for data stream processing. This variety is welcome, but it makes selecting the right components or processing framework for a given use case a major challenge. Understanding the required capabilities of streaming architectures is vital to making the right design or usage choice. As a first step toward the objectives of this project, we conducted a systematic literature review, proposed a taxonomy and architecture, performed a comparative study of distributed data stream processing/analytics frameworks, and critically reviewed representative open-source (Storm, Spark Streaming, Structured Streaming, Flink, Kafka Streams, KSQL) and commercial (IBM Streams) distributed data stream and graph processing frameworks. This study identified open problems (research opportunities) and can serve as a guide for organizations and individuals planning to implement a real-time data stream processing and analytics framework https://raw.githubusercontent.com/hisah/multi-level_streaming_analytics/master/framework.png. The outcome of our review has been published in IEEE Access under the title "A Survey of Distributed Data Stream Processing Frameworks". URL: https://ieeexplore.ieee.org/document/8864052
## Data stream ingestion
An essential part of building a data-driven organization is the ability to handle and process continuous streams of data to discover actionable insights. The explosive growth of interconnected devices and the social Web has led to a large volume of data being generated continuously. Streaming data such as stock quotes, credit card transactions, trending news, traffic conditions, and time-sensitive patient data is not only very common but also loses value rapidly if not processed quickly. The ever-increasing volume and highly irregular data rates pose new challenges to data stream processing systems. One challenging but important task is accurately ingesting and integrating data streams from various sources and locations into an analytics platform. These challenges demand new strategies and systems that offer the desired degree of scalability and robustness in handling failures. This project investigates the fundamental requirements and the state of the art of existing data stream ingestion systems, proposes a scalable and fault-tolerant data stream ingestion and integration framework that can serve as a reusable component across many feeds of structured and unstructured input data in a given platform, and demonstrates the utility of the framework in a real-world data stream processing case study that integrates Apache NiFi and Kafka to process high-velocity news articles from across the globe. The study also identifies best practices and gaps for future research in developing large-scale data stream processing infrastructure. The outcome of this study was presented at the 2018 IEEE BigData conference in Seattle, WA, USA. Paper: A Scalable and Robust Framework for Data Stream Ingestion. URL: https://ieeexplore.ieee.org/abstract/document/8622360
## Data stream analytics
