
Commit

Update README.md
hisah authored Oct 27, 2019
1 parent 6cac07c commit dadbcea
Showing 1 changed file with 1 addition and 1 deletion.
README.md
@@ -3,7 +3,7 @@ Repository for all files and codes related to the project: A Multilevel Streamin

There has been considerable interest in developing systems for processing continuous data streams, driven by the increasing need for real-time analytics for decision support in business, healthcare, manufacturing, security, and the Internet of Things. Some of the data processed by streaming systems needs further processing, and most current systems store that data on disk and re-load it into memory for the next level of processing. Storing and re-loading large volumes of streaming data incurs compute and storage overhead. The goal of this project is to design and implement a multi-level architecture that supports high-speed, real-time streaming data processing and complex machine learning analytics.
## Architecture
Extracting meaningful and timely insights from unbounded data is very challenging. There are currently many open-source and proprietary systems for data stream processing. The large number of available systems is welcome but poses a major challenge in selecting the right components or processing framework for different use cases. Understanding the required capabilities of streaming architectures is vital to making the right design or usage choice. As a first step towards the objectives of this project, we conducted a systematic literature review, proposed a taxonomy and architecture, performed a comparative study of distributed data stream processing/analytics frameworks, and conducted a critical review of representative open-source (Storm, Spark Streaming, Structured Streaming, Flink, Kafka Streams, KSQL) and commercial (IBM Streams) distributed data stream and graph processing frameworks. The study identifies open problems (research opportunities) and can serve as a guide for organizations and individuals planning to implement a real-time data stream processing and analytics framework (architecture diagram: https://raw.githubusercontent.com/hisah/multi-level_streaming_analytics/master/framework.png). The outcome of our review has been published in IEEE Access under the title "A Survey of Distributed Data Stream Processing Frameworks". URL: https://ieeexplore.ieee.org/document/8864052
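
As a rough illustration of the kind of frameworks compared in the survey, the sketch below shows a minimal Spark Structured Streaming job that reads records from a Kafka topic and maintains windowed counts. The topic name, broker address, and window size are placeholders, not details from the paper.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, window

# Minimal Structured Streaming sketch (requires the spark-sql-kafka package).
spark = SparkSession.builder.appName("streaming-example").getOrCreate()

# Read an unbounded stream of records from Kafka; topic/broker are placeholders.
records = (spark.readStream
           .format("kafka")
           .option("kafka.bootstrap.servers", "localhost:9092")
           .option("subscribe", "events")
           .load())

# Count records per 1-minute event-time window using the Kafka message timestamp.
counts = (records
          .withColumn("value", col("value").cast("string"))
          .groupBy(window(col("timestamp"), "1 minute"))
          .count())

# Continuously print the updated counts to the console.
query = (counts.writeStream
         .outputMode("complete")
         .format("console")
         .start())
query.awaitTermination()
```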
## Data stream ingestion
An essential part of building a data-driven organization is the ability to handle and process continuous streams of data to discover actionable insights. The explosive growth of interconnected devices and the social Web has led to a large volume of data being generated continuously. Streaming data sources such as stock quotes, credit card transactions, trending news, traffic conditions, and time-sensitive patient data are not only very common but can rapidly depreciate in value if not processed quickly. The ever-increasing volume and highly irregular nature of data rates pose new challenges to data stream processing systems. One such challenging but important task is how to accurately ingest and integrate data streams from various sources and locations into an analytics platform. These challenges demand new strategies and systems that can offer the desired degree of scalability and robustness in handling failures. This project investigates the fundamental requirements and the state of the art of existing data stream ingestion systems, proposes a scalable and fault-tolerant data stream ingestion and integration framework that can serve as a reusable component across many feeds of structured and unstructured input data in a given platform, and demonstrates the utility of the framework in a real-world data stream processing case study that integrates Apache NiFi and Kafka for processing high-velocity news articles from across the globe. The study also identifies best practices and gaps for future research in developing large-scale data stream processing infrastructure. The outcome of this study was presented at the 2018 IEEE Big Data conference in Seattle, WA, USA. Paper: A Scalable and Robust Framework for Data Stream Ingestion. URL: https://ieeexplore.ieee.org/abstract/document/8622360
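
To give a rough sense of how data handed from NiFi to Kafka can be consumed by the next processing level, here is a minimal sketch (not code from the paper) using the kafka-python client. The topic name, broker address, and message fields are assumptions for illustration only.

```python
import json

from kafka import KafkaConsumer  # pip install kafka-python

# Assumed setup: NiFi fetches news articles and publishes them as JSON
# messages to a Kafka topic; this consumer is the next processing level.
consumer = KafkaConsumer(
    "news-articles",                           # hypothetical topic name
    bootstrap_servers=["localhost:9092"],      # hypothetical broker address
    auto_offset_reset="earliest",
    value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
)

for message in consumer:
    article = message.value
    # Hand the article off to downstream analytics (placeholder: print the title).
    print(article.get("title", "<untitled>"))
```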
## Data stream analytics
