Skip to content

Commit

Permalink
Update README.md
Browse files Browse the repository at this point in the history
  • Loading branch information
hisah authored Oct 27, 2019
1 parent af863c7 commit 6cac07c
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion README.md
Original file line number Diff line number Diff line change
Expand Up @@ -7,7 +7,7 @@ Extracting meaningful and timely insights from unbounded data is very challengin
## Data stream ingestion
An essential part of building a data-driven organization is the ability to handle and process continuous streams of data to discover actionable insights. The explosive growth of interconnected devices and the social Web has led to a large volume of data being generated on a continuous basis. Streaming data sources such as stock quotes, credit card transactions, trending news, traffic conditions, time-sensitive patient’s data are not only very common but can rapidly depreciate if not processed quickly. The ever-increasing volume and highly irregular nature of data rates pose new challenges to data stream processing systems. One such challenging but important task is how to accurately ingest and integrate data streams from various sources and locations into an analytics platform. These challenges demand new strategies and systems that can offer the desired degree of scalability and robustness in handling failures. This project investigates the fundamental requirements and the state of the art of existing data stream ingestion systems, propose a scalable and fault-tolerant data stream ingestion and integration framework that can serve as a reusable component across many feeds of structured and unstructured input data in a given platform, and demonstrate the utility of the framework in a real-world data stream processing case study that integrates Apache NiFi and Kafka for processing high velocity news articles from across the globe. The study also identifies best practices and gaps for future research in developing large-scale data stream processing infrastructure. The outcome of this study was presented during the 2018 IEEE Bigdata conference at Seattle, WA, USA. Paper: A Scalable and Robust Framework for Data Stream Ingestion. URL: https://ieeexplore.ieee.org/abstract/document/8622360
## Data stream analytics
The rapid growth of data in velocity, volume, value, variety, and veracity has enabled exciting new opportunities and presented big challenges for businesses of all types. Recently, there has been considerable interest in developing systems for processing continuous data streams with the increasing need for real-time analytics for decision support in the business, healthcare, manufacturing, and security. The analytics of streaming data usually relies on the output of offline analytics on static or archived data. However, businesses and organizations like our industry partner Gnowit, strive to provide their customers with real time market information and continuously look for a unified analytics framework that can integrate both streaming and offline analytics in a seamless fashion to extract knowledge from large volumes of hybrid streaming data. Here, as part of this project, we present our study on designing a multilevel streaming text data analytics framework by comparing leading edge scalable open-source, distributed, and in-memory technologies. We demonstrate the functionality of the framework for a use case of multilevel text analytics using deep learning for language understanding and sentiment analysis including data indexing and query processing. Our framework combines Spark streaming for real time text processing, the Long Short Term Memory (LSTM) deep learning model for higher level sentiment analysis, and other tools for SQL-based analytical processing to provide a scalable solution for multilevel streaming text analytics. The outcome of this study was presented during the 2019 IEEE COMPSAC conference at Milwaukee, Wisconsin, USA. Paper: A Scalable Framework for Multilevel Streaming Data Analytics using Deep Learning. URL:https://ieeexplore.ieee.org/abstract/document/8754149
The rapid growth of data in velocity, volume, value, variety, and veracity has enabled exciting new opportunities and presented big challenges for businesses of all types. Recently, there has been considerable interest in developing systems for processing continuous data streams with the increasing need for real-time analytics for decision support in the business, healthcare, manufacturing, and security. The analytics of streaming data usually relies on the output of offline analytics on static or archived data. However, businesses and organizations like our industry partner Gnowit, strive to provide their customers with real time market information and continuously look for a unified analytics framework that can integrate both streaming and offline analytics in a seamless fashion to extract knowledge from large volumes of hybrid streaming data. Here, as part of this project, we present our study on designing a multilevel streaming text data analytics framework by comparing leading edge scalable open-source, distributed, and in-memory technologies. We demonstrate the functionality of the framework for a use case of multilevel text analytics using deep learning for language understanding and sentiment analysis including data indexing and query processing. Our framework combines Spark streaming for real time text processing, the Long Short Term Memory (LSTM) deep learning model for higher level sentiment analysis, and other tools for SQL-based analytical processing to provide a scalable solution for multilevel streaming text analytics. The outcome of this study was presented during the 2019 IEEE COMPSAC conference at Milwaukee, Wisconsin, USA. Paper: A Scalable Framework for Multilevel Streaming Data Analytics using Deep Learning. URL: https://ieeexplore.ieee.org/abstract/document/8754149
## Other activities/outcomes related to this project
### Workshops
The following are workshops that we hosted as a result of this project:
Expand Down

0 comments on commit 6cac07c

Please sign in to comment.