KING-258/BDA_Mini

Created by KING-258

Overview

This Python script fetches data from multiple APIs (NewsAPI, GDELT, Wikipedia) for a given phrase, processes it, and sends it to a Kafka message queue. The goal is to analyze news articles, global trends, and Wikipedia summaries efficiently; fetching runs within a set time limit to avoid long processing times.
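As a rough illustration of the last step of that pipeline, the sketch below combines results from the three sources into one JSON message and hands it to a Kafka producer. It is not taken from the repository: the function names, the payload keys, the topic name `bda_mini_results`, and the broker address `localhost:9092` are all assumptions, and it presumes the `kafka-python` package is the Kafka client in use.

```python
import json
from typing import Any, Dict, List


def build_message(
    phrase: str,
    news: List[Dict[str, Any]],
    gdelt: List[Dict[str, Any]],
    wiki_summary: str,
) -> bytes:
    """Combine the three sources into one JSON-encoded message value.

    The key names here are hypothetical, chosen only for illustration.
    """
    payload = {
        "phrase": phrase,
        "newsapi": news,
        "gdelt": gdelt,
        "wikipedia": wiki_summary,
    }
    return json.dumps(payload, ensure_ascii=False).encode("utf-8")


def send_to_kafka(
    message: bytes,
    topic: str = "bda_mini_results",   # assumed topic name
    servers: str = "localhost:9092",   # assumed broker address
) -> None:
    """Send one message to Kafka using kafka-python (assumed client)."""
    # Imported lazily so build_message works without kafka-python installed.
    from kafka import KafkaProducer

    producer = KafkaProducer(bootstrap_servers=servers)
    producer.send(topic, value=message)
    producer.flush()   # block until the message has actually been sent
    producer.close()
```

Keeping serialization separate from sending makes the message format easy to check without a running broker; `send_to_kafka(build_message(...))` would then be the only call that needs Kafka.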

Features

  • Multi-API Integration: Fetches data from NewsAPI, GDELT, and Wikipedia for a user-supplied phrase.
  • Time-limited Fetching: Caps API calls at 3 minutes or 5 pages of results to keep processing times short.
  • Kafka Integration: Sends the combined data to a Kafka queue for real-time streaming after web scraping.
  • Customizable Search: The time limits and page size can be adjusted.
  • Hadoop Usage: Hadoop stores the scraped data and is used to cross-reference new searches against data from earlier searches.
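The time-limited fetching described above could be sketched as a paging loop that stops at whichever limit is reached first. This is a minimal sketch, not the repository's code: `fetch_limited`, its parameters, and the `fetch_page` callback (which stands in for a real NewsAPI or GDELT request) are hypothetical names. Only the 3-minute and 5-page limits come from the feature list.

```python
import time
from typing import Any, Callable, Dict, List

MAX_SECONDS = 180  # 3 minutes, per the feature list
MAX_PAGES = 5      # 5 pages of results, per the feature list


def fetch_limited(
    fetch_page: Callable[[int], List[Dict[str, Any]]],
    max_pages: int = MAX_PAGES,
    max_seconds: float = MAX_SECONDS,
) -> List[Dict[str, Any]]:
    """Collect pages until the page cap, the time budget, or an empty page.

    fetch_page is a hypothetical callback: given a 1-based page number,
    it returns that page's items (an empty list when nothing is left).
    """
    deadline = time.monotonic() + max_seconds
    results: List[Dict[str, Any]] = []
    for page in range(1, max_pages + 1):
        # The budget is checked between requests, so one slow request
        # can still run past the deadline.
        if time.monotonic() >= deadline:
            break
        items = fetch_page(page)
        if not items:
            break
        results.extend(items)
    return results
```

For example, `fetch_limited(lambda page: [{"page": page}])` returns five items, one per page. Adjusting `max_pages` or `max_seconds` corresponds to the customizable limits mentioned in the list.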

Requirements

  • Python 3.8 or above
  • Kafka
  • Hadoop
  • NewsAPI Client
  • GDELT API
  • Wikipedia API
  • Requests

Installation

  1. Clone the Repository

    git clone https://github.com/KING-258/BDA_Mini
    cd BDA_Mini
