Data Engineering Project with Hadoop HDFS and Kafka
Updated Nov 4, 2023 - Python
This repository gathers a curated list of free resources for learning Hadoop.
Python wrapper to access Hadoop HDFS REST API
Data pipeline to process and analyse Twitter data in a distributed fashion using Apache Spark and Airflow in an AWS environment
Ingestion pipeline to analyze soccer tweets
Category: Cloud Computing and Machine Learning Application - Subject: A cloud platform for processing data with machine learning algorithms, built on OpenStack, using Spark for data distribution and the Hadoop filesystem for data storage
Set up a Hadoop cluster manually and automatically
A TF-IDF calculator for a dataset of Shakespearean plays
Collection of assignments offered under COL733 - Cloud Computing by Prof. Suresh Chand Gupta
Big Data project: a web client for HDFS that works in the terminal and can manipulate both local and Hadoop storage
Worked on Hadoop file streaming
Bulk I/O Dispatch (BID) schemes. We have designed and developed two contention-avoidance storage solutions for big data environments, collectively known as BID: Bulk I/O Dispatch. BID-HDD is a disk scheduling scheme. BID-Hybrid is another contention-avoidance scheme that uses hybrid tiers of storage, improving HDD performance with SSDs. In t…
Distributed and Parallel Database Tasks
When dealing with huge datasets, code is unlikely to execute successfully on a personal desktop. You need either a locally installed cluster environment, such as Hadoop MapReduce, or a cloud platform such as AWS. Here's an example of running such a job on the AWS cloud.
Hadoop-Cluster