A lightweight helper utility that lets developers do interactive pipeline development with a single source file for both DLT runs and non-DLT interactive notebook runs.
Updated Dec 7, 2022 - Python
A big data processing and machine learning platform that can be used just like SQL.
Implementations of big-data algorithms using Python, NumPy, and pandas.
A pipeline that consumes Twitter data to extract meaningful insights about a variety of topics using the following technologies: the Twitter API, Kafka, MongoDB, and Tableau.
Rock-solid pillars for enterprise-grade solutions.
Batch or individual format conversion between Excel, Markdown, CSV, and SQL data sources.
A simple CSV parser, built on the pandas library for Python, for handling huge volumes of data: it extracts specific columns from a CSV file and writes the extracted data to one or more output files (each column in a separate file, or all of them in the same output) in a short amount of time.
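A minimal sketch of how such a column extractor might look with pandas; the function name and parameters here are illustrative, not the repository's actual API. Reading one column at a time with `usecols` and streaming with `chunksize` keeps memory bounded even for very large files:

```python
import pandas as pd

def extract_columns(csv_path, columns, out_dir=".", chunksize=100_000):
    """Stream a large CSV and write each requested column to its own file."""
    for col in columns:
        out_path = f"{out_dir}/{col}.csv"
        first = True
        # usecols limits parsing to a single column; chunksize bounds memory
        for chunk in pd.read_csv(csv_path, usecols=[col], chunksize=chunksize):
            chunk.to_csv(out_path, mode="w" if first else "a",
                         header=first, index=False)
            first = False
```

Writing each column in append mode chunk by chunk avoids ever holding the full file in memory.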
Hands-on project demos covering infrastructure automation (Ansible, Docker), big-data processing & streaming (Hive, Spark, Kafka), and network experiments (MitM, TCP-over-UDP).
Sentiment-Analysis-API
Building Data Lake and ETL pipelines using Amazon EMR, S3, and Apache Spark
This work is from my master's thesis: Condition Monitoring with Machine Learning: A Data-Driven Framework for Quantifying Wind Turbine Energy Loss.
This README assumes that, before running the Spark analytics job, you have already installed the correct versions of **Java**, **Hadoop**, and **Spark**, and that you are running **Ubuntu**.
BigQuery data pipeline with dbt, Spark, Docker, Airflow, Terraform, GCP
Setting up a Spark cluster in a Docker environment for improved repeatability and reliability. This project includes a simple transformation on a dataset containing approximately 31 million rows.
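A hypothetical docker-compose sketch of such a setup, assuming the `bitnami/spark` image (the repository's actual configuration may differ): one standalone master and one worker, with the master's web UI exposed for monitoring.

```yaml
version: "3"
services:
  spark-master:
    image: bitnami/spark:3.5
    environment:
      - SPARK_MODE=master
    ports:
      - "8080:8080"   # master web UI
      - "7077:7077"   # cluster manager port
  spark-worker:
    image: bitnami/spark:3.5
    environment:
      - SPARK_MODE=worker
      - SPARK_MASTER_URL=spark://spark-master:7077
    depends_on:
      - spark-master
```

Pinning the image tag is what gives the repeatability the description mentions: every `docker compose up` starts the same Spark version with the same configuration.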
Kappa Architecture Based Sentiment Analysis System for User Comments
Exploring and Implementing Scalable Data Processing Techniques
Solved tasks from the master's-degree courses in the specialty "Algorithms and Systems for Big Data Processing".
Provides tools for parallel pipeline processing of large data structures.
Software based on artificial intelligence methods for automating big-data analysis.
Electrical Consumption Monitoring - Big Data Pipeline using Lambda Architecture in Python