A lightweight helper utility that lets developers do interactive pipeline development with a single source file for both DLT runs and non-DLT interactive notebook runs.
Updated Dec 7, 2022 - Python
A big data processing and machine learning platform that can be used just like SQL.
Implementations of big-data algorithms using Python, NumPy, and pandas.
A pipeline that consumes Twitter data to extract meaningful insights about a variety of topics using the following technologies: the Twitter API, Kafka, MongoDB, and Tableau.
Rock-solid pillars for enterprise-grade solutions.
Batch or individual format conversion between Excel, Markdown, CSV, and SQL data sources.
A simple CSV parser, built on the pandas library for Python, for handling huge volumes of data: it extracts specific columns from a CSV file and writes the extracted data to one or more output files (each column in a separate file, or all of them in the same output) in a short amount of time.
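A minimal sketch of how such a column extractor might look with pandas; the function name and parameters here are illustrative, not the repository's actual API. Reading one column at a time with `usecols` and streaming with `chunksize` keeps memory bounded even for very large files:

```python
import pandas as pd

def extract_columns(csv_path, columns, out_dir=".", chunksize=100_000):
    """Stream a large CSV and write each requested column to its own file."""
    for col in columns:
        out_path = f"{out_dir}/{col}.csv"
        first = True
        # usecols limits parsing to a single column; chunksize bounds memory
        for chunk in pd.read_csv(csv_path, usecols=[col], chunksize=chunksize):
            chunk.to_csv(out_path, mode="w" if first else "a",
                         header=first, index=False)
            first = False
```

Writing each column in append mode chunk by chunk avoids ever holding the full file in memory.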
Hands-on project demos covering infrastructure automation (Ansible, Docker), big-data processing & streaming (Hive, Spark, Kafka), and network experiments (MitM, TCP-over-UDP).
Sentiment-Analysis-API
Building Data Lake and ETL pipelines using Amazon EMR, S3, and Apache Spark
This work is from my master's thesis: Condition Monitoring with Machine Learning: A Data-Driven Framework for Quantifying Wind Turbine Energy Loss.
This README assumes that, before running the Spark analytics job, you have already installed the correct versions of **Java**, **Hadoop**, and **Spark**, and that you are running **Ubuntu**.
BigQuery data pipeline with dbt, Spark, Docker, Airflow, Terraform, GCP
Setting up a Spark cluster in a Docker environment for improved repeatability and reliability. This project includes a simple transformation on a dataset containing approximately 31 million rows.
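A hypothetical docker-compose sketch of such a setup, assuming the `bitnami/spark` image (the repository's actual configuration may differ): one standalone master and one worker, with the master's web UI exposed for monitoring.

```yaml
version: "3"
services:
  spark-master:
    image: bitnami/spark:3.5
    environment:
      - SPARK_MODE=master
    ports:
      - "8080:8080"   # master web UI
      - "7077:7077"   # cluster manager port
  spark-worker:
    image: bitnami/spark:3.5
    environment:
      - SPARK_MODE=worker
      - SPARK_MASTER_URL=spark://spark-master:7077
    depends_on:
      - spark-master
```

Pinning the image tag is what gives the repeatability the description mentions: every `docker compose up` starts the same Spark version with the same configuration.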
Kappa Architecture Based Sentiment Analysis System for User Comments
Exploring and Implementing Scalable Data Processing Techniques
Solved tasks from the master's-degree courses in the specialty "Algorithms and Systems for Big Data Processing".
Provides tools for parallel pipeline processing of large data structures.
Software based on artificial intelligence methods for automating big-data analysis.
Electrical Consumption Monitoring - Big Data Pipeline using Lambda Architecture in Python