dataproc
Here are 48 public repositories matching this topic...
Solution Accelerators for Serverless Spark on GCP, the industry's first auto-scaling and serverless Spark as a service
Updated May 3, 2024 - Python
Debussy is an opinionated Data Architecture and Engineering framework, enabling data analysts and engineers to build better platforms and pipelines.
Updated Mar 20, 2023 - Python
Scheduling Big Data Workloads and Data Pipelines in the Cloud with pyDag
Updated Sep 19, 2022 - Python
ecommerce GCP Streaming pipeline ― Cloud Storage, Compute Engine, Pub/Sub, Dataflow, Apache Beam, BigQuery and Tableau; GCP Batch pipeline ― Cloud Storage, Dataproc, PySpark, Cloud Spanner and Tableau
Updated Mar 9, 2022 - Python
An educational project to build an end-to-end pipeline for near real-time and batch processing of data, which is then used for visualisation and a machine learning model.
Updated May 19, 2023 - Python
✈ A Spark-based ETL Pipeline for the OpenSky and OpenFlights Datasets
Updated Mar 16, 2021 - Python
Data Workflows with GCP Dataproc, Apache Airflow and Apache Spark
Updated Mar 4, 2020 - Python
Collected data from three sources: opinion-based social media (Twitter), research data from the New York Times, and Common Crawl data for the same topic or key phrase, all from similar time periods. Processed the three data sets individually using classical big data methods such as MapReduce in Google Dataproc Clust…
Updated Oct 25, 2019 - Python
Repository for storing artifacts from an assignment for the Distributed Computing course.
Updated Jun 26, 2023 - Python
💥🚗 USA Accidents Data Engineering Project
Updated Aug 5, 2024 - Python
Updated Nov 18, 2020 - Python
Digital Innovation One - GCP Dataproc Challenge. The challenge consists of processing data using GCP's Dataproc product. The job counts the words in a book and reports how many times each word appears in it.
Updated Jul 13, 2021 - Python
Orchestrating a Dataproc Serverless job with Airflow
Updated Oct 25, 2023 - Python
This project demonstrates how to build a real-time product recommendation system using Pub/Sub Lite and Apache Spark with Dataproc
Updated Jan 8, 2025 - Python
Updated Sep 18, 2022 - Python