The leader in Customer Data Infrastructure
Updated Jun 4, 2025 - Scala
Declarative, text-based tool for data analysts and engineers to extract, load, transform, and orchestrate their data pipelines.
OpenSnowcat Collector, an open source fork of Snowplow (Apache 2.0 License)
Model complex data transformation pipelines easily
NebulaGraph Exchange is an Apache Spark application that parses data from different sources and loads it into NebulaGraph in a distributed environment. It supports both batch and streaming data in various formats and from various sources, including other graph databases, RDBMS, data warehouses, NoSQL stores, message buses, file systems, etc.
OpenSnowcat Enricher (Apache 2.0 License)
Snowplow Enrichment jobs and library
Resilient data pipeline framework running on Apache Spark
A big data project to develop a real-time data pipeline for analyzing the popularity and sentiments of trending topics on Twitter.
This project describes how to write a full ETL data pipeline using Spark.
Data Generators -> Kafka -> Spark Streaming -> PostgreSQL -> Grafana
Real-time streaming data pipeline for Twitter Tweets
OpenSnowcat Relational Database Loader (Apache 2.0 License)
TADOD - Data Pipeline for TLC Trip Record Data using Modern Tech Stack
A cutting-edge big data initiative aimed at creating a real-time data pipeline to analyze the popularity and sentiments of trending topics on Twitter.
Basic starter template for building a Spark job with Scala and sbt.
Data pipeline on Azure for a real-estate dataset, with a three-layer structure (unbound, silver, gold) and an automatic hourly trigger for consistent updates.
A simple data transformation pipeline in Scala reading CSVs, joining data, and aggregating results.
GameTuner Enricher application for processing raw events
GameTuner Scala Stream Collector is a project for collecting raw events from the tracker.