Updated Dec 12, 2021 - Python
dataingestion
Here are 7 public repositories matching this topic...
Export sales data from Google Sheets to a relational DBMS
Updated Jun 17, 2024 - Python
This repository builds a pipeline that trains regression models to predict the compressive strength of concrete, reducing the risk and cost of discarding concrete structures when the concrete cube test fails.
Updated Feb 27, 2023 - Python
Resources for an ETL and data ingestion program using Apache Airflow
Updated Mar 7, 2024 - Python
An application of my Centipede framework that watches 4chan for potentially threatening behavior.
Updated Dec 8, 2019 - Python
This project implements a real-time data pipeline using Apache Airflow, Kafka, Apache Spark, and Delta Lake. It supports both batch (Coldpath) and real-time (Hotpath) data ingestion, processing, and storage. Airflow is used for orchestrating the data workflows.
Updated Apr 23, 2025 - Python
Our knowledge system ingests, processes, and indexes open-access life science publications. It supports internal research by providing precise question answering and efficient retrieval from a continuously updated repository of scientific literature.
Updated Jun 17, 2025 - Python