
dataingestion

Here are 7 public repositories matching this topic...

This repository builds a pipeline that trains regression models to predict the compressive strength of concrete, reducing the risk and cost of discarding concrete structures when the concrete cube test fails.

  • Updated Feb 27, 2023
  • Python

This project implements a real-time data pipeline using Apache Airflow, Kafka, Apache Spark, and Delta Lake. It supports both batch (Coldpath) and real-time (Hotpath) data ingestion, processing, and storage, with Airflow orchestrating the data workflows.

  • Updated Apr 23, 2025
  • Python

Our knowledge system ingests, processes, and indexes open-access life science publications. It supports internal research by providing precise question answering and efficient retrieval from a continuously updated repository of scientific literature.

  • Updated Jun 17, 2025
  • Python
