Mega Scale Multimodal DataPipeline for SOTA models
Updated Jan 30, 2026 - Python
Python / Automation – Scrapes job listings by keyword and location, filters out duplicates, and emails the new listings daily. Covers Python scripting, web scraping, scheduling, and data pipeline development.
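The duplicate-filtering step could be sketched as follows. This is a minimal illustration, not the repository's code. It assumes two listings are duplicates when their title, company and location match after trimming whitespace and lowercasing. The `JobListing` class and `dedupe_listings` function are hypothetical names. You can pass in a persistent `seen_keys` set so that listings sent on a previous day are skipped.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JobListing:
    # Hypothetical record shape; the real scraper's fields are not documented.
    title: str
    company: str
    location: str
    url: str


def _key(job: JobListing) -> tuple:
    # Assumed duplicate key: normalized (title, company, location).
    return tuple(part.strip().lower() for part in (job.title, job.company, job.location))


def dedupe_listings(listings, seen_keys=None):
    """Return listings whose key has not been seen, preserving input order.

    If `seen_keys` is supplied, it is updated in place, so the caller can
    persist it between daily runs (hypothetical design choice).
    """
    seen = set() if seen_keys is None else seen_keys
    unique = []
    for job in listings:
        key = _key(job)
        if key in seen:
            continue  # already emailed or already seen in this batch
        seen.add(key)
        unique.append(job)
    return unique
```

Keying on normalized fields rather than the URL catches the same job posted under different tracking URLs. The trade-off is that two genuinely different openings with identical titles at the same company and location are merged.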
This Ads Data Pipeline simulates company and Google advertising data and calculates key performance metrics in real time. It stores the data in MongoDB, schedules regular updates, and writes the processed statistics to CSV files for further analysis.
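The description does not say which performance metrics the pipeline computes. The sketch below assumes the common advertising ratios:

- **CTR:** clicks / impressions
- **CPC:** spend / clicks
- **CPM:** spend per 1,000 impressions
- **Conversion rate:** conversions / clicks

It writes one CSV row per ad source using the standard-library `csv` module. `compute_kpis` and `write_kpis_csv` are hypothetical helpers, and the MongoDB storage and scheduling parts are left out.

```python
import csv


def _ratio(num, den):
    # Return None instead of raising when the denominator is zero.
    return num / den if den else None


def compute_kpis(impressions, clicks, spend, conversions):
    """Assumed KPI set; the actual repository may compute different metrics."""
    return {
        "ctr": _ratio(clicks, impressions),
        "cpc": _ratio(spend, clicks),
        "cpm": _ratio(spend * 1000, impressions),  # cost per 1,000 impressions
        "conversion_rate": _ratio(conversions, clicks),
    }


def write_kpis_csv(rows, fh):
    """Write (source, raw_stats) pairs as KPI rows to an open text file handle."""
    fieldnames = ["source", "ctr", "cpc", "cpm", "conversion_rate"]
    writer = csv.DictWriter(fh, fieldnames=fieldnames)
    writer.writeheader()
    for source, stats in rows:
        # stats must supply impressions, clicks, spend and conversions.
        writer.writerow({"source": source, **compute_kpis(**stats)})
```

Open the output file with `newline=""`, as the `csv` documentation recommends. Metrics with a zero denominator are written as empty cells rather than causing a division error.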
Companion repository for the Medium blog post 'Creating serverless data pipelines with Azure Functions and Azure Pipelines'.
Builds a four-step data pipeline with Airflow to download podcast episodes.
Data pipeline built with Databricks Asset Bundles and dbt.