An automated, scalable, and modular ETL pipeline that uses yahooquery to extract pricing, financial statements, and fundamental data, all stored in a PostgreSQL database for easy querying and analysis.
- ✅ Clean, plug-and-play ETL pipeline (3 segments: Pricing, Financials, Fundamentals)
- 📥 Automatically scrapes S&P 500 tickers (or lets you configure your own universe)
- 🧱 Creates and manages PostgreSQL database schema + tables
- 🗃️ Organized output directories, archiving logic and file handling
- ⚙️ Fully modular: update or extend segments easily
- 🔒 Secure .env config (example provided)
- Python 3.9+
- PostgreSQL
- Libraries: see requirements.txt
git clone https://github.com/NPStraight2ThePoint/yahooquery-etl-postgresql-prod.git
cd yahooquery-etl-postgresql-prod
Create your .env file using the provided template:
cp .env.example .env
Edit .env with your local PostgreSQL credentials:
DB_HOST=localhost
DB_PORT=5432
DB_USER=your_username
DB_PASSWORD=your_password
DB_NAME=yahooquery_db
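As a minimal sketch of how these five variables can be turned into a connection string, the helper below reads them from a mapping and builds a PostgreSQL URL. The function name `build_postgres_url` is hypothetical and is not taken from this repository; it also assumes the variables are already present in the environment (for example, loaded from .env with python-dotenv's `load_dotenv()`, if that package is installed).

```python
import os
from urllib.parse import quote_plus


def build_postgres_url(env=os.environ):
    """Build a PostgreSQL URL from the DB_* variables shown above.

    Hypothetical helper for illustration; the repository's own
    connection logic may differ.
    """
    # Percent-encode credentials so characters such as "@" or "/" in a
    # password do not corrupt the URL.
    user = quote_plus(env["DB_USER"])
    password = quote_plus(env["DB_PASSWORD"])
    # Host and port fall back to the values used in the template.
    host = env.get("DB_HOST", "localhost")
    port = env.get("DB_PORT", "5432")
    name = env["DB_NAME"]
    return f"postgresql://{user}:{password}@{host}:{port}/{name}"
```

Passing a plain dictionary instead of `os.environ` makes the helper easy to check without touching real credentials.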
We recommend using a virtual environment:
python -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
pip install -r requirements.txt
Run the setup script, which creates the database, schema, tables, and output folders:
python _1_run_setup.py
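The setup step builds its tables from CSV schema definition files. The sketch below shows one way such a file could be translated into `CREATE TABLE` statements. The column names `table_name`, `column_name`, and `data_type` are assumptions, as is the function name `build_create_statements`; the actual layout of sql_schema.csv and the logic in the setup scripts may differ.

```python
import csv
from collections import defaultdict


def build_create_statements(schema_csv, schema="public"):
    """Turn a CSV schema definition into CREATE TABLE statements.

    Assumed CSV columns (hypothetical): table_name, column_name, data_type.
    The data types are inserted as-is, so the CSV must be a trusted file.
    """
    tables = defaultdict(list)
    for row in csv.DictReader(schema_csv):
        # Group column definitions by table, keeping file order.
        tables[row["table_name"]].append(
            f'"{row["column_name"]}" {row["data_type"]}'
        )
    return [
        f'CREATE TABLE IF NOT EXISTS "{schema}"."{table}" ({", ".join(cols)});'
        for table, cols in tables.items()
    ]
```

Generating the statements as plain strings keeps the step easy to inspect before anything is executed against the database.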
You have three flexible options for defining your ticker universe:
- Run the script to auto-fetch the S&P 500: python _2_get_sp500_tickers.py
- Manually replace the default ticker list in output/Static Data/Tickers.csv with your own list of tickers.
- Edit the scraping logic inside _2_get_sp500_tickers.py to adapt it to other universes — such as ASX 200, ETFs, or your own custom watchlist.
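If you supply your own ticker list, it helps to clean it before running the pipeline. The sketch below reads a CSV, uppercases and de-duplicates the symbols, and optionally converts dots to hyphens, since Yahoo Finance writes US share classes as BRK-B rather than BRK.B. The function name `read_tickers` and the `Symbol` column name are assumptions; check the header of your own Tickers.csv.

```python
import csv


def read_tickers(csv_file, column="Symbol", dots_to_hyphens=True):
    """Read a clean, de-duplicated ticker list from an open CSV file.

    Hypothetical helper; the column name "Symbol" is an assumption.
    Set dots_to_hyphens=False for exchange-suffixed symbols such as
    ASX tickers ending in ".AX", where the dot must be kept.
    """
    seen = set()
    tickers = []
    for row in csv.DictReader(csv_file):
        symbol = (row.get(column) or "").strip().upper()
        if dots_to_hyphens:
            symbol = symbol.replace(".", "-")
        # Skip blanks and repeats while preserving the original order.
        if symbol and symbol not in seen:
            seen.add(symbol)
            tickers.append(symbol)
    return tickers
```

A typical call would be `read_tickers(open("output/Static Data/Tickers.csv", newline=""))`.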
You can either:
- Use the Global Orchestrator (recommended):
python _3_global_orchestrator.py
Or:
- Run each module manually (pricing, financials, fundamentals):
python etl/_1_pricing/pricing_orchestrator.py
python etl/_2_financial_statements/financials_statements_orchestrator.py
python etl/_3_fundamentals/fundamentals_orchestrator.py
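Running the three modules in sequence can be sketched as below. This is not the code of the Global Orchestrator, only an illustration of the idea: run each script in order and stop at the first failure. The names `SEGMENT_SCRIPTS` and `run_segments` are hypothetical.

```python
import subprocess
import sys

# Script paths as listed above, in the order they should run.
SEGMENT_SCRIPTS = [
    "etl/_1_pricing/pricing_orchestrator.py",
    "etl/_2_financial_statements/financials_statements_orchestrator.py",
    "etl/_3_fundamentals/fundamentals_orchestrator.py",
]


def run_segments(scripts, runner=subprocess.run):
    """Run each script with the current interpreter, stopping on failure.

    `runner` is injectable so the function can be exercised without
    launching real processes.
    """
    completed = []
    for script in scripts:
        result = runner([sys.executable, script])
        if result.returncode != 0:
            raise RuntimeError(f"{script} exited with code {result.returncode}")
        completed.append(script)
    return completed
```

Calling `run_segments(SEGMENT_SCRIPTS)` from the repository root would run all three segments. Stopping on the first non-zero exit code avoids loading later segments on top of a failed earlier one.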
📦 Archive Old Data (Optional)
After a run, clean up and archive raw data:
python _4_archive_dir.py
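The archiving idea can be illustrated with a short sketch that moves every CSV from an output folder into a timestamped folder under the archive, keeping the sub-folder layout. The function name `archive_csvs` and the timestamp format are assumptions; the repository's own archive script may name and organize files differently.

```python
import shutil
from datetime import datetime, timezone
from pathlib import Path


def archive_csvs(source_dir, archive_root, stamp=None):
    """Move all CSV files under source_dir into archive_root/<stamp>/.

    Hypothetical helper. Non-CSV files are left in place, and the
    relative folder structure is preserved inside the archive.
    """
    source_dir = Path(source_dir)
    # Default to a UTC timestamp so each run gets its own folder.
    stamp = stamp or datetime.now(timezone.utc).strftime("%Y%m%d_%H%M%S")
    target_root = Path(archive_root) / stamp
    moved = []
    # sorted() collects the file list before any file is moved.
    for csv_path in sorted(source_dir.rglob("*.csv")):
        destination = target_root / csv_path.relative_to(source_dir)
        destination.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(csv_path), str(destination))
        moved.append(destination)
    return moved
```

For example, `archive_csvs("output", "archive/data")` would move the current CSVs into a new dated folder, assuming the archive folder sits outside the output folder.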
📊 What's Included
📁 Historical pricing
📁 Option chains
📁 Technical insights
📁 Financial statements (IS, BS, CF) / (Annual/Quarterly)
📁 Company fundamentals
📁 Static profiles, summaries, and more
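To give a feel for where these datasets come from, the sketch below maps each category to a yahooquery `Ticker` attribute. The function name `collect_segments`, the chosen attributes, and the one-year daily history window are assumptions for illustration; the pipeline's own modules may call different endpoints or use different parameters.

```python
def collect_segments(ticker, frequency="a"):
    """Gather one dataset per category from a yahooquery Ticker.

    `ticker` is expected to be a yahooquery.Ticker (or any object that
    exposes the same attributes). `frequency` is "a" for annual or
    "q" for quarterly financial statements.
    """
    return {
        "pricing": ticker.history(period="1y", interval="1d"),
        "option_chains": ticker.option_chain,
        "technical_insights": ticker.technical_insights,
        "income_statement": ticker.income_statement(frequency=frequency),
        "balance_sheet": ticker.balance_sheet(frequency=frequency),
        "cash_flow": ticker.cash_flow(frequency=frequency),
        "fundamentals": ticker.financial_data,
        "profile": ticker.summary_profile,
    }


# Example (requires yahooquery and network access):
#   from yahooquery import Ticker
#   data = collect_segments(Ticker("AAPL"), frequency="q")
```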
yahooquery-etl-postgresql-prod/
├── archive/ # Archived CSVs for version tracking
│ └── data/
├── _4_archive_dir.py # Archive logic
├── etl/ # ETL scripts for each data segment
│ ├── _1_pricing/
│ ├── _2_financial_statements/
│ └── _3_fundamentals/
├── _2_get_sp500_tickers.py # Auto-download S&P 500 tickers
├── _3_global_orchestrator.py # Runs all segments in order
├── output/ # Fetched raw data
│ ├── _1_pricing/
│ ├── _2_financials/
│ ├── _3_fundamentals/
│ └── merged # Merged outputs
├── requirements.txt
├── _1_run_setup.py # Runs DB creation, schema/tables & folder setup
├── setup/
│ ├── create_db.py
│ ├── init_schema_tables.py
│ └── create_dirs.py
├── sql_db_schema/ # CSV schema definition files
│ └── sql_schema.csv
├── utils.py # Helper functions + shared paths
├── .env.example # Template for local credentials
├── .gitignore # Excludes sensitive files
└── README.md
✅ Tested on 200 tickers.
📌 Upcoming Enhancements:
- Automated testing
- GitHub Actions for CI/CD
- Additional Yahoo data modules
Author: Nicholas Papadimitris
Created On: 09 July 2025, 06:00 AM UTC
Project ID: YF_YQ_ETL_09_Jul2025
- 🐙 GitHub: @NPStraight2ThePoint
- 💼 LinkedIn: Nicholas Papadimitris
- 📧 Email: nicholas.papadimitris@gmail.com
This project is licensed under the MIT License — free to use, modify, and distribute.
If you distribute or share this repository or its contents publicly, you must:
- ✅ Provide appropriate credit to the original author.
- ✅ Include a link to the original repository: https://github.com/NPStraight2ThePoint/yahooquery-etl-postgresql-prod
- ✅ Clearly indicate if any changes were made.
You may do so in any reasonable manner, but not in any way that suggests the original author or this repository endorses you or your use.
This project uses and builds upon the following external sources, which should be credited as per their own licenses:
- yahooquery: Python library for the Yahoo Finance API, used here for data extraction.
- Data sourced from Wikipedia for S&P 500 constituents and related metadata.
Please refer to their respective licenses and terms when redistributing or modifying those components.