This project provides a comprehensive framework for retrieving, processing, and analyzing financial data, with a focus on SEC reports and stock market information. The system leverages FAISS for efficient similarity search, integrates Yahoo Finance for real-time market data, and automates SEC report retrieval and embedding for financial analysis.
- Streamlit Dashboard: Provides an interactive web application for financial data visualization.
- Automated SEC Report Retrieval: Scrapes SEC reports and processes them into chunks for embedding and search.
- Vector Search with FAISS: Uses FAISS indexing for efficient similarity searches on financial data.
- Yahoo Finance Integration: Fetches real-time market data for analysis.
- Orchestrated Agent Workflow: An orchestrator agent coordinates the SEC retrieval and Yahoo Finance agents across the pipeline's processing steps.
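A flat FAISS index performs exact nearest-neighbour search. The sketch below reproduces that computation in plain Python so the idea is visible without faiss installed. It assumes `data/faiss_index.bin` holds an L2 index (an inner-product index would rank by dot product instead), and the helper name `knn_l2` is hypothetical. With faiss itself, the equivalent step is loading the index with `faiss.read_index` and calling `index.search(queries, k)`, which returns `(distances, ids)` arrays.

```python
from typing import List, Sequence, Tuple


def knn_l2(
    index_vectors: Sequence[Sequence[float]],
    query: Sequence[float],
    k: int,
) -> Tuple[List[float], List[int]]:
    """Exact k-nearest-neighbour search (hypothetical helper).

    Returns squared L2 distances and row ids of the k closest stored
    vectors, the same quantities a flat FAISS L2 index reports.
    """
    if k <= 0:
        return [], []
    scored = []
    for row_id, vec in enumerate(index_vectors):
        if len(vec) != len(query):
            raise ValueError("query and index vectors must share a dimension")
        # Squared Euclidean distance; FAISS L2 indexes also skip the square root.
        dist = sum((a - b) ** 2 for a, b in zip(vec, query))
        scored.append((dist, row_id))
    scored.sort()  # nearest first; equal distances fall back to the lower row id
    top = scored[:k]
    return [d for d, _ in top], [i for _, i in top]


vectors = [[0.0, 0.0], [1.0, 1.0], [3.0, 4.0]]
distances, ids = knn_l2(vectors, [0.9, 1.2], k=2)
print(ids)  # [1, 0]
```

The returned ids are row positions in the index, so (assuming the chunks were added to the index in the same order they are stored) they can be used to look up the matching entries in `chunks.json`.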
├── app
│ └── app.py # Streamlit application script
│
├── data
│ ├── chunks.json # Preprocessed SEC report chunks
│ ├── faiss_index.bin # FAISS index for vector search
│ ├── quick-ticker-symbol-list.pdf # List of stock tickers
│ ├── quote_keys.csv # Keys for financial data extraction
│ ├── sec_reports/ # Downloaded SEC reports
│ └── ticker_info.csv # Processed stock ticker information
│
├── scripts
│ ├── __init__.py # Script package initialization
│ ├── orcherstrator_agent.py # Main orchestrator for pipeline execution
│ ├── pdf_tickers_to_csv.py # Converts the ticker-symbol PDF list to CSV
│ ├── sec_reports_embedding.py # Embeds SEC reports into vector space
│ ├── sec_reports_retrieval.py # Retrieves SEC reports
│ ├── sec_reports_retreival_agent.py # Agent for managing SEC retrieval
│ ├── sec_reports_scraper.py # Scrapes SEC reports from online sources
│ ├── yfinance_search_agent.py # Fetches stock market data
│ └── yfinance_tools.py # Utility functions for the Yahoo Finance API
│
├── README.md # Project documentation
├── requirements.txt # Python dependencies
├── setup.py # Package setup file
└── system_design.png # System architecture diagram
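`chunks.json` is described only as preprocessed SEC report chunks, so its schema is not documented here. A minimal sketch of one common approach, fixed-size word windows with overlap, is shown below. The window sizes, the record fields (`id`, `source`, `chunk`, `text`), and all function names are hypothetical and are not taken from `sec_reports_embedding.py`.

```python
import json
from typing import Dict, List


def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> List[str]:
    """Split text into windows of `chunk_size` words sharing `overlap` words (hypothetical)."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reaches the end of the text
    return chunks


def build_chunk_records(reports: Dict[str, str], **kwargs) -> List[dict]:
    """Turn {report_name: text} into records whose `id` equals their list position."""
    records = []
    for name, text in reports.items():
        for n, chunk in enumerate(chunk_text(text, **kwargs)):
            records.append({"id": len(records), "source": name, "chunk": n, "text": chunk})
    return records


def save_chunks(records: List[dict], path: str = "data/chunks.json") -> None:
    """Write the records as pretty-printed JSON (hypothetical file layout)."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(records, fh, indent=2)
```

Keeping `id` equal to the record's position means row *i* of the vector index can be mapped straight back to `records[i]`, and the overlap reduces the chance that a passage is split across two chunks and poorly represented in both.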
- Clone the repository:
git clone <repo-url>
cd <repo-name>
- Create a virtual environment and activate it:
python3 -m venv env
source env/bin/activate
- Install dependencies:
pip install -r requirements.txt
Launch the Streamlit dashboard for interactive financial data analysis:
streamlit run app/app.py
To retrieve SEC reports manually:
python scripts/sec_reports_retrieval.py
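How the retrieval and scraper scripts contact the SEC is not documented here. As one hedged illustration, SEC EDGAR exposes a JSON submissions endpoint at `https://data.sec.gov/submissions/CIK##########.json` (10-digit, zero-padded CIK) and asks clients to send a descriptive `User-Agent` with a contact address. The sketch below builds such a request and extracts accession numbers for one form type; whether this project uses this endpoint is an assumption, and the function names are hypothetical.

```python
import json
import urllib.request

SUBMISSIONS_URL = "https://data.sec.gov/submissions/CIK{cik:010d}.json"


def build_submissions_request(cik: int, user_agent: str) -> urllib.request.Request:
    """Build (but do not send) a request for a company's EDGAR filing history."""
    if not user_agent.strip():
        raise ValueError("SEC expects a descriptive User-Agent, e.g. 'Name email@example.com'")
    return urllib.request.Request(
        SUBMISSIONS_URL.format(cik=cik),
        headers={"User-Agent": user_agent, "Accept": "application/json"},
    )


def fetch_submissions(cik: int, user_agent: str, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON body (performs network I/O)."""
    req = build_submissions_request(cik, user_agent)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


def recent_filings(submissions: dict, form_type: str = "10-K") -> list:
    """Return accession numbers of recent filings of one form type.

    The payload's `filings.recent` object stores parallel lists such as
    `form` and `accessionNumber`, so they are zipped together here.
    """
    recent = submissions.get("filings", {}).get("recent", {})
    forms = recent.get("form", [])
    accessions = recent.get("accessionNumber", [])
    return [acc for form, acc in zip(forms, accessions) if form == form_type]
```

For example, `recent_filings(fetch_submissions(320193, "Your Name you@example.com"))` would list Apple's recent 10-K accession numbers; keep request rates low to stay within the SEC's fair-access limits.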
To fetch stock market data using Yahoo Finance:
python scripts/yfinance_search_agent.py
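`data/quote_keys.csv` is listed as holding keys for financial data extraction. Assuming it contains yfinance `Ticker.info` field names in a column headed `key` (the column name and both helper functions are hypothetical), a minimal sketch of narrowing a quote down to those fields looks like this:

```python
import csv
from typing import Any, Dict, Iterable, List


def load_quote_keys(lines: Iterable[str], column: str = "key") -> List[str]:
    """Read field names from CSV lines; assumes a header row with a `key` column."""
    keys = []
    for row in csv.DictReader(lines):
        value = (row.get(column) or "").strip()
        if value:  # skip rows where the key cell is empty
            keys.append(value)
    return keys


def select_quote_fields(info: Dict[str, Any], keys: Iterable[str]) -> Dict[str, Any]:
    """Keep only the requested fields; keys absent from `info` map to None."""
    return {key: info.get(key) for key in keys}


# Example with live data (requires network access and the yfinance package):
# import yfinance as yf
# info = yf.Ticker("AAPL").info
# with open("data/quote_keys.csv", newline="") as fh:
#     print(select_quote_fields(info, load_quote_keys(fh)))
```

Mapping missing fields to `None` rather than raising keeps the output shape stable across tickers, since Yahoo Finance does not return every field for every symbol.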
Refer to system_design.png for an overview of the system's architecture and workflow.
Contributions are welcome! Feel free to fork this repository, submit issues, or create pull requests.
This project is licensed under the MIT License.