An interactive visualization dashboard that maps out the hidden web of corporate relationships.
Stock Relation parses financial data and SEC filings to generate a sprawling, interactive knowledge graph. It allows users to visually explore dynamic cross-market connections through three distinct lenses: vector similarity, equity ownership, and supply chain connections.
Built with FastAPI + Qdrant + SQLite + D3.js, starting with the Magnificent 7 (AAPL, MSFT, GOOGL, AMZN, NVDA, META, TSLA) or the broader S&P 500.
Stock Relation operates through a modular pipeline that ingests, processes, stores, and visualizes complex financial data.
flowchart TD
subgraph DataSources [Data Sources]
YF["Yahoo Finance<br>(Profiles)"]
SEC["SEC EDGAR<br>(13F, 10-K, 20-F)"]
GN["Google News<br>(Search API)"]
end
subgraph Ingestion ["Ingestion Layer (Python)"]
SIM["Similarity<br>SentenceTransformer<br>(all-MiniLM-L6-v2)"]
OWN["Ownership<br>SC13/13F Parser<br>Gemini Flash (LLM)<br>Gemma fallback"]
SC["Supply Chain<br>EFTS Search API<br>DistilBERT<br>(Zero-Shot Classifier)"]
end
YF --> SIM
SEC --> OWN
GN --> SC
subgraph Storage ["Storage Layer"]
QDB[("Qdrant DB<br>(Vector Semantic<br>Search Space)")]
SQL[("SQLite DB<br>(Rigid Relational<br>Edges & Nodes)")]
end
SIM --> QDB
OWN --> SQL
SC --> SQL
subgraph Backend ["FastAPI Backend (port: 8000)"]
direction LR
API1["/api/companies/search"]
API2["/api/relations/graph/{ticker}"]
end
QDB --> Backend
SQL --> Backend
%% Invisible links to force horizontal rank layout inside Backend subgraph
API1 ~~~ API2
subgraph UI ["Interactive Dashboard"]
D3["D3.js Force Graph"]
R["Real-time Physics"]
LEG["Dynamic Hover Legends"]
SM["Sidebar Metrics UI"]
end
Backend --> UI
%% Invisible links to force horizontal rank layout inside UI subgraph
D3 ~~~ R
R ~~~ LEG
LEG ~~~ SM
The ingestion/ module is responsible for gathering raw data from multiple sources and enriching it using AI:
- Financial APIs: Uses `yfinance` to build baseline company profiles, sector classifications, and business summaries.
- SEC EDGAR: Queries the SEC's EFTS and Archives APIs for `SC 13D/13G`, `13F-HR`, `10-K`, and `20-F` form filings.
- LLM Web Discovery: Uses `Gemini 2.5 Flash` (with Google Search grounding and a `gemma-3-27b-it` fallback) to search the web for unlisted private equity investments and venture deals.
- NLP Sentiment Analysis: Passes extracted SEC 10-K paragraphs into a local HuggingFace zero-shot classifier (`typeform/distilbert-base-uncased-mnli`) to verify the direction and sentiment of B2B supply chain links.
To optimize both hybrid search and relational graphing, the system uses two distinct databases:
- Qdrant (Vector DB): Stores company metadata alongside 384-dimensional dense vectors generated by a sentence-transformer (`all-MiniLM-L6-v2`). This allows the backend to run fast cosine similarity searches to identify peer companies with closely aligned business models.
- SQLite (Relational DB): Stores the explicit, rigid graph edges (Ownership and Supply Chain connections). SQLite is used here because these relationships are exact entity-to-entity mappings rather than fuzzy semantic concepts.
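As a rough illustration of why exact entity-to-entity edges fit a relational store, the sketch below creates an in-memory SQLite table and looks up one ticker's edges. The table name `relations`, its columns, and the placeholder rows are assumptions made for this example; the project's actual schema in `data/relations.db` may differ.

```python
import sqlite3

# Hypothetical schema: one row per directed edge between two tickers.
SCHEMA = """
CREATE TABLE IF NOT EXISTS relations (
    source   TEXT NOT NULL,
    target   TEXT NOT NULL,
    rel_type TEXT NOT NULL CHECK (rel_type IN ('ownership', 'supply_chain')),
    detail   TEXT,
    UNIQUE (source, target, rel_type)
);
"""


def edges_for(conn, ticker):
    """Return every edge touching `ticker` as (source, target, rel_type) tuples."""
    rows = conn.execute(
        "SELECT source, target, rel_type FROM relations "
        "WHERE source = ? OR target = ? "
        "ORDER BY source, target, rel_type",
        (ticker, ticker),
    )
    return rows.fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
# Placeholder rows for illustration only; they are not real findings.
conn.executemany(
    "INSERT OR IGNORE INTO relations VALUES (?, ?, ?, ?)",
    [
        ("AAA", "BBB", "ownership", "placeholder stake"),
        ("CCC", "AAA", "supply_chain", "placeholder supplier link"),
        ("CCC", "DDD", "supply_chain", "placeholder supplier link"),
    ],
)
print(edges_for(conn, "AAA"))
```

The `UNIQUE` constraint combined with `INSERT OR IGNORE` is one simple way to keep repeated ingestion runs from duplicating edges.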
A lightweight, async Python server (api/) that acts as the bridge between the data stores and the frontend. It exposes endpoints to:
- Serve search results via Qdrant's payload indexes.
- Dynamically construct and fuse sub-graphs (Similarity + Ownership + Supply Chain) on the fly for specific tickers.
- Stream the entire global knowledge graph (`/api/relations/graph-all`) directly from SQLite for macroscopic visualization.
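The sub-graph fusion step can be pictured as merging several edge lists into a single node/link structure. The following is a minimal sketch under stated assumptions: the function name `fuse_subgraphs` is hypothetical, and the `nodes`/`links` shape is simply the conventional input for a D3 force layout, since the real response schema is not documented here.

```python
def fuse_subgraphs(*edge_lists):
    """Merge edge lists into one graph, deduplicating nodes and edges.

    Each edge is a (source, target, rel_type) tuple.
    """
    nodes = {}    # ticker -> node dict, insertion-ordered
    links = []
    seen = set()  # edges already emitted
    for edges in edge_lists:
        for source, target, rel_type in edges:
            key = (source, target, rel_type)
            if key in seen:
                continue
            seen.add(key)
            nodes.setdefault(source, {"id": source})
            nodes.setdefault(target, {"id": target})
            links.append({"source": source, "target": target, "type": rel_type})
    return {"nodes": list(nodes.values()), "links": links}


# Placeholder tickers for illustration only.
similarity = [("AAA", "BBB", "similarity")]
ownership = [("AAA", "BBB", "similarity"), ("AAA", "CCC", "ownership")]
graph = fuse_subgraphs(similarity, ownership)
print(len(graph["nodes"]), len(graph["links"]))
```

The same merge logic applies when a newly requested sub-graph is added to one that is already on screen: known nodes and edges are skipped, and only new ones are appended.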
The graphical user interface (frontend/) is a vanilla HTML/JS/CSS application centered around a heavily customized D3.js instance.
Key interactive features include:
- Force-Directed Physics: Nodes automatically repel each other while edges act as springs, producing a self-organizing layout.
- Progressive Node Expansion: Clicking any node dynamically requests that specific company's sub-graph from FastAPI and seamlessly fuses it into the existing simulation, allowing infinite exploration.
- Dynamic Legends: The side-panel legends (Relation, Node, Sector) are auto-generated based on the currently visible nodes. Hovering over a legend item instantly highlights matching nodes/edges on the canvas while plunging unrelated elements into shadow.
- Search & Focus: A built-in search bar allows users to instantly teleport to and highlight a specific ticker within sprawling networks.
| Type | Source | Meaning |
|---|---|---|
| 🔵 Similarity | yfinance profiles + embeddings | Companies with similar business profile |
| 🟡 Ownership | SEC SC 13D/13G / 13F + LLM | Company A has invested in / acquired Company B. Uses Gemini 2.5 Flash to discover unlisted private deals. |
| 🟢 Supply Chain | SEC 10-K filings + Transformers | Supplier / customer relationships identified by local zero-shot sentiment analysis. |
Methodology: The system fetches comprehensive company business summaries from Yahoo Finance (yfinance). These natural language descriptions (along with industry and sector tags) are passed locally into a HuggingFace sentence transformer (all-MiniLM-L6-v2) to generate dense 384-dimensional vector embeddings.
These vectors are uploaded to Qdrant. The FastAPI backend then performs a fast Cosine Similarity vector search to find and link companies whose core business models are highly aligned.
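To make the ranking step concrete, here is a small pure-Python sketch of cosine similarity over toy vectors. In the project this search is performed by Qdrant over 384-dimensional embeddings; the 3-dimensional vectors, placeholder tickers, and the helper name `top_peers` below are assumptions chosen only to show the math.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_peers(query, vectors, k=2):
    """Rank every other ticker by similarity to `query`, highest first."""
    scored = [
        (ticker, cosine_similarity(vectors[query], vec))
        for ticker, vec in vectors.items()
        if ticker != query
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy embeddings for illustration only.
toy_vectors = {
    "AAA": [1.0, 0.0, 0.0],
    "BBB": [0.9, 0.1, 0.0],
    "CCC": [0.0, 1.0, 0.0],
}
print(top_peers("AAA", toy_vectors, k=1))
```

Because cosine similarity depends only on direction, two summaries of very different length can still score as close peers if they describe the same kind of business.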
Methodology: Ownership edges represent direct financial investment or acquisition. The pipeline discovers these in two ways:
- SEC Filings: Scrapes the SEC EDGAR archives for `SC 13D`, `SC 13G`, and `13F-HR` filings. It parses the SGML headers and XML information tables to find instances where the target company holds stakes in other public equities.
- LLM Web Discovery: To capture private deals, venture capital investments, and acquisitions not reported in 13F forms, the system uses Gemini 2.5 Flash (with Google Search grounding enabled to read live news). If rate-limited, it automatically falls back to `gemma-3-27b-it`.
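Parsing an XML information table might look roughly like the sketch below. It assumes that each holding sits in an `infoTable` element with `nameOfIssuer`, `cusip`, and `value` children, which is the layout 13F-HR information tables are generally understood to use; the namespace URI, issuer names, CUSIPs, and values in `SAMPLE` are placeholders, and `parse_holdings` is a hypothetical helper rather than the project's parser.

```python
import xml.etree.ElementTree as ET

# Simplified, made-up document; the namespace URI is a placeholder.
SAMPLE = """<informationTable xmlns="urn:example:13f-information-table">
  <infoTable>
    <nameOfIssuer>EXAMPLE CORP</nameOfIssuer>
    <cusip>000000000</cusip>
    <value>1000</value>
  </infoTable>
  <infoTable>
    <nameOfIssuer>SAMPLE HOLDINGS</nameOfIssuer>
    <cusip>111111111</cusip>
    <value>250</value>
  </infoTable>
</informationTable>"""


def local_name(tag):
    """Strip an optional '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def parse_holdings(xml_text):
    """Return one dict per infoTable entry, ignoring XML namespaces."""
    root = ET.fromstring(xml_text)
    holdings = []
    for elem in root.iter():
        if local_name(elem.tag) != "infoTable":
            continue
        fields = {local_name(c.tag): (c.text or "").strip() for c in elem}
        holdings.append(
            {
                "issuer": fields.get("nameOfIssuer", ""),
                "cusip": fields.get("cusip", ""),
                "value": int(fields.get("value") or 0),
            }
        )
    return holdings


print(parse_holdings(SAMPLE))
```

Matching on local tag names keeps the parser independent of the exact namespace a filing declares, at the cost of not validating it.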
Methodology: Supply chain edges identify explicit B2B supplier and customer dependencies.
- EDGAR EFTS Search: The system uses the SEC's Elasticsearch-powered EFTS API to query recent `10-K` and `20-F` annual reports. It searches the raw text for mentions of the target companies' aliases.
- Zero-Shot Sentiment: When a mention is found, the surrounding paragraph is extracted and passed as-is into a local HuggingFace NLP pipeline (`typeform/distilbert-base-uncased-mnli`). The zero-shot classifier evaluates the paragraph against candidate labels like "is a customer of", "is a supplier to", or "is a competitor to" to label the direction and nature of the relationship.
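The steps around the classifier (pulling out paragraphs that mention an alias, then turning the winning label into a directed edge) can be sketched without any model at all. Everything below is an assumption for illustration: the helper names are hypothetical, paragraphs are taken to be blank-line separated, and each label is read as describing the filing company relative to the mentioned company.

```python
import re

CANDIDATE_LABELS = ("is a customer of", "is a supplier to", "is a competitor to")


def extract_paragraphs(text, aliases):
    """Return blank-line-separated paragraphs that mention any alias (case-insensitive)."""
    if not aliases:
        return []
    alternatives = "|".join(re.escape(alias) for alias in aliases)
    # Lookarounds instead of \b so aliases ending in punctuation still match.
    pattern = re.compile(rf"(?<!\w)(?:{alternatives})(?!\w)", re.IGNORECASE)
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    return [p for p in paragraphs if pattern.search(p)]


def label_to_edge(filer, mentioned, label):
    """Map a label to a (supplier, customer) edge, or None for non-supply labels."""
    if label == "is a customer of":
        return (mentioned, filer)  # the filer buys from the mentioned company
    if label == "is a supplier to":
        return (filer, mentioned)  # the filer sells to the mentioned company
    return None


# Made-up filing text for illustration only.
filing_text = (
    "We rely on Acme Corp. for key components.\n\n"
    "Our headquarters lease expires next year.\n\n"
    "ACME CORP. also sells products that overlap with ours."
)
hits = extract_paragraphs(filing_text, ["Acme Corp."])
print(len(hits), label_to_edge("FILER", "ACME", "is a customer of"))
```

In the full pipeline, each paragraph in `hits` would be scored by the zero-shot classifier against `CANDIDATE_LABELS`, and the top label would be passed to something like `label_to_edge`.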
We provide a unified ./run.sh launcher that automates environment setup, Docker/Colima provisioning, Qdrant initialization, data ingestion, and FastAPI startup.
- Docker (Docker Desktop or Docker Engine must be installed and running)
- Gemini API Key (Required for the `gemini-2.5-flash` or `gemma-3-27b-it` model to search for private ownership deals via Google Search grounding). Get one from Google AI Studio.
cp .env.example .env
# Open .env and ensure you provide:
# 1. EDGAR_USER_AGENT (Your Name <your@email.com>)
# 2. GEMINI_API_KEY (Your actual AI Studio Key)

Just run the wizard:

./run.sh

(Windows Users: Run this in WSL2 or a bash-compatible terminal like Git Bash)
The script will auto-install uv, boot qdrant via Docker, prompt you to ingest data (defaulting to the M7 preset), and launch the API server.
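Before launching, it can help to confirm that both required values are actually set. The sketch below is a standalone check written for this README, not part of `run.sh`; the helper names are hypothetical and it handles only simple `KEY=value` lines.

```python
from pathlib import Path

REQUIRED_KEYS = ("EDGAR_USER_AGENT", "GEMINI_API_KEY")


def parse_env(text):
    """Parse simple KEY=value lines, skipping blanks and '#' comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values):
    """Return the required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not values.get(key)]


if __name__ == "__main__":
    env_path = Path(".env")
    text = env_path.read_text() if env_path.exists() else ""
    print("Missing keys:", missing_keys(parse_env(text)))
```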
Useful Launcher Flags:
| Flag | Description |
|---|---|
| `--help` | Show the help menu with all available options. |
| `--ingest` | Force the ingestion pipeline to run, bypassing the yes/no prompt. |
| `--no-ingest` | Skip the ingestion pipeline and immediately boot FastAPI using the existing Qdrant/SQLite data. |
| `--preset <name>` | Select a predefined group of tickers to ingest (e.g., `m7` or `sp500`). Defaults to `m7`. |
| `--tickers <T1> <T2>` | Manually pass a space-separated list of stock tickers to process (e.g., `--tickers AAPL MSFT NVDA`). |
| `--skip-similarity` | Skip finding similarity relations. |
| `--skip-ownership` | Skip finding ownership relations. |
| `--skip-supply-chain` | Skip finding supply chain relations. |
Example Advanced Usage:
# Ingest specific tickers, but skip the heavy LLM/NLP analysis steps
./run.sh --ingest --tickers AAPL MSFT --skip-supply-chain --skip-ownership

Open http://localhost:8000 when the script finishes.
| Method | Endpoint | Description |
|---|---|---|
| GET | `/api/health` | Health check |
| GET | `/api/companies` | List all companies |
| GET | `/api/companies/search?q=apple` | Search companies |
| GET | `/api/companies/{ticker}` | Company detail |
| GET | `/api/relations/{ticker}?type=all` | All relations |
| GET | `/api/relations/graph/{ticker}` | D3 graph data |
| GET | `/api/relations/graph-all` | Fetch entire global DB |
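Once the server is running, the graph endpoint can be queried from a script. The following is a minimal sketch using only the standard library; the helper names are hypothetical, upper-casing the ticker is an assumption, and the shape of the returned JSON is not documented here, so the code simply returns whatever the server sends.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

BASE_URL = "http://localhost:8000"


def graph_url(ticker, base_url=BASE_URL):
    """Build the URL for one ticker's graph endpoint."""
    return f"{base_url}/api/relations/graph/{quote(ticker.upper())}"


def fetch_graph(ticker, timeout=10):
    """Fetch and decode the JSON graph for `ticker` (requires a running server)."""
    with urlopen(graph_url(ticker), timeout=timeout) as response:
        return json.load(response)


print(graph_url("aapl"))
```

Calling `fetch_graph("AAPL")` only works while the FastAPI server started by `./run.sh` is listening on port 8000.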
stock-relation/
├── run.sh                  # One-click launcher
├── pyproject.toml          # Dependencies (uv)
├── .env.example            # Config template
├── data/
│   └── relations.db        # SQLite (Ownership, Supply Chain edges)
├── ingestion/
│   ├── fetcher.py          # yfinance data + EDGAR requests
│   ├── embedder.py         # Sentence-transformer embeddings → Qdrant
│   ├── ownership.py        # SEC 13F parsing + LLM analysis
│   ├── supply_chain.py     # SEC 10-K supply chain edges + sentiment analysis
│   └── pipeline.py         # Orchestrator
├── api/
│   ├── main.py             # FastAPI app
│   ├── models.py           # Pydantic schemas
│   ├── db.py               # SQLite connection helper
│   ├── qdrant_client.py    # Qdrant client helper
│   └── routes/
│       ├── companies.py    # Company endpoints
│       └── relations.py    # Relation + global graph endpoints
├── frontend/
│   ├── index.html          # D3 Dashboard HTML
│   ├── style.css           # UI Styles
│   └── app.js              # Network rendering & UX logic
└── scripts/
    └── ingest.py           # Under-the-hood ingestion CLI
Educational Use Only: This project is designed strictly for educational and conceptual research. The relationship mappings are generated from scattered SEC filings and news sources with the help of ML models. They are not financial advice, nor are they guaranteed to be accurate, comprehensive, or up to date.
While the default runtime is tested specifically on the Magnificent 7 (AAPL, MSFT, GOOGL, AMZN, NVDA, META, TSLA), the architecture is designed to scale and should robustly support ingestion and mapping of all S&P 500 companies.
This project is licensed under the MIT License. See the LICENSE file for more details.