supat-roong/stock-relation

🔗 Stock Relation

An interactive visualization dashboard that maps out the hidden web of corporate relationships.

Stock Relation parses financial data and SEC filings to generate a sprawling, interactive knowledge graph. It allows users to visually explore dynamic cross-market connections through three distinct lenses: vector similarity, equity ownership, and supply chain connections.

Built with FastAPI + Qdrant + SQLite + D3.js, starting with the Magnificent 7 (AAPL, MSFT, GOOGL, AMZN, NVDA, META, TSLA) or the broader S&P 500.


Demo

stock-relation-demo.mp4

Architecture

Stock Relation operates through a modular pipeline that ingests, processes, stores, and visualizes complex financial data.

flowchart TD
    subgraph DataSources [Data Sources]
        YF["Yahoo Finance<br>(Profiles)"]
        SEC["SEC EDGAR<br>(13F, 10-K, 20-F)"]
        GN["Google News<br>(Search API)"]
    end

    subgraph Ingestion ["Ingestion Layer (Python)"]
        SIM["Similarity<br>SentenceTransformer<br>(all-MiniLM-L6-v2)"]
        OWN["Ownership<br>SC13/13F Parser<br>Gemini Flash (LLM)<br>Gemma fallback"]
        SC["Supply Chain<br>EFTS Search API<br>DistilBERT<br>(Zero-Shot Classifier)"]
    end

    YF --> SIM
    SEC --> OWN
    GN --> SC

    subgraph Storage ["Storage Layer"]
        QDB[("Qdrant DB<br>(Vector Semantic<br>Search Space)")]
        SQL[("SQLite DB<br>(Rigid Relational<br>Edges & Nodes)")]
    end

    SIM --> QDB
    OWN --> SQL
    SC --> SQL

    subgraph Backend ["FastAPI Backend (port: 8000)"]
        direction LR
        API1["/api/companies/search"]
        API2["/api/relations/graph/{ticker}"]
    end

    QDB --> Backend
    SQL --> Backend

    %% Invisible links to force horizontal rank layout inside Backend subgraph
    API1 ~~~ API2

    subgraph UI ["Interactive Dashboard"]
        D3["D3.js Force Graph"]
        R["Real-time Physics"]
        LEG["Dynamic Hover Legends"]
        SM["Sidebar Metrics UI"]
    end

    Backend --> UI

    %% Invisible links to force horizontal rank layout inside UI subgraph
    D3 ~~~ R
    R ~~~ LEG
    LEG ~~~ SM

1. Data Ingestion & Enrichment

The ingestion/ module is responsible for gathering raw data from multiple sources and enriching it using AI:

  • Financial APIs: Scrapes yfinance to build baseline company profiles, sector classifications, and business summaries.
  • SEC EDGAR: Queries the SEC's EFTS and Archives APIs for SC 13D/13G, 13F-HR, 10-K, and 20-F form filings.
  • LLM Web Discovery: Uses Gemini 2.5 Flash (with Google Search grounding and a gemma-3-27b-it fallback) to analyze the web for unlisted private equity investments and venture deals.
  • NLP Sentiment Analysis: Passes extracted SEC 10-K paragraphs into a local HuggingFace zero-shot classifier (typeform/distilbert-base-uncased-mnli) to verify the direction and sentiment of B2B supply chain links.

2. Dual Storage Layer

To optimize both hybrid search and relational graphing, the system uses two distinct databases:

  • Qdrant (Vector DB): Stores company metadata alongside 384-dimensional dense vectors generated by a sentence-transformer (all-MiniLM-L6-v2). This allows the backend to run fast cosine similarity searches to identify peer companies with closely aligned business models.
  • SQLite (Relational DB): Stores the explicit, rigid graph edges (Ownership and Supply Chain connections). SQLite is used here because these relationships are exact entity-to-entity mappings rather than fuzzy semantic concepts.
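
As a rough illustration of why exact edges fit a relational store, the sketch below keeps ownership and supply chain links in a single SQLite table. The table name, column names, and helper functions are hypothetical and not taken from the project's actual schema; the tickers are placeholders.

```python
import sqlite3

# Hypothetical schema: the table and column names are illustrative only.
SCHEMA = """
CREATE TABLE IF NOT EXISTS relations (
    source   TEXT NOT NULL,
    target   TEXT NOT NULL,
    rel_type TEXT NOT NULL CHECK (rel_type IN ('ownership', 'supply_chain')),
    detail   TEXT,
    PRIMARY KEY (source, target, rel_type)
);
"""


def open_db(path=":memory:"):
    """Open (or create) the database and make sure the edge table exists."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_edge(conn, source, target, rel_type, detail=None):
    """Insert one directed edge; re-inserting the same edge overwrites it."""
    conn.execute(
        "INSERT OR REPLACE INTO relations (source, target, rel_type, detail) "
        "VALUES (?, ?, ?, ?)",
        (source, target, rel_type, detail),
    )
    conn.commit()


def edges_for(conn, ticker):
    """Return every edge that touches the given ticker, in a stable order."""
    return conn.execute(
        "SELECT source, target, rel_type FROM relations "
        "WHERE source = ? OR target = ? "
        "ORDER BY source, target, rel_type",
        (ticker, ticker),
    ).fetchall()
```

The composite primary key makes repeated ingestion runs idempotent, which is one plausible reason to prefer exact keys here over fuzzy vector matches.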

3. FastAPI Backend

A lightweight, async Python server (api/) that acts as the bridge between the data stores and the frontend. It exposes endpoints to:

  • Serve search results via Qdrant's payload indexes.
  • Dynamically construct and fuse sub-graphs (Similarity + Ownership + Supply Chain) on the fly for specific tickers.
  • Stream the entire global knowledge graph (/api/relations/graph-all) directly from SQLite for macroscopic visualization.
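
The sub-graph fusion described above can be sketched as a plain merge of node and link lists. This is a minimal sketch that assumes a D3-style payload of the form {"nodes": [...], "links": [...]} with "id", "source", "target", and "type" keys; the actual response shape used by the API may differ.

```python
def fuse_graphs(*subgraphs):
    """Merge D3-style sub-graphs, dropping duplicate nodes and links.

    Assumed (hypothetical) shape of each sub-graph:
    {"nodes": [{"id": ...}], "links": [{"source": ..., "target": ..., "type": ...}]}
    """
    nodes = {}
    links = {}
    for graph in subgraphs:
        for node in graph.get("nodes", []):
            # The first occurrence of a node wins; later duplicates are ignored.
            nodes.setdefault(node["id"], node)
        for link in graph.get("links", []):
            # The same pair of companies may be linked by several relation types.
            key = (link["source"], link["target"], link["type"])
            links.setdefault(key, link)
    return {"nodes": list(nodes.values()), "links": list(links.values())}
```

Keying links on (source, target, type) lets a similarity edge and an ownership edge between the same two companies coexist in the fused graph.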

4. Interactive D3 Dashboard

The graphical user interface (frontend/) is a vanilla HTML/JS/CSS application centered around a heavily customized D3.js instance.

Key interactive features include:

  • Force-Directed Physics: Nodes automatically repel each other while edges act as springs, producing a self-organizing layout.
  • Progressive Node Expansion: Clicking any node requests that company's sub-graph from FastAPI and merges it into the running simulation, allowing open-ended exploration.
  • Dynamic Legends: The side-panel legends (Relation, Node, Sector) are auto-generated based on the currently visible nodes. Hovering over a legend item highlights matching nodes/edges on the canvas and dims unrelated elements.
  • Search & Focus: A built-in search bar lets users jump to and highlight a specific ticker within large networks.

Three Relation Types

| Type | Source | Meaning |
| --- | --- | --- |
| 🔵 Similarity | yfinance profiles + embeddings | Companies with similar business profile |
| 🟡 Ownership | SEC SC 13D/13G / 13F + LLM | Company A has invested in / acquired Company B. Uses Gemini 2.5 Flash to discover unlisted private deals. |
| 🟢 Supply Chain | SEC 10-K filings + Transformers | Supplier / customer relationships identified by local zero-shot sentiment analysis. |

1. Vector Similarity

Methodology: The system fetches comprehensive company business summaries from Yahoo Finance (yfinance). These natural language descriptions (along with industry and sector tags) are passed locally into a HuggingFace sentence transformer (all-MiniLM-L6-v2) to generate dense 384-dimensional vector embeddings.

These vectors are uploaded to Qdrant. The FastAPI backend then performs a fast Cosine Similarity vector search to find and link companies whose core business models are highly aligned.
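
To make the ranking step concrete, here is a minimal sketch of cosine similarity over toy vectors using only the standard library. In the real pipeline the vectors are 384-dimensional and the search is delegated to Qdrant; the function names and placeholder tickers below are illustrative assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_peers(query, vectors, k=3):
    """Rank every other ticker by similarity to `query` and keep the best k."""
    scored = [
        (ticker, cosine_similarity(vectors[query], vec))
        for ticker, vec in vectors.items()
        if ticker != query
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

Because cosine similarity ignores vector length, two summaries that differ mainly in verbosity can still score as close peers.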

2. Equity Ownership

Methodology: Ownership edges represent direct financial investment or acquisition. The pipeline discovers these in two ways:

  1. SEC Filings: Scrapes the SEC EDGAR archives for SC 13D, SC 13G, and 13F-HR filings. It parses the SGML headers and XML information tables to find instances where the target company holds stakes in other public equities.
  2. LLM Web Discovery: To capture private deals, venture capital investments, and acquisitions not captured in 13F forms, the system uses Gemini 2.5 Flash (with Google Search grounding enabled to read live news). If rate-limited, it automatically falls back to gemma-3-27b-it.
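
As an illustration of the XML step, the sketch below reads holdings from a 13F information table with the standard library. The element names and namespace follow the SEC's 13F information table format as I understand it; the sample document is made up, and the project's own parser may work differently.

```python
import xml.etree.ElementTree as ET

# Namespace used by 13F information table documents (stated as an assumption).
NS = {"it": "http://www.sec.gov/edgar/document/thirteenf/informationtable"}

# Made-up sample for illustration; not taken from a real filing.
SAMPLE = """
<informationTable xmlns="http://www.sec.gov/edgar/document/thirteenf/informationtable">
  <infoTable>
    <nameOfIssuer>EXAMPLE CORP</nameOfIssuer>
    <cusip>000000000</cusip>
    <value>1000</value>
    <shrsOrPrnAmt>
      <sshPrnamt>50</sshPrnamt>
      <sshPrnamtType>SH</sshPrnamtType>
    </shrsOrPrnAmt>
  </infoTable>
</informationTable>
"""


def parse_holdings(xml_text):
    """Return one dict per <infoTable> row found in the document."""
    root = ET.fromstring(xml_text.strip())
    holdings = []
    for row in root.findall("it:infoTable", NS):
        holdings.append({
            "issuer": row.findtext("it:nameOfIssuer", default="", namespaces=NS),
            "cusip": row.findtext("it:cusip", default="", namespaces=NS),
            "value": int(row.findtext("it:value", default="0", namespaces=NS)),
            "shares": int(row.findtext(
                "it:shrsOrPrnAmt/it:sshPrnamt", default="0", namespaces=NS)),
        })
    return holdings
```

Each returned row would still need its issuer name or CUSIP mapped to a ticker before it can become an ownership edge; that mapping is outside this sketch.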

3. Supply Chain Connections

Methodology: Supply chain edges identify explicit B2B supplier and customer dependencies.

  1. EDGAR EFTS Search: The system leverages the SEC's ElasticSearch-powered EFTS API to query recent 10-K and 20-F annual reports. It searches the raw text for mentions of the target companies' aliases.
  2. Zero-Shot Sentiment: When a mention is found, the surrounding paragraph is extracted and passed as context into a local HuggingFace NLP pipeline (typeform/distilbert-base-uncased-mnli). The zero-shot classifier evaluates the paragraph against candidate labels like "is a customer of", "is a supplier to", or "is a competitor to" to label the direction and nature of the relationship.
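
The two steps above can be sketched as follows. The paragraph splitting, the min_score threshold, and all function names are assumptions made for illustration; only the model name and the candidate labels come from the description above. The classifier is injected as a callable so the labeling logic can be exercised without downloading the model.

```python
import re

CANDIDATE_LABELS = ["is a customer of", "is a supplier to", "is a competitor to"]


def paragraphs_mentioning(text, aliases):
    """Return the blank-line-separated paragraphs that mention any alias."""
    if not aliases:
        return []
    pattern = re.compile("|".join(re.escape(a) for a in aliases), re.IGNORECASE)
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    return [p for p in paragraphs if pattern.search(p)]


def label_relation(paragraph, classify, labels=CANDIDATE_LABELS, min_score=0.5):
    """Return the top-scoring label, or None if it falls below min_score.

    `classify(text, labels)` must return {"labels": [...], "scores": [...]}
    sorted by descending score, which is the shape the transformers
    zero-shot pipeline produces. The 0.5 threshold is an arbitrary assumption.
    """
    result = classify(paragraph, labels)
    best_label, best_score = result["labels"][0], result["scores"][0]
    return best_label if best_score >= min_score else None


def make_hf_classifier(model_name="typeform/distilbert-base-uncased-mnli"):
    """Build a classifier backed by transformers (downloads the model on first use)."""
    from transformers import pipeline  # imported lazily; a heavy optional dependency

    zero_shot = pipeline("zero-shot-classification", model=model_name)
    return lambda text, labels: zero_shot(text, candidate_labels=labels)
```

With the transformers package installed, `label_relation(paragraph, make_hf_classifier())` would run the real model; a stub callable is enough for unit tests.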

Quick Start

We provide a unified ./run.sh launcher that automates environment setup, Docker/Colima provisioning, Qdrant initialization, data ingestion, and FastAPI startup.

1. Requirements

  • Docker (Docker Desktop or Docker Engine must be installed and running)
  • Gemini API Key (Required for the gemini-2.5-flash or gemma-3-27b-it model to search for private ownership deals via Google Search grounding). Get one from Google AI Studio.

2. Configure environment

cp .env.example .env

# Open .env and ensure you provide:
# 1. EDGAR_USER_AGENT (Your Name <your@email.com>)
# 2. GEMINI_API_KEY (Your actual AI Studio Key)
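
A quick way to confirm both values are set before a long ingestion run is a small check like the one below. This is a hypothetical helper, not part of the project; only the two variable names come from the template above.

```python
import os

# The two settings named in the .env template above.
REQUIRED_SETTINGS = ("EDGAR_USER_AGENT", "GEMINI_API_KEY")


def missing_settings(env=None):
    """Return the names of required settings that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_SETTINGS if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_settings()
    if missing:
        raise SystemExit("Missing settings: " + ", ".join(missing))
    print("All required settings are present.")
```

Note that this reads the process environment, so the values in .env must already be exported or loaded by whatever tool starts the script.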

3. Launch the App

Just run the wizard:

./run.sh

(Windows Users: Run this in WSL2 or a bash-compatible terminal like Git Bash)

The script auto-installs uv, boots Qdrant via Docker, prompts you to ingest data (defaulting to the M7 preset), and launches the API server.

Useful Launcher Flags:

| Flag | Description |
| --- | --- |
| `--help` | Show the help menu with all available options. |
| `--ingest` | Force the ingestion pipeline to run, bypassing the yes/no prompt. |
| `--no-ingest` | Skip the ingestion pipeline and immediately boot FastAPI using the existing Qdrant/SQLite data. |
| `--preset <name>` | Select a predefined group of tickers to ingest (e.g., m7 or sp500). Defaults to m7. |
| `--tickers <T1> <T2>` | Manually pass a space-separated list of stock tickers to process (e.g., --tickers AAPL MSFT NVDA). |
| `--skip-similarity` | Skip finding similarity relations. |
| `--skip-ownership` | Skip finding ownership relations. |
| `--skip-supply-chain` | Skip finding supply chain relations. |

Example Advanced Usage:

# Ingest specific tickers, but skip the heavy LLM/NLP analysis steps
./run.sh --ingest --tickers AAPL MSFT --skip-supply-chain --skip-ownership

Open http://localhost:8000 when the script finishes.


API Endpoints

| Method | Endpoint | Description |
| --- | --- | --- |
| GET | `/api/health` | Health check |
| GET | `/api/companies` | List all companies |
| GET | `/api/companies/search?q=apple` | Search companies |
| GET | `/api/companies/{ticker}` | Company detail |
| GET | `/api/relations/{ticker}?type=all` | All relations |
| GET | `/api/relations/graph/{ticker}` | D3 graph data |
| GET | `/api/relations/graph-all` | Fetch entire global DB |
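
For scripting against a running server, a small standard-library client might look like the sketch below. It assumes the default http://localhost:8000 address from the Quick Start; the helper names are hypothetical, and the response schemas are not documented here, so the JSON is returned as-is.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

BASE_URL = "http://localhost:8000"  # default address from the Quick Start


def build_url(path, **params):
    """Join the base URL, an endpoint path, and optional query parameters."""
    query = f"?{urlencode(params)}" if params else ""
    return f"{BASE_URL}{path}{query}"


def graph_url(ticker):
    """URL of the D3 graph endpoint for one ticker."""
    return build_url(f"/api/relations/graph/{quote(ticker)}")


def fetch_json(url, timeout=10):
    """GET a URL and decode the JSON body (requires the server to be running)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

For example, `fetch_json(build_url("/api/companies/search", q="apple"))` would query the search endpoint once the launcher has finished.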

Project Structure

stock-relation/
├── run.sh                  # One-click launcher
├── pyproject.toml          # Dependencies (uv)
├── .env.example            # Config template
├── data/
│   └── relations.db        # SQLite (Ownership, Supply Chain edges)
├── ingestion/
│   ├── fetcher.py          # yfinance data + EDGAR requests
│   ├── embedder.py         # Sentence-transformer embeddings → Qdrant
│   ├── ownership.py        # SEC 13F parsing + LLM analysis
│   ├── supply_chain.py     # SEC 10-K supply chain edges + sentiment analysis
│   └── pipeline.py         # Orchestrator
├── api/
│   ├── main.py             # FastAPI app
│   ├── models.py           # Pydantic schemas
│   ├── db.py               # SQLite connection helper
│   ├── qdrant_client.py    # Qdrant client helper
│   └── routes/
│       ├── companies.py    # Company endpoints
│       └── relations.py    # Relation + global graph endpoints
├── frontend/
│   ├── index.html          # D3 Dashboard HTML
│   ├── style.css           # UI Styles
│   └── app.js              # Network rendering & UX logic
└── scripts/
    └── ingest.py           # Under-the-hood ingestion CLI

Disclaimer

Educational Use Only: This project is designed strictly for educational and conceptual research. The relationship mappings are generated from scattered SEC filings and news sources with the help of ML models. They are not financial advice, nor are they guaranteed to be accurate, comprehensive, or up to date.

While the default runtime is tested specifically on the Magnificent 7 (AAPL, MSFT, GOOGL, AMZN, NVDA, META, TSLA), the architecture is designed to scale and should support ingestion and mapping of all S&P 500 companies.

License

This project is licensed under the MIT License. See the LICENSE file for more details.
