This project is a prototype data pipeline and interface that ingests unstructured public data from Private Equity websites and transforms it into a structured Knowledge Graph using Neo4j.
## Architecture

- ETL Layer (Python + Playwright + LLM):
  - Scrapes the Portfolio and News pages of 20 target PE firms.
  - Uses GPT-4o-mini to extract structured entities (Funds, Portcos, Events, People) and relationships from noisy HTML.
  - Normalizes the data into a structured JSON format.
- Storage Layer (Neo4j):
  - Models the industry's complexity using a graph schema.
  - Nodes: `PEFirm`, `Company`, `Person`.
  - Relationships: `ACQUIRED`, `EXITED`, `HIRED_BY`, `RAISED`.
- Interface Layer (Streamlit):
  - Table View: a filterable ledger of all extracted events.
  - Chat Interface: a "Text-to-Graph" interface that uses an LLM to convert natural-language questions into Cypher queries.
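The normalization step of the ETL layer can be sketched as follows. This is an illustrative sketch, not the repo's actual code: the function name `parse_events` and the field names (`firm`, `company`, `event_type`, `person`) are assumptions; the real schema lives in `scrapers/etl.py`.

```python
import json

# Fields we expect the LLM to return for each event. These names are
# illustrative assumptions -- the actual schema is defined in scrapers/etl.py.
REQUIRED_FIELDS = {"firm", "company", "event_type"}

def parse_events(raw_llm_output: str) -> list[dict]:
    """Normalize the LLM's JSON response into a list of clean event dicts,
    dropping records that are missing required fields (a common failure
    mode when extracting from noisy HTML)."""
    records = json.loads(raw_llm_output)
    clean = []
    for rec in records:
        if not REQUIRED_FIELDS.issubset(rec):
            continue  # skip malformed extractions
        clean.append({
            "firm": rec["firm"].strip(),
            "company": rec["company"].strip(),
            "event_type": rec["event_type"].strip().upper(),
            "person": rec.get("person"),  # only present for hiring events
        })
    return clean
```

In the real pipeline, a step like this would sit between the Playwright scrape plus LLM call and the JSON dump to `data/events.json`.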
## Prerequisites

- Python 3.11+
- Neo4j database
- OpenAI API key (set as the `OPENAI_API_KEY` environment variable)
## Setup

- Clone the repository.
- Install dependencies:

  ```bash
  pip install -r requirements.txt
  playwright install chromium
  ```

- Ensure Neo4j is running and update the credentials in `ui/app.py` and `scrapers/load_to_neo4j.py`.
## Usage

- Scrape data: `python scrapers/etl.py`
- Load to Neo4j: `python scrapers/load_to_neo4j.py`
- Start the UI: `streamlit run ui/app.py`
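The load step presumably upserts nodes and relationships so the pipeline can be re-run safely. A minimal sketch of that logic, assuming the helper name `event_to_cypher` and the event dict shape (neither is from the repo):

```python
# Sketch of the loading logic; the real implementation is in
# scrapers/load_to_neo4j.py. Relationship types match the graph schema.
VALID_RELS = {"ACQUIRED", "EXITED", "HIRED_BY", "RAISED"}

def event_to_cypher(event: dict) -> tuple[str, dict]:
    """Turn one extracted event into a parameterized Cypher MERGE.
    MERGE keeps the load idempotent: re-running the loader does not
    duplicate firms, companies, or relationships."""
    rel = event["event_type"]
    # Relationship types cannot be Cypher parameters, so the type is
    # interpolated -- the whitelist above guards against injection.
    if rel not in VALID_RELS:
        raise ValueError(f"unknown relationship type: {rel}")
    query = (
        "MERGE (f:PEFirm {name: $firm}) "
        "MERGE (c:Company {name: $company}) "
        f"MERGE (f)-[:{rel}]->(c)"
    )
    return query, {"firm": event["firm"], "company": event["company"]}

# Running it against a live database would look roughly like this
# (assumes the neo4j Python driver and local credentials):
#   from neo4j import GraphDatabase
#   driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))
#   with driver.session() as session:
#       query, params = event_to_cypher(event)
#       session.run(query, params)
```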
## Graph Schema Design

- Nodes: `PEFirm` is the central entity. `Company` represents portfolio companies. `Person` represents key personnel.
- Relationships: Unlike a flat table, the graph lets us see connections. For example, a `Person` can be linked to multiple `Company` nodes over time, and a `PEFirm` can have multiple types of relationships with the same `Company` (an acquisition followed by an exit).
- Scalability: The schema is designed to be extensible. New entity types (e.g., `LP`, `Sector`) can be added as nodes without breaking existing relationships.
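The patterns above map naturally onto Cypher. The query strings below are sketches against the schema as described here, not queries from the repo; the `HIRED_BY` direction (`Person` to `Company`) is an assumption.

```python
# Cypher sketches against the schema above. Because they match on labels
# and relationship types, they keep working unchanged if new node types
# (e.g., Sector) are added later -- the extensibility point made above.

# Companies a firm both acquired and later exited.
ACQUIRED_THEN_EXITED = """
MATCH (f:PEFirm)-[:ACQUIRED]->(c:Company),
      (f)-[:EXITED]->(c)
RETURN f.name AS firm, c.name AS company
"""

# People connected to more than one portfolio company over time
# (direction of HIRED_BY is assumed here).
SERIAL_OPERATORS = """
MATCH (p:Person)-[:HIRED_BY]->(c:Company)
WITH p, count(c) AS companies
WHERE companies > 1
RETURN p.name AS person, companies
"""
```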
## Deliverables

- Codebase: full source code for the scrapers, loader, and UI.
- Data Dump: `data/events.json` contains the extracted data for the processed firms.
- Video Walkthrough: not included; this README and the code comments serve as the detailed explanation.