GPT Thinking Extractor

Tools to scrape and extract the "Thinking" process data from ChatGPT threads and projects.

Prerequisites

  • Python 3.13+
  • Chrome/Chromium browser (for remote debugging)

Installation

  1. Clone the repository:

    git clone <repository_url>
    cd gpt-thinking-extractor
  2. Create and activate a virtual environment (optional but recommended):

    python -m venv .venv
    source .venv/bin/activate  # On Windows: .venv\Scripts\activate
  3. Install the package:

    pip install .
  4. Install Playwright browsers (see the smoke test below):

    playwright install chromium
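
To confirm the browser download succeeded, you can run a quick smoke test (a minimal sketch, not part of the package; it launches and immediately closes a headless Chromium):

    # smoke test: launch and close the Chromium fetched by `playwright install`
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        print("Chromium OK:", browser.version)
        browser.close()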

Usage

1. Launch Chrome in Debug Mode

The scraper connects to an existing Chrome instance via the Chrome DevTools Protocol (CDP). You must launch Chrome with remote debugging enabled and ensure you are logged in to ChatGPT.

Windows:

& "C:\Program Files\Google\Chrome\Application\chrome.exe" --remote-debugging-port=9222

Linux / macOS:

google-chrome --remote-debugging-port=9222
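
Before connecting, you can verify that Chrome is actually listening on the debug port. The CDP HTTP endpoint /json/version reports the browser build; a minimal standard-library check:

    # probe the CDP endpoint; a connection error means Chrome was not
    # launched with --remote-debugging-port=9222
    import json
    import urllib.request

    with urllib.request.urlopen("http://localhost:9222/json/version") as resp:
        info = json.load(resp)
    print(info["Browser"])  # e.g. "Chrome/126.0.0.0"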

Windows Subsystem for Linux (WSL): If you are running the scraper inside WSL, launch Chrome on the Windows host (using the command above). The scraper in WSL needs to connect to the Windows host IP.

  • Identify your Windows host IP: grep nameserver /etc/resolv.conf
  • In the GUI configuration, update the CDP URL to http://<WINDOWS_IP>:9222 (see the connection sketch below)
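
The installed commands perform the CDP attach internally; for reference, the sketch below shows the same kind of connection using Playwright's connect_over_cdp (an illustration under the default port, not the package's exact code):

    # attach to the already-running Chrome instead of launching a new one;
    # under WSL, replace localhost with the Windows host IP found above
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.connect_over_cdp("http://localhost:9222")
        context = browser.contexts[0]  # reuse the logged-in browser profile
        page = context.pages[0] if context.pages else context.new_page()
        print("Connected, current page:", page.url)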

2. Run the Scraper

After installation, two commands are available in your shell:

GUI Version (Recommended):

gpt-scrape-gui

Provides a graphical interface to monitor progress and stop the scraper easily.

CLI Version:

gpt-scrape

Runs the scraping process in the terminal.

Configuration & Persistence

  • Selectors: The scraper uses a selectors.json file (located in the package) to find elements on the page. You can modify this if ChatGPT's UI changes.
  • Persistence: Scraped URLs are tracked in a local SQLite database (scraped_urls.db) to prevent re-scraping the same threads after a restart (see the sketch below).
  • Output: By default, data is saved to the data/ folder in your current working directory.
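
For illustration, the dedup logic can be pictured like this (a minimal sketch; the actual table name and columns inside scraped_urls.db are internal to the package and may differ):

    # hypothetical schema: one table of already-processed thread URLs
    import sqlite3

    conn = sqlite3.connect("scraped_urls.db")
    conn.execute("CREATE TABLE IF NOT EXISTS scraped (url TEXT PRIMARY KEY)")

    def already_scraped(url: str) -> bool:
        row = conn.execute("SELECT 1 FROM scraped WHERE url = ?", (url,)).fetchone()
        return row is not None

    def mark_scraped(url: str) -> None:
        # INSERT OR IGNORE keeps the call idempotent across restarts
        conn.execute("INSERT OR IGNORE INTO scraped (url) VALUES (?)", (url,))
        conn.commit()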

Development

If you want to modify the code or run it without installing:

  1. Install dev dependencies:

    pip install -e ".[dev]"
  2. Run tests:

    pytest tests/
  3. Run scripts directly:

    # Make sure to set PYTHONPATH to src
    export PYTHONPATH=$PYTHONPATH:$(pwd)/src
    python src/gpt_thinking_extractor/scraper_gui.py

WSL Specific Notes

  • GUI Support: To use gpt-scrape-gui inside WSL, install an X server (such as GWSL or VcXsrv) on Windows, or use WSLg (Windows 11).
  • Networking: By default, localhost:9222 inside WSL refers to the WSL instance itself. Use the Windows host IP if Chrome is running on Windows.
  • Permissions: Ensure the output directory has write permissions.

Project Structure

  • src/gpt_thinking_extractor/: Source code package.
    • scraper_engine.py: Core logic for scraping, persistence, and file I/O.
    • scraper_gui.py: Tkinter-based graphical interface.
    • scrape_thoughts_final.py: Standalone CLI entry point.
    • selectors.json: Externalized CSS selectors configuration.
  • tests/: Unit tests.
  • data/: Default output directory for extracted thoughts.
  • scraped_urls.db: SQLite database tracking processed URLs.