EndlessReform/instruct-hn

Hacker News 🤝 ChatGPT Plugin

This is a ChatGPT plugin to query, analyze, and summarize insights from the Hacker News community!

Demo

If you have access to ChatGPT plugins, just add this as an unverified plugin using the URL: https://hn.kix.in/

If you don't have plugins access, you can try out a basic approximation of the experience here. The full REST API exposed to ChatGPT is documented here, where you can also interact with it.

👉 Video and detailed explanation of the code 👈

Running locally

⬇️ Download the SQLite DB from 🤗

Assuming you have 32 GB of RAM, take a look at playground.ipynb for quick-and-dirty ways to run analysis on the SQLite dataset loaded into memory.
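As a rough illustration of what "loaded into memory" can look like outside the notebook, here is a minimal sketch using only Python's standard sqlite3 module. The helper names load_into_memory and list_tables are hypothetical and are not taken from playground.ipynb. Because the snapshot's schema is not described here, the sketch only lists table names rather than assuming any.

```python
import sqlite3


def load_into_memory(db_path: str) -> sqlite3.Connection:
    """Copy an on-disk SQLite database into a new in-memory connection."""
    # Open the source read-only so the snapshot file is never modified.
    source = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
    memory = sqlite3.connect(":memory:")
    try:
        source.backup(memory)  # copies every page of the database into RAM
    finally:
        source.close()
    return memory


def list_tables(conn: sqlite3.Connection) -> list[str]:
    """Return the table names found in the database (schema not assumed)."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [name for (name,) in rows]
```

Calling load_into_memory("hn-sqlite-20230429.db") followed by list_tables(...) should show which tables are available to query. The in-memory copy needs roughly as much RAM as the database file occupies on disk.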

You'll need at least 30 GB of free disk space and more than 20 GB of RAM to run the semantic search engine and ChatGPT plugin API. An NVIDIA GPU is highly recommended: embedding generation on CPU is painfully slow, and it is untested on non-NVIDIA GPUs.

Clone the repo and install prerequisites.

git clone https://github.com/anantn/hn-chatgpt-plugin.git
cd hn-chatgpt-plugin/api-server
pip install -r requirements.txt
cd ../embeddings
pip install -r requirements.txt

# Install zstd with your favorite package manager (brew, apt, etc)
sudo apt install zstd

Grab the datasets from HuggingFace and decompress them:

wget https://huggingface.co/datasets/anantn/hacker-news/resolve/main/hn-sqlite-20230429.db.zst
pzstd -d hn-sqlite-20230429.db.zst

wget https://huggingface.co/datasets/anantn/hacker-news/resolve/main/hn-sqlite-20230429_embeddings.db.zst
pzstd -d hn-sqlite-20230429_embeddings.db.zst
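Before starting the servers, it can be worth confirming that decompression produced valid SQLite files. The sketch below relies on one documented fact: every SQLite 3 database file begins with the 16-byte header string "SQLite format 3" followed by a NUL byte. The function name looks_like_sqlite is hypothetical, and the check only catches truncated or mis-downloaded files, not deeper corruption.

```python
SQLITE_MAGIC = b"SQLite format 3\x00"  # first 16 bytes of every SQLite 3 file


def looks_like_sqlite(path: str) -> bool:
    """Return True if the file starts with the SQLite 3 header string."""
    with open(path, "rb") as handle:
        return handle.read(len(SQLITE_MAGIC)) == SQLITE_MAGIC
```

For example, looks_like_sqlite("hn-sqlite-20230429.db") should return True once pzstd has finished.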

Run the embedding server first. The embedding server will by default try to "catch up" on all the latest data changes since the snapshot was generated. You can disable all data updates (recommended for your first run):

DB_PATH=hn-sqlite-20230429.db OPTS=nosync,noembedcu,noembedrt python main.py

If you want to generate embeddings and keep your local SQLite database up to date, just run main.py with no OPTS environment variable.

Once the embedding server is running, start the API server:

cd hn-chatgpt-plugin/api-server
DB_PATH=hn-sqlite-20230429.db python main.py

Fire up localhost:8000 in your browser!

Algolia Search Plugin

An earlier attempt, but still useful: it integrates Algolia's Hacker News search API with ChatGPT plugins so you can have conversations about content on Hacker News.

If you have plugin access, you can try it:

$ cd algolia
$ pip install -r requirements.txt
$ python app.py

Open a chat with plugins enabled, then: Plugin store > Develop your own plugin > localhost:3333 > Fetch manifest.
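If "Fetch manifest" fails, a quick check is to request the manifest yourself. ChatGPT plugins are discovered through a manifest served at /.well-known/ai-plugin.json. The sketch below assumes the local server follows that convention on port 3333, and the helper names manifest_url and fetch_manifest are hypothetical.

```python
import json
from urllib.parse import urljoin
from urllib.request import urlopen

# Well-known path that ChatGPT requests when it fetches a plugin manifest.
MANIFEST_PATH = "/.well-known/ai-plugin.json"


def manifest_url(base_url: str) -> str:
    """Build the manifest URL for a plugin host such as http://localhost:3333."""
    return urljoin(base_url, MANIFEST_PATH)


def fetch_manifest(base_url: str, timeout: float = 5.0) -> dict:
    """Download and parse the manifest; raises if the server is unreachable."""
    with urlopen(manifest_url(base_url), timeout=timeout) as response:
        return json.load(response)
```

With app.py running, fetch_manifest("http://localhost:3333") should return a dictionary describing the plugin. A connection error suggests the server is not up or is listening on a different port.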

ChatGPT seems to hallucinate some parameters to the API, particularly the sortBy and sortOrder arguments, which may be worth implementing.
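One possible way to honor a sortBy argument is to map it onto Algolia's two public Hacker News endpoints: search (ranked by relevance) and search_by_date (most recent first). The sketch below is an assumption about how this could be done, not what app.py does. The accepted sortBy values "relevance" and "date" and the function name build_search_url are hypothetical. sortOrder is left out because, as far as I know, the API offers no ascending-date endpoint.

```python
from urllib.parse import urlencode

ALGOLIA_BASE = "https://hn.algolia.com/api/v1"

# Hypothetical sortBy values mapped to the endpoint that provides that order.
ENDPOINT_FOR_SORT = {
    "relevance": "search",
    "date": "search_by_date",
}


def build_search_url(query: str, sort_by: str = "relevance", tags: str = "story") -> str:
    """Build an Algolia HN search URL for the requested sort order."""
    try:
        endpoint = ENDPOINT_FOR_SORT[sort_by]
    except KeyError:
        raise ValueError(f"unsupported sortBy value: {sort_by!r}") from None
    params = urlencode({"query": query, "tags": tags})
    return f"{ALGOLIA_BASE}/{endpoint}?{params}"
```

Rejecting unknown values with an explicit error gives ChatGPT a message it can react to, rather than silently ignoring a hallucinated parameter.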

Datasette Plugin

Datasette exposes a REST API to any SQLite database. I experimented with using it to interface with ChatGPT. You can try it in the same way as the Algolia plugin.
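To give a feel for the API that Datasette exposes, the sketch below builds a URL for its JSON endpoint, which accepts a read-only SQL query through the sql parameter and returns a plain list of rows when _shape=array is set. The base URL (Datasette's default port is 8001), the database name, and the helper name datasette_sql_url are assumptions. Datasette normally names a database after its file name without the extension.

```python
from urllib.parse import quote, urlencode


def datasette_sql_url(base_url: str, database: str, sql: str) -> str:
    """Build a Datasette JSON API URL that runs a read-only SQL query."""
    # _shape=array asks Datasette for a bare JSON list of row objects.
    params = urlencode({"sql": sql, "_shape": "array"})
    return f"{base_url.rstrip('/')}/{quote(database)}.json?{params}"
```

For instance, datasette_sql_url("http://localhost:8001", "hn-sqlite-20230429", "select name from sqlite_master where type = 'table'") produces a URL that lists the tables without assuming anything about the schema.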

About

Hacker News as a laboratory for alignment
