This is a tool for issue scraping and LLM-based analysis.
It supports incrementally syncing issues and comments from multiple providers into per-repo SQLite databases. Currently supported:

- GitHub (issues + PRs)
- JIRA (tested with jira.mongodb.org)

It then helps you filter for issues you may be interested in using an LLM.
Each provider stores issues in `data/<platform>/<owner>/<repo>.sqlite` with tables:

- `issues(issue_id TEXT PK, number INTEGER UNIQUE, ..., metadata TEXT)`
- `comments(id INTEGER PK, comment_id INTEGER UNIQUE, issue_id, metadata TEXT)`
- `sync_state(last_issue_sync TEXT)`

`issue_id` is a string primary key (for GitHub it is the issue/PR number as a string). The full raw provider JSON is stored in the `metadata` column of both `issues` and `comments`, so future fields are preserved.
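The layout above can be sketched as a minimal schema. This is a hypothetical reconstruction for illustration only (the elided `issues` columns are omitted, and exact column definitions may differ); it also shows why storing the raw JSON in `metadata` preserves fields the tool does not yet model:

```python
import json
import sqlite3

# Hypothetical reconstruction of the per-repo schema described above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE issues (
    issue_id TEXT PRIMARY KEY,
    number   INTEGER UNIQUE,
    metadata TEXT  -- full raw provider JSON
);
CREATE TABLE comments (
    id         INTEGER PRIMARY KEY,
    comment_id INTEGER UNIQUE,
    issue_id   TEXT,
    metadata   TEXT
);
CREATE TABLE sync_state (last_issue_sync TEXT);
""")

# A field the provider adds later still round-trips through metadata.
raw = {"title": "example", "labels": ["bug"], "some_future_field": 42}
conn.execute("INSERT INTO issues VALUES (?, ?, ?)", ("17", 17, json.dumps(raw)))
restored = json.loads(conn.execute(
    "SELECT metadata FROM issues WHERE issue_id = ?", ("17",)).fetchone()[0])
print(restored["some_future_field"])  # prints 42
```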
- Uses the provider's updated timestamp to request only changed/new issues since the last successful sync.
- For any issue that is new or changed, all of its comments are refreshed (simple and reliable; avoids per-comment delta logic for now).
- You can cap work per run with `--limit N` to avoid long initial syncs or stressing large providers (partial sync; resume later).
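The sync loop above can be sketched as follows. This is an illustrative sketch, not the project's actual implementation: `fetch_issues_since` and `fetch_comments` are hypothetical stand-ins for provider API calls, and the upsert step is elided.

```python
from datetime import datetime, timezone

def fetch_issues_since(cursor):
    """Pretend provider call: return issues updated after `cursor`."""
    return [{"issue_id": "1", "updated_at": "2024-05-01T12:00:00Z"}]

def fetch_comments(issue_id):
    """Pretend provider call: return all comments for one issue."""
    return [{"comment_id": 100, "issue_id": issue_id}]

def sync(state, limit=None):
    cursor = state.get("last_issue_sync")  # None on the very first run
    changed = fetch_issues_since(cursor)
    if limit is not None:
        changed = changed[:limit]          # --limit N caps work per run
    for issue in changed:
        # Any new or changed issue gets ALL of its comments refreshed,
        # which avoids per-comment delta bookkeeping.
        comments = fetch_comments(issue["issue_id"])
        # ... upsert the issue and its comments into SQLite here ...
    # Advance the cursor only after a successful pass.
    state["last_issue_sync"] = datetime.now(timezone.utc).isoformat()
    return len(changed)

state = {}
print(sync(state, limit=10))  # prints 1 in this toy run
```

A partial run (hitting `--limit`) simply leaves the cursor behind the present, so the next run resumes from where the last one left off.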
```shell
uv sync
# Then you may either call python from uv
uv run python
uv run ic --help
# Or source the venv
source .venv/bin/activate
```

Alternatively, set up with conda:

```shell
conda create -n ic python=3.11
conda activate ic
pip install -r requirements.txt
```

Useful flags:

- `--limit`: limits the number of issues you scrape. This is polite behavior on the Internet.
- `--sortby created|updated`: chooses the field used for incremental ordering and cursors. Use `created` for the very first full ingestion (it ensures you enumerate everything once). After an initial full sync finishes, switch to `updated` to skip unchanged historical issues and only process new or recently modified ones. If your initial run was interrupted, keep using `created` until you are confident the historical backlog is complete.
Sync a GitHub repository (issues + PRs). Before scraping GitHub issues, you must set the environment variable `GITHUB_TOKEN`:

```shell
ic sync --platform github --owner pygraphviz --repo pygraphviz
ic sync --platform github --owner cockroachdb --repo cockroach
ic sync --platform github --owner etcd-io --repo etcd
ic sync --platform github --owner etcd-io --repo raft
ic sync --platform github --owner RedisLabs --repo redisraft
```

For JIRA, first inspect available projects on the JIRA server:
```shell
ic jira_inspect --jira_base_url https://jira.mongodb.org
ic jira_inspect --jira_base_url https://issues.apache.org/jira
```

Then sync a JIRA project:

```shell
# MongoDB JIRA example
ic sync --platform jira --owner mongodb --repo SERVER --jira_base_url https://jira.mongodb.org
# Apache JIRA example
ic sync --platform jira --owner apache --repo ZOOKEEPER --jira_base_url https://issues.apache.org/jira
```

Notes:
- For JIRA, `--owner` is an organization/company marker (e.g., `mongodb`, `apache`) and `--repo` is the JIRA project key (e.g., `SERVER`, `PYTHON`, `ZOOKEEPER`).
- This creates organized storage: `data/jira/mongodb/SERVER.sqlite`, `data/jira/apache/ZOOKEEPER.sqlite`.
- Use `ic jira_inspect` to discover available projects and their issue counts.
- JIRA issue keys like `SERVER-1234` are mapped to a numeric `number` by extracting the trailing digits; the full key is retained inside `metadata` under `key`.
- A closed timestamp for JIRA is not currently derived; `closed_at` remains null.
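The key-to-number mapping described above amounts to extracting the trailing digits. A minimal sketch (the function name here is hypothetical, not the project's actual helper):

```python
import re

def jira_key_to_number(key: str) -> int:
    """Hypothetical helper: map a JIRA key to its numeric part.

    The full key (e.g. "SERVER-1234") would still live in the issue's
    metadata under "key"; only the trailing digits feed `number`.
    """
    match = re.search(r"(\d+)$", key)
    if match is None:
        raise ValueError(f"no trailing digits in JIRA key: {key!r}")
    return int(match.group(1))

print(jira_key_to_number("SERVER-1234"))   # prints 1234
print(jira_key_to_number("ZOOKEEPER-42"))  # prints 42
```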
```python
from issueclear.db import RepoDatabase

db = RepoDatabase("github", "pygraphviz", "pygraphviz")
issues = db.get_issues_with_comments()
for issue in issues:
    print(issue.issue_id, issue.title, len(issue.comments))
```

```shell
# Run vLLM server
python -m vllm.entrypoints.openai.api_server --model Qwen/Qwen2-7B-Instruct --port 8080 --quantization bitsandbytes --dtype auto
```
```shell
# Run query (in another terminal). After install, the script is available; or use uv run.
ic query --model hosted_vllm/Qwen/Qwen2-7B-Instruct --api_base http://localhost:8080/v1 --owner pygraphviz --repo pygraphviz --query "memory leak in layout"
```

Or use a hosted API:

```shell
export OPENAI_API_KEY=sk-...
# OR for Anthropic
export ANTHROPIC_API_KEY=...
ic query --model gpt-4o-mini --owner pygraphviz --repo pygraphviz --query "memory leak in layout"
```

See LICENSE.