IssueClear

IssueClear is a tool for issue scraping and LLM-based analysis. It aims to help researchers working on testing, debugging, and verification extract related issues to facilitate their research.

It incrementally syncs issues and comments from multiple providers into per-repo SQLite databases. Currently supported providers:

  • GitHub (issues + PRs)
  • JIRA (tested with jira.mongodb.org)

It then uses an LLM to help you filter for the issues you may be interested in.

Design

Issue Data Storage

Each provider stores issues in data/<platform>/<owner>/<repo>.sqlite with tables:

  • issues(issue_id TEXT PK, number INTEGER UNIQUE, ... , metadata TEXT)
  • comments(id INTEGER PK, comment_id INTEGER UNIQUE, issue_id, metadata TEXT)
  • sync_state(last_issue_sync TEXT)

issue_id is a string primary key (for GitHub it is the issue/PR number as a string). The full raw provider JSON is stored in the metadata column of both the issues and comments tables, so fields added by the provider later are still preserved.
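
For example, you can open one of these databases directly with Python's sqlite3 module. This is a minimal sketch, assuming the metadata column stores the provider JSON as text (the repository path is taken from the GitHub examples later in this README):

import json
import sqlite3

con = sqlite3.connect("data/github/pygraphviz/pygraphviz.sqlite")
row = con.execute("SELECT issue_id, metadata FROM issues LIMIT 1").fetchone()
if row is not None:
    issue_id, metadata = row
    raw = json.loads(metadata)  # full raw provider JSON preserved here
    print(issue_id, sorted(raw.keys()))
con.close()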

Incremental Sync Strategy

  • Uses the provider's updated timestamp to request only issues created or changed since the last successful sync (sketched after this list).
  • For any issue that is new or changed, all its comments are refreshed (simple + reliable; avoids per-comment delta logic for now).
  • You can cap work per run with --limit N to avoid long initial syncs or stressing large providers (partial sync; resume later).
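
The loop below is a minimal sketch of this strategy; the provider and db objects and their method names are illustrative placeholders, not the project's actual API:

from datetime import datetime, timezone

def incremental_sync(provider, db, limit=None):
    """Illustrative outline of one sync run, not the real implementation."""
    since = db.get_last_issue_sync()  # None on the very first run
    synced = 0
    for issue in provider.fetch_issues(updated_since=since):
        db.upsert_issue(issue)  # issue is new or changed
        comments = provider.fetch_comments(issue["issue_id"])
        db.replace_comments(issue["issue_id"], comments)  # refresh all of its comments
        synced += 1
        if limit is not None and synced >= limit:  # honor --limit N; resume later
            break
    db.set_last_issue_sync(datetime.now(timezone.utc).isoformat())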

Setup

Option A: uv (Recommended)

uv sync
# Then you can either run Python through uv
uv run python
uv run ic --help
# Or source the venv
source .venv/bin/activate

Option B: Conda / pip

conda create -n ic python=3.11
conda activate ic
pip install -r requirements.txt

Usage

Scraping CLI Usage

General Args

  • --limit: limits the number of issues scraped per run. Using it is polite toward the provider's API.
  • --sortby created|updated: chooses the field used for incremental ordering and cursors. Use created for the very first full ingestion (it ensures you enumerate everything once). After the initial full sync finishes, switch to updated to skip unchanged historical issues and only process new or recently modified ones. If the initial run was interrupted, keep using created until you are confident the historical backlog is complete (see the example after this list).
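
For example, a typical workflow (using one of the repositories from the GitHub examples below) is an initial capped backfill followed by cheaper incremental runs:

# Initial full ingestion; repeat until the historical backlog is complete
ic sync --platform github --owner etcd-io --repo etcd --sortby created --limit 1000
# Later runs: only new or recently modified issues
ic sync --platform github --owner etcd-io --repo etcd --sortby updated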

GitHub

Sync GitHub repository (issues + PRs):

Before scraping GitHub issues, you must set the environment variable GITHUB_TOKEN.

ic sync --platform github --owner pygraphviz --repo pygraphviz
ic sync --platform github --owner cockroachdb --repo cockroach
ic sync --platform github --owner etcd-io --repo etcd
ic sync --platform github --owner etcd-io --repo raft
ic sync --platform github --owner RedisLabs --repo redisraft

JIRA

First, inspect available projects on the JIRA server:

ic jira_inspect --jira_base_url https://jira.mongodb.org
ic jira_inspect --jira_base_url https://issues.apache.org/jira

Then sync JIRA project:

# MongoDB JIRA examples
ic sync --platform jira --owner mongodb --repo SERVER --jira_base_url https://jira.mongodb.org

# Apache JIRA examples  
ic sync --platform jira --owner apache --repo ZOOKEEPER --jira_base_url https://issues.apache.org/jira

Notes:

  • For JIRA: --owner is an organization/company marker (e.g., mongodb, apache), --repo is the JIRA project key (e.g., SERVER, PYTHON, ZOOKEEPER)
  • This creates organized storage: data/jira/mongodb/SERVER.sqlite, data/jira/apache/ZOOKEEPER.sqlite
  • Use ic jira_inspect to discover available projects and their issue counts
  • JIRA issue keys like SERVER-1234 are mapped to the numeric number column by extracting the trailing digits; the full key is retained inside metadata under key (see the sketch after this list)
  • Closed timestamp for JIRA is not currently derived; closed_at remains null
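
A minimal sketch of that key-to-number mapping (the helper name and regex are illustrative, not the project's actual code):

import re

def jira_key_to_number(key: str) -> int:
    # "SERVER-1234" -> 1234; the full key stays in metadata under "key"
    return int(re.search(r"(\d+)$", key).group(1))

assert jira_key_to_number("SERVER-1234") == 1234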

Inspecting Database

from issueclear.db import RepoDatabase

db = RepoDatabase("github", "pygraphviz", "pygraphviz")
issues = db.get_issues_with_comments()
for issue in issues:
	print(issue.issue_id, issue.title, len(issue.comments))

LLM Query

Local Model (vllm example)

# Run vLLM server
python -m vllm.entrypoints.openai.api_server --model Qwen/Qwen2-7B-Instruct --port 8080 --quantization bitsandbytes --dtype auto

# Run the query (in another terminal). After installation the ic script is available; or use uv run.
ic query --model hosted_vllm/Qwen/Qwen2-7B-Instruct --api_base http://localhost:8080/v1 --owner pygraphviz --repo pygraphviz --query "memory leak in layout"

Online API

export OPENAI_API_KEY=sk-...
# OR for Anthropic
export ANTHROPIC_API_KEY=...
ic query --model gpt-4o-mini --owner pygraphviz --repo pygraphviz --query "memory leak in layout"

License

See LICENSE.
