label_communities ignores _resolve_max_tokens() — hardcoded token budget truncates label responses for graphs with 50+ communities

## Summary

`label_communities` (llm.py:1855) uses a hardcoded `max_tokens` formula that ignores the existing `_resolve_max_tokens()` helper (llm.py:232), making `GRAPHIFY_MAX_OUTPUT_TOKENS` a dead env var for the labeling path. On graphs with many communities, the formula undershoots and the LLM response is truncated, producing unparseable JSON.

## How we hit it

1. `graphify extract . --backend deepseek` on a Rust project (~215 files) → 1650 nodes, 3596 edges
2. Clustering produced **92 communities**
3. `graphify label . --backend deepseek` failed with:
   ```
   [graphify label] warning: community labeling failed (Unterminated string starting at: line 7 column 8 (char 186)); using Community N placeholders.
   ```

## Root cause

`label_communities` at **line 1855** calculates its own token budget:

```python
max_tokens = min(40 + 16 * len(labeled_cids), 4096)
# 92 communities → min(40 + 1472, 4096) = 1512 tokens
```

The prompt instructs the model to output a JSON object mapping 92 community IDs to 2-5 word labels. At ~15-20 tokens per entry (JSON key + quoted string + comma), the response needs roughly 1,400-1,800 tokens. At **1512**, the model runs out of tokens mid-response, truncating the JSON string and producing an unparseable payload.

Meanwhile, `_resolve_max_tokens()` exists at **line 232** specifically to let users override token budgets via `GRAPHIFY_MAX_OUTPUT_TOKENS`, and the main extraction path uses it (lines 1121, 1152). But `label_communities` never calls it — the env var is silently ignored for labeling.

## Workaround

Patched line 1855 locally to increase the per-community token allocation:

```python
# Original:
max_tokens = min(40 + 16 * len(labeled_cids), 4096)

# Patched:
max_tokens = min(80 + 45 * len(labeled_cids), 4096)
```

This gives the model enough headroom (4096 tokens for 92 communities) to complete the full JSON response without truncation. After applying this patch, `graphify label . --backend deepseek` succeeded and produced names for all 92 communities.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

label_communities ignores _resolve_max_tokens() — hardcoded token budget truncates label responses for graphs with 50+ communities #1200

Summary

How we hit it

Root cause

Workaround

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Uh oh!

Uh oh!

label_communities ignores _resolve_max_tokens() — hardcoded token budget truncates label responses for graphs with 50+ communities #1200

Description

Summary

How we hit it

Root cause

Workaround

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions