Graph extractor processes chunks serially within each document, making large PDFs very slow #1867

## Problem

During the Singapore enterprise ApeRAG import on 2026-04-29, the vector and fulltext indexes converged for 169 documents, but graph index creation was much slower. At runtime, graph work was parallelized at the document level only: with 2 API pods and graph worker concurrency=4 per pod, there were about 7 RUNNING graph documents. ACTIVE still increased very slowly, because each large document processes its chunks sequentially.

Code path:
- `aperag/indexing/orchestrator.py`: `run_graph_worker = _entrypoint(Modality.GRAPH, concurrency=4)` controls document-level worker concurrency.
- `aperag/indexing/graph_extractor.py`: `_extractor(chunks)` loops `for chunk in chunks` and awaits `_extract_one_chunk(...)` before moving to the next chunk.
- The default per-chunk timeout is 60s, so a large PDF with many chunks can occupy one graph worker for up to roughly 60s times its chunk count.
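
To put a rough number on the serial path above, here is a back-of-the-envelope estimate. It is only a sketch: the function name and the 200-chunk figure are hypothetical, retries are ignored, and it assumes every chunk runs for the full 60s timeout.

```python
import math


def worst_case_seconds(
    num_chunks: int,
    per_chunk_timeout_s: float = 60.0,  # default per-chunk timeout from this issue
    chunk_concurrency: int = 1,         # 1 models today's serial loop
) -> float:
    """Rough upper bound on how long one document can hold a graph worker."""
    # With k chunks in flight at once, the chunks finish in ceil(n / k) "waves",
    # and each wave lasts at most one timeout.
    waves = math.ceil(num_chunks / chunk_concurrency)
    return waves * per_chunk_timeout_s


# Hypothetical 200-chunk PDF.
serial = worst_case_seconds(200)                          # 12000.0 s (200 min)
bounded = worst_case_seconds(200, chunk_concurrency=4)    # 3000.0 s (50 min)
```

Real extractions will usually finish well before the timeout, so these figures are ceilings rather than expected durations.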

## Impact

- CPU can stay low while graph completion is slow because work is mostly remote LLM I/O.
- Scaling API pods increases document-level graph worker count, but does not reduce latency for one large document.
- In deployments where API and workers are in the same process, long graph jobs also increase API tail latency/readiness risk.

## Suggested fix

Add bounded per-document chunk-level concurrency for graph extraction, configurable in `collection.config.knowledge_graph_config` or deployment config. For example:
- A `chunk_concurrency` setting with a small default (e.g. 4) and an enforced upper bound.
- Use an `asyncio.Semaphore` around per-chunk LLM calls and `asyncio.gather(..., return_exceptions=True)` while preserving per-chunk failure isolation.
- Keep global/document-level graph worker concurrency separate from chunk-level concurrency to avoid overloading the model provider.
- Add tests that prove multiple chunks can be in flight and that one failed chunk does not fail the whole document.
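
A minimal sketch of the approach in the list above, not the actual `graph_extractor.py` code. The helper name `extract_chunks_bounded`, the `extract_one` callable (standing in for `_extract_one_chunk`), and the `MAX_CHUNK_CONCURRENCY` cap of 16 are all assumptions made for illustration.

```python
import asyncio
from typing import Any, Awaitable, Callable, List, Sequence

MAX_CHUNK_CONCURRENCY = 16  # hypothetical upper bound; the real cap is a design decision


async def extract_chunks_bounded(
    chunks: Sequence[Any],
    extract_one: Callable[[Any], Awaitable[Any]],  # stand-in for _extract_one_chunk
    chunk_concurrency: int = 4,
    per_chunk_timeout_s: float = 60.0,
) -> List[Any]:
    """Run per-chunk extraction with at most `chunk_concurrency` chunks in flight.

    Returns one entry per chunk, in input order. A failed or timed-out chunk
    yields its exception object instead of aborting the whole document.
    """
    # Clamp the configured value into [1, MAX_CHUNK_CONCURRENCY].
    limit = max(1, min(chunk_concurrency, MAX_CHUNK_CONCURRENCY))
    sem = asyncio.Semaphore(limit)

    async def guarded(chunk: Any) -> Any:
        async with sem:
            # The timeout starts only once a slot is acquired, so time spent
            # waiting for the semaphore does not count against the chunk.
            return await asyncio.wait_for(
                extract_one(chunk), timeout=per_chunk_timeout_s
            )

    # return_exceptions=True keeps failures isolated per chunk.
    results = await asyncio.gather(
        *(guarded(c) for c in chunks), return_exceptions=True
    )
    return list(results)
```

The caller can then separate successes from failures with `isinstance(result, BaseException)` and decide how to record partially extracted documents. Because this limit applies per document, the peak number of concurrent LLM calls per pod is bounded by the document-level worker concurrency times `chunk_concurrency`, which is why the two settings should stay separate.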

## Related operational finding

This is separate from ApeRAG#1866, where graph compaction logs non-fatal warnings due to missing keyword arguments.
