@m1rl0k m1rl0k commented Jan 24, 2026

Introduces a tree cache to ASTAnalyzer to avoid re-parsing unchanged files, improving performance for repeated analysis. Enhances vector projection in scripts/ingest/vectors.py by using numpy when available for significant speedup, with a fallback to pure Python. Updates Qdrant upsert logic to support async/sync modes via parameter or environment variable, and adds a flush_upserts utility to ensure data persistence after async upserts.

augmentcode bot commented Jan 24, 2026

🤖 Augment PR Summary

Summary: Performance-focused update to reuse parsed ASTs, speed up mini-vector projection, and optionally run Qdrant upserts asynchronously.

Changes:

  • Added an optional tree cache to ASTAnalyzer to avoid re-parsing unchanged files.
  • Routed mapping-based tree-sitter analysis through a new _parse_with_cache helper and exposed cache stats.
  • Optimized project_mini random projection with a NumPy fast-path and a pure-Python fallback.
  • Extended Qdrant upsert_points to support sync/async behavior via a wait parameter and INDEX_UPSERT_ASYNC.
  • Added flush_upserts utility intended to help ensure persistence after async upserts.

Technical Notes: Default behavior remains synchronous unless async mode is enabled; projection matrices are cached and NumPy uses vectorized matmul + L2 normalization.

@augmentcode augmentcode bot left a comment

Review completed. 3 suggestions posted.

path = Path(file_path) if file_path else None

# Try to get cached tree (only for real files, not in-memory content)
if self._tree_cache and path and path.exists():

_parse_with_cache can return a cached tree even when analyze_file(..., content=...) is called with explicit content for an on-disk file_path, which risks analyzing a stale tree if the provided content differs from the file on disk.
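One way to avoid the stale-tree case described above is to bypass the cache whenever explicit content is passed. The following is a minimal sketch, not the PR's actual implementation: `TreeCache`, `parse_with_cache`, and the `(mtime_ns, size)` validation key are hypothetical names and choices, and `parse` stands in for whatever parser the analyzer wraps.

```python
from pathlib import Path
from typing import Callable, Dict, Optional, Tuple


class TreeCache:
    """Hypothetical cache keyed by path and validated by (mtime_ns, size)."""

    def __init__(self) -> None:
        self._entries: Dict[str, Tuple[Tuple[int, int], object]] = {}
        self.hits = 0
        self.misses = 0

    def get(self, path: Path) -> Optional[object]:
        st = path.stat()
        entry = self._entries.get(str(path))
        # Only reuse the tree if the file looks unchanged on disk.
        if entry is not None and entry[0] == (st.st_mtime_ns, st.st_size):
            self.hits += 1
            return entry[1]
        self.misses += 1
        return None

    def put(self, path: Path, tree: object) -> None:
        st = path.stat()
        self._entries[str(path)] = ((st.st_mtime_ns, st.st_size), tree)


def parse_with_cache(
    parse: Callable[[str], object],
    cache: TreeCache,
    file_path: Path,
    content: Optional[str] = None,
) -> object:
    # Explicit content may differ from what is on disk, so never serve
    # (or store) a cached tree for it.
    if content is not None:
        return parse(content)

    cached = cache.get(file_path)
    if cached is not None:
        return cached

    tree = parse(file_path.read_text(encoding="utf-8"))
    cache.put(file_path, tree)
    return tree
```

Keying on modification time plus size is a cheap heuristic; hashing the content would be stricter but requires reading the file on every call.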

        return
    try:
        # Force a sync operation to ensure all pending writes are flushed
        client.get_collection(collection)

flush_upserts() calls get_collection(), but Qdrant’s wait=False semantics are “confirmed received” rather than “applied”, so this may not actually guarantee prior async upserts are committed before reads.
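The sync/async switch discussed here can be sketched as a small helper that resolves the effective `wait` flag. This is an illustration under assumptions: `resolve_wait`, the set of truthy values accepted for `INDEX_UPSERT_ASYNC`, and the shape of `upsert_points` are hypothetical. The client is duck-typed and only needs an `upsert(collection_name=..., points=..., wait=...)` method, which matches the keyword arguments of the Qdrant Python client's `upsert`.

```python
import os
from typing import Any, Optional, Sequence

# Assumed truthy spellings for the environment variable (hypothetical).
_TRUTHY = {"1", "true", "yes", "on"}


def resolve_wait(wait: Optional[bool] = None) -> bool:
    """Return the effective wait flag: the explicit argument wins, then the env var."""
    if wait is not None:
        return wait
    raw = os.environ.get("INDEX_UPSERT_ASYNC", "")
    async_mode = raw.strip().lower() in _TRUTHY
    # Default stays synchronous unless async mode is explicitly enabled.
    return not async_mode


def upsert_points(
    client: Any,
    collection: str,
    points: Sequence[Any],
    wait: Optional[bool] = None,
) -> bool:
    """Upsert points and report whether the call waited for the write to be applied."""
    effective_wait = resolve_wait(wait)
    client.upsert(
        collection_name=collection,
        points=list(points),
        wait=effective_wait,
    )
    return effective_wait
```

Returning the effective flag lets the caller know whether a later read can rely on the write being applied; with `wait=False` it cannot, and a metadata call such as `get_collection()` does not change that.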

    [scale * (1.0 if rnd.random() < 0.5 else -1.0) for _ in range(out_dim)]
    for _ in range(in_dim)
]
M = np.array(M_list, dtype=np.float32)

The NumPy path uses float32 for both the projection matrix and input vector, so results will differ from the pure-Python (float64) path despite the “reproducibility” comment; if determinism across environments matters, this could be surprising.
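If parity between the two paths matters, one option is to keep the NumPy path in float64 so it agrees with the pure-Python path up to rounding. The sketch below illustrates that idea only; it is not the code in scripts/ingest/vectors.py. The function names, the seed, and the `1 / sqrt(out_dim)` scale are assumptions, and the float64 choice trades some speed and memory for agreement.

```python
import math
import random

try:
    import numpy as np
except ImportError:  # NumPy is optional; the pure-Python path still works
    np = None


def build_matrix(in_dim, out_dim, seed=1337):
    """Seeded random sign matrix of shape in_dim x out_dim (seed and scale assumed)."""
    rnd = random.Random(seed)
    scale = 1.0 / math.sqrt(out_dim)
    return [
        [scale * (1.0 if rnd.random() < 0.5 else -1.0) for _ in range(out_dim)]
        for _ in range(in_dim)
    ]


def project_python(vec, matrix):
    """Pure-Python projection followed by L2 normalization (float64 arithmetic)."""
    out_dim = len(matrix[0])
    out = [0.0] * out_dim
    for x, row in zip(vec, matrix):
        for j in range(out_dim):
            out[j] += x * row[j]
    norm = math.sqrt(sum(v * v for v in out)) or 1.0
    return [v / norm for v in out]


def project_numpy(vec, matrix):
    """NumPy projection kept in float64 so it tracks the pure-Python path closely."""
    m = np.array(matrix, dtype=np.float64)
    v = np.asarray(vec, dtype=np.float64)
    out = v @ m
    norm = float(np.linalg.norm(out)) or 1.0
    return (out / norm).tolist()
```

Even in float64 the two paths may differ in the last bits because summation order can differ, so a comparison should use a small tolerance rather than exact equality.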

Refines ASTAnalyzer to avoid returning stale trees when content is provided in-memory, by tracking content provenance and skipping cache in such cases. Enhances flush_upserts in Qdrant ingest to clarify consistency semantics and add a minimal scroll for better write visibility. Adds comprehensive tests for chunk deduplication, ingest infrastructure, and tree cache integration to ensure correctness and robustness.
@m1rl0k m1rl0k merged commit 85de0dd into test Jan 24, 2026
1 check passed