Add tree cache to ASTAnalyzer and optimize vector projection #196
Introduces a tree cache to ASTAnalyzer to avoid re-parsing unchanged files, improving performance for repeated analysis. Enhances vector projection in scripts/ingest/vectors.py by using numpy when available for significant speedup, with a fallback to pure Python. Updates Qdrant upsert logic to support async/sync modes via parameter or environment variable, and adds a flush_upserts utility to ensure data persistence after async upserts.
Augment PR Summary
Summary: Performance-focused update to reuse parsed ASTs, speed up mini-vector projection, and optionally run Qdrant upserts asynchronously.
Technical Notes: Default behavior remains synchronous unless async mode is enabled; projection matrices are cached and NumPy uses vectorized matmul + L2 normalization.
scripts/ast_analyzer.py
path = Path(file_path) if file_path else None

# Try to get cached tree (only for real files, not in-memory content)
if self._tree_cache and path and path.exists():
    return
try:
    # Force a sync operation to ensure all pending writes are flushed
    client.get_collection(collection)
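For context, the sync/async switch and the flush helper might fit together along these lines. This is a sketch under stated assumptions, not the PR's implementation: the environment variable name `QDRANT_ASYNC_UPSERT`, the helper names `resolve_wait` and `upsert_points`, and the boolean return value are all hypothetical. The `client` is expected to behave like a `qdrant_client.QdrantClient`, whose `upsert` accepts a `wait` flag.

```python
import os


def resolve_wait(async_mode=None, env_var="QDRANT_ASYNC_UPSERT"):
    """Return the `wait` flag for upsert: an explicit parameter wins, then the env var."""
    if async_mode is None:
        raw = os.environ.get(env_var, "")  # hypothetical variable name
        async_mode = raw.strip().lower() in {"1", "true", "yes"}
    return not async_mode  # synchronous (wait=True) unless async is enabled


def upsert_points(client, collection, points, async_mode=None):
    """Upsert points, waiting for completion unless async mode is enabled."""
    client.upsert(
        collection_name=collection,
        points=points,
        wait=resolve_wait(async_mode),
    )


def flush_upserts(client, collection):
    """Best-effort read-back after async upserts; not a durability guarantee."""
    try:
        client.get_collection(collection)
        client.scroll(collection_name=collection, limit=1)  # minimal read
        return True
    except Exception:
        return False
```

A read-back of this kind only gives a rough signal that the collection is reachable and serving reads; callers that need a hard guarantee would likely keep `wait=True`.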
    [scale * (1.0 if rnd.random() < 0.5 else -1.0) for _ in range(out_dim)]
    for _ in range(in_dim)
]
M = np.array(M_list, dtype=np.float32)
Refines ASTAnalyzer to avoid returning stale trees when content is provided in-memory, by tracking content provenance and skipping cache in such cases. Enhances flush_upserts in Qdrant ingest to clarify consistency semantics and add a minimal scroll for better write visibility. Adds comprehensive tests for chunk deduplication, ingest infrastructure, and tree cache integration to ensure correctness and robustness.
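The provenance rule can be illustrated with a small sketch. Everything here is hypothetical: `TreeCache`, `parse`, and the mtime/size signature are stand-ins, and Python's built-in `ast` module substitutes for whatever parser ASTAnalyzer actually uses. The point is only that the cache is consulted and filled when the content was read from disk, and bypassed when the caller supplies content in memory.

```python
import ast
from pathlib import Path


class TreeCache:
    """Hypothetical cache keyed by path, validated by mtime and size."""

    def __init__(self):
        self._entries = {}

    def get(self, path):
        entry = self._entries.get(str(path))
        if entry is None:
            return None
        signature, tree = entry
        stat = path.stat()
        # A changed file invalidates the entry.
        return tree if signature == (stat.st_mtime_ns, stat.st_size) else None

    def put(self, path, tree):
        stat = path.stat()
        self._entries[str(path)] = ((stat.st_mtime_ns, stat.st_size), tree)


def parse(file_path=None, content=None, cache=None):
    """Parse a file, using the cache only when content comes from disk."""
    path = Path(file_path) if file_path else None
    from_disk = content is None  # provenance: did the caller supply content?
    if from_disk and path is None:
        raise ValueError("either file_path or content is required")
    use_cache = cache is not None and from_disk and path.exists()
    if use_cache:
        cached = cache.get(path)
        if cached is not None:
            return cached
    if from_disk:
        content = path.read_text(encoding="utf-8")
    tree = ast.parse(content)
    if use_cache:
        cache.put(path, tree)
    return tree
```

With this shape, an editor buffer passed as `content` never receives a tree parsed from an older on-disk version, and never overwrites the cached tree for the file either.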