Generalize toolkit from ARDA-specific to general-purpose #1
Merged
Transforms code-ingest from an ARDA-opinionated tool to a generic,
config-driven code ingestion and search system. All hard-coded ARDA
references replaced with config-driven or generic patterns.
Documentation:
- README: generic use case focus, removed ARDA project references
- All docs: replaced arda-credit, arda_code_rust with {prefix}/generic examples
- Config files: added prominent customization comments
Ingestion Pipeline:
- Fallback collections: generic names (code_rust, frontend, api_contracts)
- Fallback repos: single generic example with warning
- Config helpers: now accept config dict parameters for dynamic resolution
- Services: removed arda-credit-app path branches, use generic patterns
- Parsers: generic component detection, removed repo-specific logic
- Dependency analyzer: _is_arda_package → _is_internal_package
MCP Server:
- Docstrings: "Code Ingestion MCP" instead of "Arda Vector Database"
- Query router: generic service patterns, no hard-coded arda- repos
- Domain tools: all collections loaded from _collections_config dynamically
- GitHub utils: generalized to REPO_URL_1..N environment variables
- Resource URIs: confirmed vector:// throughout (arda:// removed from docs)
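One plausible way to read the REPO_URL_1..N variables is sketched here. The helper name `collect_repo_urls` and the choice to stop at the first missing index are assumptions for illustration; the real GitHub utils may handle gaps differently.

```python
import os
from typing import Mapping


def collect_repo_urls(env: Mapping[str, str] = os.environ) -> list[str]:
    """Collect REPO_URL_1, REPO_URL_2, ... until the first missing or empty index."""
    urls = []
    index = 1
    while True:
        value = env.get(f"REPO_URL_{index}", "").strip()
        if not value:
            break
        urls.append(value)
        index += 1
    return urls


# Placeholder URLs; any mapping can stand in for os.environ.
sample_env = {
    "REPO_URL_1": "https://github.com/your-org/my-backend",
    "REPO_URL_2": "https://github.com/your-org/my-frontend",
}
print(collect_repo_urls(sample_env))
```

Passing the environment as a parameter keeps the helper easy to test without mutating the process environment.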
All collection and repository names now flow from config/collections.yaml
and config/repositories.yaml, with generic fallbacks when YAML is missing.
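The fallback behavior can be pictured with the following sketch. It takes the already-parsed contents of config/collections.yaml (or None when the file is missing) and assumes a `collection_prefix` string plus a `collections` list; the `collections` key, the underscore join, and the function name are hypothetical and may not match the real schema.

```python
from typing import Optional

# Generic fallback names listed in this PR.
FALLBACK_COLLECTIONS = ("code_rust", "frontend", "api_contracts")


def resolve_collection_names(config: Optional[dict]) -> list[str]:
    """Return collection names, prefixed when a collection_prefix is configured."""
    if not config:
        # YAML missing or empty: use the generic, unprefixed fallbacks.
        return list(FALLBACK_COLLECTIONS)
    prefix = config.get("collection_prefix", "")
    names = config.get("collections", FALLBACK_COLLECTIONS)
    return [f"{prefix}_{name}" if prefix else name for name in names]


print(resolve_collection_names(None))
print(resolve_collection_names({"collection_prefix": "myproject", "collections": ["code_rust"]}))
```

With no config the sketch returns the three generic names; with a prefix of myproject it yields myproject_code_rust, matching the example name used in the docs.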
Summary
Transforms the code-ingest toolkit from an ARDA-opinionated tool into a general-purpose, config-driven code ingestion and search system. All hard-coded ARDA references have been replaced with config-driven or generic patterns.
Key Changes
📚 Documentation & Examples
- README and docs: ARDA-specific names replaced with generic examples (my-backend, myproject_code_rust, your-org--embed.modal.run)
⚙️ Configuration
- config/collections.yaml: Added guidance to customize prefix and collection names
- config/repositories.yaml: Marked as example schema to replace with your repos
🔄 Ingestion Pipeline
- Fallback collections: code_rust, frontend, api_contracts (no arda_ prefix)
- Config helpers: determine_service_collection() and determine_concern_collections() now accept config parameters
- Services: removed path branches for arda-credit-app; now use generic patterns (/api/, /frontend/)
- Dependency analyzer: _is_arda_package() → _is_internal_package() (checks against all configured repos)
🔌 MCP Server
- Domain tools: all collections loaded dynamically from _collections_config
- GitHub utils: generalized to REPO_URL_1, REPO_URL_2, etc.
Result
The toolkit now presents as a general-purpose solution that:
- Takes all collection and repository names from config/collections.yaml and config/repositories.yaml
Files Changed
44 files across documentation, configuration, ingestion pipeline, and MCP server.
Test Plan
- make health works with existing ARDA config
- collection_prefix in config/collections.yaml
- config/repositories.yaml (single repo)