feat: Add Unstract Adapter Extension Skill and Azure AI Foundry Adapter #1721

base: main
Conversation
* feat: add azure ai foundry adapter
* feat: update parameters in various adapters
Summary by CodeRabbit

Walkthrough

Adds optional LLM/embedding parameters, a new Azure AI Foundry LLM adapter and schema, reasoning/json_mode/dimensions support in parameter validation and JSON schemas, multiple adapter generator templates and scaffolding scripts, plus documentation and tooling for adapter lifecycle and model/schema management.

Changes

Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
for more information, see https://pre-commit.ci
Actionable comments posted: 3
🧹 Nitpick comments (13)
.claude/skills/unstract-adapter-extension/assets/templates/llm_schema.json.template (1)

34-40: Consider using "integer" type instead of "number" with multipleOf: 1.

For integer fields like `max_tokens`, using `"type": "integer"` is more idiomatic than `"type": "number"` with `"multipleOf": 1`. This applies to `max_tokens` (line 35), `max_retries` (line 42), and `timeout` (line 50) as well.

🔎 Proposed refactor

```diff
 "max_tokens": {
-  "type": "number",
+  "type": "integer",
   "minimum": 0,
-  "multipleOf": 1,
   "title": "Maximum Output Tokens",
   "description": "Maximum number of output tokens to limit LLM replies."
 },
```

Apply the same pattern to the `max_retries` and `timeout` fields.

.claude/skills/unstract-adapter-extension/scripts/init_embedding_adapter.py (3)
309-342: Consider more specific exception handling for image processing.

Line 339 catches a broad `Exception`. While the function handles this gracefully with a print statement, consider catching specific PIL exceptions (e.g., `PIL.UnidentifiedImageError`, `OSError`) for better error diagnosis.

🔎 Suggested improvement

```diff
 except ImportError:
     if image_data[:8] == b"\x89PNG\r\n\x1a\n":
         output_path.write_bytes(image_data)
         return True
     print("  Note: Install Pillow for better image processing: pip install Pillow")
     return False
-except Exception as e:
+except (OSError, ValueError) as e:
     print(f"  Error processing image: {e}")
     return False
```

355-415: LGTM with minor suggestion: Local logo processing mirrors download logic.

The function appropriately handles SVG conversion, raster resizing, and missing dependencies. Same suggestion as `download_and_process_logo`: consider more specific exception handling at line 413.
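The effect of the narrowed handler can be shown in isolation. This is a sketch, not the script's actual function; `process_image` here is a hypothetical stand-in:

```python
def process_image(data: bytes) -> bool:
    """Hypothetical stand-in for the logo-processing step."""
    try:
        if not data:
            raise ValueError("empty image data")
        # ... real processing would happen here ...
        return True
    except (OSError, ValueError) as e:
        # Expected failure modes are reported and swallowed...
        print(f"  Error processing image: {e}")
        return False
    # ...while anything else (say, a TypeError from a bug) still propagates.

print(process_image(b""))         # False
print(process_image(b"\x89PNG"))  # True
```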
478-490: Consider using copy.deepcopy for schema template cloning.

Line 478 uses `json.loads(json.dumps(EMBEDDING_SCHEMA_TEMPLATE))` to clone the template dict. While this ensures JSON serializability, `copy.deepcopy()` would be more idiomatic and efficient.

🔎 Suggested refactor

Add import:

```python
import copy
```

Then replace:

```diff
-schema = json.loads(json.dumps(EMBEDDING_SCHEMA_TEMPLATE))
+schema = copy.deepcopy(EMBEDDING_SCHEMA_TEMPLATE)
```

unstract/sdk1/src/unstract/sdk1/adapters/embedding1/static/openai.json (1)

33-39: Consider using "integer" type for dimensions field.

Similar to the template file, this field uses `"type": "number"` with `"multipleOf": 1` to represent integers. Using `"type": "integer"` would be more idiomatic.

🔎 Proposed refactor

```diff
 "dimensions": {
-  "type": "number",
+  "type": "integer",
   "minimum": 1,
-  "multipleOf": 1,
   "title": "Dimensions",
   "description": "Output embedding dimensions. Only supported by text-embedding-3-* models. Leave empty for default dimensions (1536 for small, 3072 for large)."
 },
```

.claude/skills/unstract-adapter-extension/assets/templates/embedding_schema.json.template (1)
35-42: Consider using "integer" type for numeric fields.

Both `embed_batch_size` and `timeout` use `"type": "number"` with `"multipleOf": 1`. Consider using `"type": "integer"` for clarity, consistent with the suggestion for the LLM template.

🔎 Proposed refactor

```diff
 "embed_batch_size": {
-  "type": "number",
+  "type": "integer",
   "minimum": 1,
-  "multipleOf": 1,
   "title": "Embed Batch Size",
   "default": 10,
   "description": "Number of texts to embed per batch."
 },
 "timeout": {
-  "type": "number",
+  "type": "integer",
   "minimum": 0,
-  "multipleOf": 1,
   "title": "Timeout",
   "default": 240,
   "description": "Request timeout in seconds."
 }
```

unstract/sdk1/src/unstract/sdk1/adapters/embedding1/static/azure.json (1)
23-23: LGTM: Azure embedding schema updated with dimensions support.

The model description update and new dimensions property align well with the OpenAI schema changes. The note about text-embedding-3-* model support is helpful.

Same optional refactor suggestion applies: consider using `"type": "integer"` instead of `"type": "number"` with `"multipleOf": 1` for the dimensions field.

Also applies to: 50-56
.claude/skills/unstract-adapter-extension/references/provider_capabilities.md (1)
45-90: Optional: Add language specifiers to fenced code blocks.

The parameter specification blocks (lines 45-90) lack language specifiers in their fenced code blocks. While this doesn't affect functionality, adding a specifier like `text` or `yaml` would satisfy linting tools.

Example:

````diff
 ### OpenAI LLM
-```
+```text
 api_key, api_base, model, max_tokens, temperature, top_p
 enable_reasoning, reasoning_effort (o1/o3 models)
 ```
````

.claude/skills/unstract-adapter-extension/references/adapter_patterns.md (1)
7-25: Consider adding a language identifier to the code block.

The fenced code block for the architecture overview lacks a language specification. While this is a text diagram, adding a language identifier improves markdown rendering consistency.

Suggested fix

````diff
-```
+```text
 unstract/sdk1/src/unstract/sdk1/adapters/
 ├── base1.py  # Parameter classes & base adapter
````

.claude/skills/unstract-adapter-extension/scripts/manage_models.py (2)
40-43: Consider adding error handling for malformed JSON.

`load_schema` doesn't handle `json.JSONDecodeError`. If a schema file is corrupted, the script will crash with an unhelpful traceback.

🔎 Proposed fix

```diff
 def load_schema(schema_path: Path) -> dict:
     """Load and parse a JSON schema file."""
-    with open(schema_path) as f:
-        return json.load(f)
+    try:
+        with open(schema_path) as f:
+            return json.load(f)
+    except json.JSONDecodeError as e:
+        print(f"Error: Invalid JSON in {schema_path}: {e}")
+        sys.exit(1)
```
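A runnable sketch of the hardened loader suggested above (the function body mirrors the proposed fix; the temp files are only for demonstration):

```python
import json
import sys
import tempfile
from pathlib import Path

def load_schema(schema_path: Path) -> dict:
    """Load and parse a JSON schema file, exiting cleanly on bad JSON."""
    try:
        with open(schema_path) as f:
            return json.load(f)
    except json.JSONDecodeError as e:
        print(f"Error: Invalid JSON in {schema_path}: {e}")
        sys.exit(1)

with tempfile.TemporaryDirectory() as tmp:
    good = Path(tmp) / "good.json"
    good.write_text('{"properties": {}}')
    print(load_schema(good))  # {'properties': {}}

    bad = Path(tmp) / "bad.json"
    bad.write_text("{not json")
    try:
        load_schema(bad)
    except SystemExit as e:
        print(f"exited with {e.code}")  # exited with 1
```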
115-123: Setting default to a model not in enum could cause UI validation issues.

`set_default_model` allows setting a default that may not exist in the enum list. If the schema has an enum constraint, setting a default outside that enum could cause validation errors in the UI.

🔎 Proposed improvement

```diff
 def set_default_model(schema: dict, model: str) -> dict:
     """Set the default model."""
     if "properties" not in schema:
         schema["properties"] = {}
     if "model" not in schema["properties"]:
         schema["properties"]["model"] = {"type": "string", "title": "Model"}
+    # Warn if setting default to a value not in enum
+    existing_enum = schema["properties"]["model"].get("enum", [])
+    if existing_enum and model not in existing_enum:
+        print(f"Warning: '{model}' is not in the enum list. Adding it.")
+        existing_enum.append(model)
+        schema["properties"]["model"]["enum"] = existing_enum
     schema["properties"]["model"]["default"] = model
     return schema
```

.claude/skills/unstract-adapter-extension/scripts/init_llm_adapter.py (2)
345-347: Avoid catching bare `Exception` - narrows exception handling.

Catching all exceptions hides specific errors and makes debugging harder. Consider catching specific exceptions like `OSError` or `PIL.UnidentifiedImageError`.

🔎 Proposed fix

```diff
-except Exception as e:
-    print(f"  Error processing image: {e}")
-    return False
+except (OSError, ValueError) as e:
+    print(f"  Error processing image: {e}")
+    return False
```

464-466: Reconsider default parameter class choice.

Defaulting to `AnyscaleLLMParameters` when no parameter class is specified seems arbitrary. Consider defaulting to a more generic base class or requiring explicit specification.
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to Reviews > Disable Cache setting
Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting
⛔ Files ignored due to path filters (1)
`frontend/public/icons/adapter-icons/AzureAIFoundry.png` is excluded by `!**/*.png`
📒 Files selected for processing (22)
- .claude/skills/unstract-adapter-extension/SKILL.md
- .claude/skills/unstract-adapter-extension/assets/templates/embedding_adapter.py.template
- .claude/skills/unstract-adapter-extension/assets/templates/embedding_parameters.py.template
- .claude/skills/unstract-adapter-extension/assets/templates/embedding_schema.json.template
- .claude/skills/unstract-adapter-extension/assets/templates/llm_adapter.py.template
- .claude/skills/unstract-adapter-extension/assets/templates/llm_parameters.py.template
- .claude/skills/unstract-adapter-extension/assets/templates/llm_schema.json.template
- .claude/skills/unstract-adapter-extension/references/adapter_patterns.md
- .claude/skills/unstract-adapter-extension/references/json_schema_guide.md
- .claude/skills/unstract-adapter-extension/references/provider_capabilities.md
- .claude/skills/unstract-adapter-extension/scripts/check_adapter_updates.py
- .claude/skills/unstract-adapter-extension/scripts/init_embedding_adapter.py
- .claude/skills/unstract-adapter-extension/scripts/init_llm_adapter.py
- .claude/skills/unstract-adapter-extension/scripts/manage_models.py
- unstract/sdk1/src/unstract/sdk1/adapters/base1.py
- unstract/sdk1/src/unstract/sdk1/adapters/embedding1/static/azure.json
- unstract/sdk1/src/unstract/sdk1/adapters/embedding1/static/openai.json
- unstract/sdk1/src/unstract/sdk1/adapters/llm1/azure_ai_foundry.py
- unstract/sdk1/src/unstract/sdk1/adapters/llm1/static/azure_ai_foundry.json
- unstract/sdk1/src/unstract/sdk1/adapters/llm1/static/bedrock.json
- unstract/sdk1/src/unstract/sdk1/adapters/llm1/static/mistral.json
- unstract/sdk1/src/unstract/sdk1/adapters/llm1/static/ollama.json
🧰 Additional context used
🧬 Code graph analysis (3)
unstract/sdk1/src/unstract/sdk1/adapters/llm1/azure_ai_foundry.py (2)
unstract/sdk1/src/unstract/sdk1/adapters/base1.py (8)
- AzureAIFoundryLLMParameters (691-715)
- BaseAdapter (76-117)
- get_id (81-82)
- get_name (86-87)
- get_description (91-92)
- get_provider (96-97)
- get_icon (101-102)
- get_adapter_type (116-117)

unstract/sdk1/src/unstract/sdk1/constants.py (1)

- AdapterTypes (16-22)
.claude/skills/unstract-adapter-extension/scripts/init_embedding_adapter.py (1)
.claude/skills/unstract-adapter-extension/scripts/init_llm_adapter.py (7)
- to_class_name (148-166)
- to_icon_name (169-174)
- fetch_url (177-187)
- search_potential_logo_sources (190-246)
- download_and_process_logo (249-347)
- copy_logo (350-420)
- main (542-649)
.claude/skills/unstract-adapter-extension/scripts/init_llm_adapter.py (1)
.claude/skills/unstract-adapter-extension/scripts/init_embedding_adapter.py (1)
- to_class_name (147-163)
🪛 LanguageTool
.claude/skills/unstract-adapter-extension/SKILL.md
[grammar] ~6-~6: Ensure spelling is correct
Context: ...and self-hosted models (Ollama). --- # Unstract Adapter Extension Skill This skill pro...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
🪛 markdownlint-cli2 (0.18.1)
.claude/skills/unstract-adapter-extension/references/provider_capabilities.md
45-45: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
51-51: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
57-57: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
63-63: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
70-70: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
76-76: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
82-82: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
88-88: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
138-138: Bare URL used
(MD034, no-bare-urls)
139-139: Bare URL used
(MD034, no-bare-urls)
140-140: Bare URL used
(MD034, no-bare-urls)
141-141: Bare URL used
(MD034, no-bare-urls)
142-142: Bare URL used
(MD034, no-bare-urls)
143-143: Bare URL used
(MD034, no-bare-urls)
144-144: Bare URL used
(MD034, no-bare-urls)
145-145: Bare URL used
(MD034, no-bare-urls)
.claude/skills/unstract-adapter-extension/references/adapter_patterns.md
24-24: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
42-42: Tables should be surrounded by blank lines
(MD058, blanks-around-tables)
301-301: Bare URL used
(MD034, no-bare-urls)
🪛 Ruff (0.14.10)
.claude/skills/unstract-adapter-extension/scripts/init_embedding_adapter.py
177-177: Audit URL open for permitted schemes. Allowing use of file: or custom schemes is often unexpected.
(S310)
178-178: Audit URL open for permitted schemes. Allowing use of file: or custom schemes is often unexpected.
(S310)
209-209: Audit URL open for permitted schemes. Allowing use of file: or custom schemes is often unexpected.
(S310)
210-210: Audit URL open for permitted schemes. Allowing use of file: or custom schemes is often unexpected.
(S310)
229-229: Audit URL open for permitted schemes. Allowing use of file: or custom schemes is often unexpected.
(S310)
230-230: Audit URL open for permitted schemes. Allowing use of file: or custom schemes is often unexpected.
(S310)
280-280: subprocess call: check for execution of untrusted input
(S603)
281-293: Starting a process with a partial executable path
(S607)
300-300: Consider moving this statement to an else block
(TRY300)
331-331: Consider moving this statement to an else block
(TRY300)
339-339: Do not catch blind exception: Exception
(BLE001)
360-360: subprocess call: check for execution of untrusted input
(S603)
361-373: Starting a process with a partial executable path
(S607)
380-380: Consider moving this statement to an else block
(TRY300)
406-406: Consider moving this statement to an else block
(TRY300)
413-413: Do not catch blind exception: Exception
(BLE001)
.claude/skills/unstract-adapter-extension/scripts/init_llm_adapter.py
183-183: Audit URL open for permitted schemes. Allowing use of file: or custom schemes is often unexpected.
(S310)
184-184: Audit URL open for permitted schemes. Allowing use of file: or custom schemes is often unexpected.
(S310)
215-215: Audit URL open for permitted schemes. Allowing use of file: or custom schemes is often unexpected.
(S310)
216-216: Audit URL open for permitted schemes. Allowing use of file: or custom schemes is often unexpected.
(S310)
235-235: Audit URL open for permitted schemes. Allowing use of file: or custom schemes is often unexpected.
(S310)
236-236: Audit URL open for permitted schemes. Allowing use of file: or custom schemes is often unexpected.
(S310)
286-286: subprocess call: check for execution of untrusted input
(S603)
287-299: Starting a process with a partial executable path
(S607)
306-306: Consider moving this statement to an else block
(TRY300)
337-337: Consider moving this statement to an else block
(TRY300)
345-345: Do not catch blind exception: Exception
(BLE001)
366-366: subprocess call: check for execution of untrusted input
(S603)
367-379: Starting a process with a partial executable path
(S607)
386-386: Consider moving this statement to an else block
(TRY300)
412-412: Consider moving this statement to an else block
(TRY300)
419-419: Do not catch blind exception: Exception
(BLE001)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: build
🔇 Additional comments (30)
.claude/skills/unstract-adapter-extension/scripts/init_embedding_adapter.py (6)
147-169: LGTM: String conversion utilities are well-implemented.

The special-case handling for known providers and the fallback logic are appropriate.

171-182: LGTM: URL fetching is properly secured.

The function includes appropriate error handling, timeout protection, and a User-Agent header. The static analysis warning (S310) is a false positive for this legitimate logo-fetching use case.

184-241: LGTM: Logo search functionality is well-designed.

The separation of search from download, use of HEAD requests, and appropriate error handling are all good practices.

269-308: LGTM: SVG conversion with ImageMagick is properly secured.

The subprocess call uses a list of arguments (not shell=True), preventing command injection. Input paths are controlled, and missing dependency handling is appropriate.

510-522: Excellent design: Auto-logo search doesn't auto-download.

The decision to search for logos but require explicit user confirmation (via `--logo-url`) before downloading is a good security and user-experience practice. This prevents unintended downloads of incorrect logos.

532-643: LGTM: CLI interface is well-designed.

The argument parsing, dry-run support, error reporting, and next-steps guidance provide a great user experience. Exit code handling is correct.
unstract/sdk1/src/unstract/sdk1/adapters/embedding1/static/openai.json (1)
18-19: LGTM: Default model updated to newer version.

Updating the default from `text-embedding-ada-002` to `text-embedding-3-small` aligns with current OpenAI recommendations.

unstract/sdk1/src/unstract/sdk1/adapters/llm1/azure_ai_foundry.py (1)
1-40: LGTM: Azure AI Foundry adapter correctly implements all required methods.

The adapter class properly:

- Inherits from both `AzureAIFoundryLLMParameters` and `BaseAdapter`
- Implements all required static methods with appropriate decorators
- Uses correct ID format (`provider|uuid`)
- Returns proper metadata structure
- Follows naming conventions
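For readers unfamiliar with the pattern, a stripped-down sketch of what such an adapter class looks like. All names, the provider string, and the UUID here are placeholders, not the actual Azure AI Foundry values, and the real class also inherits the provider's parameter class and `BaseAdapter`:

```python
class ExampleLLMAdapter:
    """Sketch of the adapter contract described above (illustrative only)."""

    @staticmethod
    def get_id() -> str:
        # Format: "provider|uuid" (placeholder UUID)
        return "exampleprovider|00000000-0000-0000-0000-000000000000"

    @staticmethod
    def get_name() -> str:
        return "Example Provider"

    @staticmethod
    def get_description() -> str:
        return "Example Provider LLM adapter."

    @staticmethod
    def get_provider() -> str:
        return "exampleprovider"

    @staticmethod
    def get_icon() -> str:
        return "/icons/adapter-icons/ExampleProvider.png"

# The registry can split the ID into provider and UUID parts:
provider, _, uuid_part = ExampleLLMAdapter.get_id().partition("|")
print(provider)  # exampleprovider
```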
.claude/skills/unstract-adapter-extension/references/provider_capabilities.md (1)
1-145: LGTM: Comprehensive provider capabilities reference.

The documentation provides a well-organized reference for provider features, parameters, authentication methods, and model prefixes. The tables are clear and the feature definitions are helpful.
.claude/skills/unstract-adapter-extension/SKILL.md (1)
1-353: LGTM: Comprehensive and well-structured skill documentation.

This documentation provides excellent coverage of:
- Supported operations with clear command examples
- Detailed workflows for each operation type
- Validation checklists to prevent common errors
- Maintenance procedures for keeping adapters current
- Troubleshooting guidance
The inclusion of practical tips (like the GitHub raw URL guidance at lines 76-78) and the emphasis on logo verification (auto-search doesn't auto-download) demonstrate thoughtful design.
.claude/skills/unstract-adapter-extension/assets/templates/llm_schema.json.template (1)
29-29: Use standard JSON Schema format "uri" instead of non-standard "url".

The JSON Schema specification defines standard formats for resource identifiers as "uuid", "uri", "uri-reference", "iri", and "iri-reference". "url" is not a recognized format. Use "uri" for API base URLs to ensure schema compliance.

```diff
-"format": "url",
+"format": "uri",
```

unstract/sdk1/src/unstract/sdk1/adapters/llm1/static/azure_ai_foundry.json (1)
1-59: LGTM!

The JSON schema is well-structured and follows the established patterns from other adapter schemas in the codebase. The required fields (`adapter_name`, `api_key`, `api_base`) are appropriate for Azure AI Foundry, and the use of `format: "password"` for the API key and `format: "uri"` for the endpoint URL provides proper UI hints. The integer-like fields correctly use `"type": "number"` with `"multipleOf": 1` for consistency with other adapter schemas.

.claude/skills/unstract-adapter-extension/references/json_schema_guide.md (1)
1-549: LGTM!

This is a comprehensive and well-organized reference guide for creating adapter JSON schemas. The documentation covers all essential field types, provides clear examples for conditional fields using `allOf`/`if`/`then`, and includes practical complete examples for various adapter types (simple LLM, cloud provider, Azure-style, self-hosted, embedding). The best practices section offers valuable guidance for schema authors.

.claude/skills/unstract-adapter-extension/references/adapter_patterns.md (1)
36-562: LGTM!

The adapter patterns reference is comprehensive and well-structured. It covers the complete LLM adapter implementation with parameter classes, adapter classes, and JSON schemas. The patterns for reasoning, thinking, conditional fields, and optional credentials are clearly documented with practical code examples. The "Common Mistakes" section is particularly valuable for developers new to adapter development.
.claude/skills/unstract-adapter-extension/assets/templates/llm_parameters.py.template (1)
1-61: LGTM!

The LLM parameters template is well-designed and follows the documented patterns. The `validate_model` method correctly implements idempotent prefix handling to avoid double-prefixing. The template provides clear placeholder documentation and appropriate comments for customization points. The structure aligns with the patterns documented in adapter_patterns.md.

.claude/skills/unstract-adapter-extension/assets/templates/embedding_parameters.py.template (1)
1-57: LGTM!

The embedding parameters template correctly differentiates from the LLM template by not applying a model prefix (as noted in the docstring, embedding models typically don't need prefixes). The `embed_batch_size` field with a default of 10 is appropriate for embedding adapters. The template structure is consistent with the LLM parameters template and follows established patterns.

unstract/sdk1/src/unstract/sdk1/adapters/llm1/static/bedrock.json (1)
22-48: LGTM!

The schema updates appropriately support multiple AWS authentication methods. The updated descriptions for `aws_access_key_id` and `aws_secret_access_key` clarify that these can be left empty when using AWS Profile or IAM role authentication. The new `aws_profile_name` field enables AWS SSO authentication, and the `model_id` field for Application Inference Profile ARN supports cost tracking. This aligns with the patterns documented in the "Optional Credentials Pattern" section of the adapter patterns reference.

.claude/skills/unstract-adapter-extension/assets/templates/embedding_adapter.py.template (1)
1-50: LGTM!

The embedding adapter template is well-structured and follows the documented patterns. The inheritance order (`${PARAM_CLASS}` first, then `BaseAdapter`) is correct as noted in the adapter patterns documentation. All required static methods are present with appropriate return types, and the metadata structure matches the expected format for adapter registration.

Also applies to: 55-59
unstract/sdk1/src/unstract/sdk1/adapters/llm1/static/mistral.json (1)
16-99: LGTM!

The schema correctly implements support for Magistral Small 2506 and Magistral Medium models, which are Mistral's first reasoning models released in two variants: a 24B parameter open-source version and a more powerful enterprise version. The conditional `allOf` block properly gates the `reasoning_effort` field, showing it only when `enable_reasoning` is true, an appropriate pattern for these models' reasoning capabilities. The updated model default of `mistral-large-latest` and inclusion of the Magistral model examples are correct.

unstract/sdk1/src/unstract/sdk1/adapters/llm1/static/ollama.json (1)
28-64: LGTM!

The new properties (`max_tokens`, `temperature`, `json_mode`) and the `context_window` title fix are well-structured, properly validated, and align with LiteLLM's Ollama parameters. The `json_mode` boolean provides a user-friendly abstraction that the `OllamaLLMParameters.validate()` method correctly maps to `response_format`.

.claude/skills/unstract-adapter-extension/scripts/manage_models.py (1)
158-164: Potential IndexError when enum list is empty.

If `model_prop["enum"]` is an empty list `[]`, accessing `model_prop["enum"][0]` on line 161 will raise an `IndexError`. The ternary condition `if model_prop["enum"]` handles this, but the logic order should be verified. Actually, reviewing again: the ternary handles this correctly. The issue is that if `"default"` doesn't exist AND `enum` is empty, then `default` becomes `""`, which is safe.

.claude/skills/unstract-adapter-extension/scripts/check_adapter_updates.py (3)
373-378: list_adapters correctly filters Python files.

The logic uses `py_file.name.startswith("__")`, which checks the filename string, not the Path object directly. This is correct behavior.

49-51: The LITELLM_FEATURES dict requires regular maintenance to reflect current model availability. While "o4-mini" and "gpt-5" are established, released models (not speculative), the hardcoded list should be reviewed for outdated or superseded entries like legacy model identifiers to prevent false positives in adapter update suggestions.

65-66: These model names follow Anthropic's standard versioning format. Anthropic recommends using specific model versions like `claude-sonnet-4-5-20250929` in production applications, where the final component is the snapshot date (YYYYMMDD). Claude Sonnet 4.5 is available via the API using the model string `claude-sonnet-4-5-20250929`, released on September 29, 2025. Similarly, Claude Opus 4.1 was released on August 5, 2025, matching the snapshot date. The format is consistent with earlier models like `claude-3-5-sonnet-20241022`.

Likely an incorrect or invalid review comment.
unstract/sdk1/src/unstract/sdk1/adapters/base1.py (4)
666-679: LGTM - clean json_mode to response_format conversion.

The `json_mode` boolean is correctly converted to the `{"type": "json_object"}` format expected by LiteLLM. Using `pop` on the copied metadata ensures the original isn't mutated and `json_mode` doesn't leak into the final output.
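A minimal sketch of that conversion as a standalone function; in the SDK the equivalent logic lives inside the adapter's validate step, so the function name and dict shapes here are assumptions for illustration:

```python
def apply_json_mode(metadata: dict) -> dict:
    """Pop json_mode from a copy and map it to LiteLLM's response_format."""
    out = dict(metadata)  # shallow copy so the caller's dict isn't mutated
    if out.pop("json_mode", False):
        out["response_format"] = {"type": "json_object"}
    return out

original = {"model": "ollama/llama3", "json_mode": True}
converted = apply_json_mode(original)
print(converted)  # {'model': 'ollama/llama3', 'response_format': {'type': 'json_object'}}
print(original)   # unchanged: {'model': 'ollama/llama3', 'json_mode': True}
```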
608-650: LGTM - consistent reasoning pattern with OpenAI/Azure.

The Mistral reasoning implementation follows the same pattern established for the OpenAI and Azure adapters, maintaining codebase consistency.

444-448: Good additions for AWS SSO and cost tracking.

`aws_profile_name` enables SSO authentication workflows, and `model_id` supports Application Inference Profile for cost allocation. Both have safe `None` defaults.
691-716: LGTM - `azure_ai/` prefix is correct for LiteLLM.

The `AzureAIFoundryLLMParameters` class correctly uses the `azure_ai/` prefix format that LiteLLM expects for Azure AI Foundry models. The `validate_model` method properly handles both prefixed and unprefixed model names, adding the prefix only when necessary.

.claude/skills/unstract-adapter-extension/scripts/init_llm_adapter.py (2)
286-302: Subprocess command with partial path relies on `magick` being in PATH.

The script uses `magick` without an absolute path, which is fine for developer tooling but could fail silently in some environments. The error handling on lines 307-311 addresses this adequately.

148-166: Consistent with init_embedding_adapter.py - good for maintainability.

The `to_class_name` function matches the implementation in the embedding adapter script, ensuring consistent class naming across both adapter types.
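The idempotent prefixing praised in the base1.py comments can be sketched as follows. The prefix string comes from the review; the function body is an illustration, not the SDK source:

```python
AZURE_AI_PREFIX = "azure_ai/"

def validate_model(model: str) -> str:
    """Add the LiteLLM azure_ai/ prefix only when it's missing."""
    if model.startswith(AZURE_AI_PREFIX):
        return model  # already prefixed: return unchanged, never doubled
    return AZURE_AI_PREFIX + model

print(validate_model("my-deployment"))           # azure_ai/my-deployment
print(validate_model("azure_ai/my-deployment"))  # azure_ai/my-deployment
```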
```python
@staticmethod
def get_icon() -> str:
    """Return icon path."""
    return "/icons/adapter-icons/${PROVIDER_NAME}.png"
```
Icon path may contain spaces causing loading issues.
The get_icon() method uses ${PROVIDER_NAME} which is the display name (e.g., "Azure AI Foundry"). However, icon filenames typically use PascalCase without spaces (e.g., AzureAIFoundry.png). Using the display name directly could result in paths with spaces that may fail to load.
Suggested fix
Consider using ${CLASS_NAME} instead, which is already in PascalCase:
```diff
 @staticmethod
 def get_icon() -> str:
     """Return icon path."""
-    return "/icons/adapter-icons/${PROVIDER_NAME}.png"
+    return "/icons/adapter-icons/${CLASS_NAME}.png"
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
@staticmethod
def get_icon() -> str:
    """Return icon path."""
    return "/icons/adapter-icons/${CLASS_NAME}.png"
```
🤖 Prompt for AI Agents
In
.claude/skills/unstract-adapter-extension/assets/templates/embedding_adapter.py.template
around lines 51 to 54, the get_icon() return uses ${PROVIDER_NAME} which may
contain spaces (display name) and break icon loading; update the template to use
${CLASS_NAME} (PascalCase filename) for the icon path or otherwise
sanitize/remove spaces from the provider name so the returned path is a valid
filename (e.g., "/icons/adapter-icons/${CLASS_NAME}.png").
```python
@staticmethod
def get_icon() -> str:
    """Return icon path."""
    return "/icons/adapter-icons/${PROVIDER_NAME}.png"
```
Icon path may have spaces causing mismatch with actual icon files.
The template uses ${PROVIDER_NAME} directly in the icon path, but the init scripts use to_icon_name() which removes spaces from the display name. If PROVIDER_NAME is "New Provider", the template outputs /icons/adapter-icons/New Provider.png, while the actual icon is saved as NewProvider.png.
Consider using a separate placeholder like ${ICON_NAME} that corresponds to the space-removed version, or update the template to match the init script's naming logic.
🤖 Prompt for AI Agents
In
.claude/skills/unstract-adapter-extension/assets/templates/llm_adapter.py.template
around lines 51 to 54, the icon path uses ${PROVIDER_NAME} which can include
spaces and mismatch the init script's to_icon_name() (which removes spaces);
change the template to use a space-free placeholder (e.g., ${ICON_NAME}) or
apply the same transformation as the init script (remove spaces from
PROVIDER_NAME) so the generated path matches the actual saved icon filename
(e.g., /icons/adapter-icons/${ICON_NAME}.png).
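A sketch of the space-stripping transformation both icon comments ask for. A `to_icon_name` helper exists in the init scripts; this one-line body is an assumption about its behavior, not the actual source:

```python
def to_icon_name(provider_name: str) -> str:
    """Strip spaces from a display name to get the icon filename stem (assumed behavior)."""
    return provider_name.replace(" ", "")

print(to_icon_name("Azure AI Foundry") + ".png")  # AzureAIFoundry.png
```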
```python
def fetch_url(url: str, timeout: int = 10) -> bytes | None:
    """Fetch content from URL with error handling."""
    try:
        headers = {
            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"
        }
        request = Request(url, headers=headers)
        with urlopen(request, timeout=timeout) as response:
            return response.read()
    except (URLError, HTTPError, TimeoutError):
        return None
```
Consider validating URL scheme to prevent file:// access.
The static analyzer flags urlopen for accepting arbitrary URL schemes. If a user passes --logo-url "file:///etc/passwd", it could read local files. Consider restricting to http:// and https:// schemes.
🔎 Proposed fix

```diff
 def fetch_url(url: str, timeout: int = 10) -> bytes | None:
     """Fetch content from URL with error handling."""
+    from urllib.parse import urlparse
+
+    parsed = urlparse(url)
+    if parsed.scheme not in ("http", "https"):
+        print(f"  Error: Only http/https URLs are supported, got: {parsed.scheme}")
+        return None
+
     try:
         headers = {
             "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"
         }
```
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
def fetch_url(url: str, timeout: int = 10) -> bytes | None:
    """Fetch content from URL with error handling."""
    from urllib.parse import urlparse

    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        print(f"  Error: Only http/https URLs are supported, got: {parsed.scheme}")
        return None

    try:
        headers = {
            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"
        }
        request = Request(url, headers=headers)
        with urlopen(request, timeout=timeout) as response:
            return response.read()
    except (URLError, HTTPError, TimeoutError):
        return None
```
🧰 Tools
🪛 Ruff (0.14.10)
183-183: Audit URL open for permitted schemes. Allowing use of file: or custom schemes is often unexpected.
(S310)
184-184: Audit URL open for permitted schemes. Allowing use of file: or custom schemes is often unexpected.
(S310)
🤖 Prompt for AI Agents
.claude/skills/unstract-adapter-extension/scripts/init_llm_adapter.py around
lines 177 to 187: the fetch_url function currently passes any URL to urlopen
which allows unsafe schemes like file://; parse the URL (e.g., with
urllib.parse.urlparse) and validate that the scheme is either "http" or "https"
before creating the Request and calling urlopen, returning None or raising a
controlled error for disallowed schemes; perform this check immediately after
receiving the url argument and only proceed to build the Request and call
urlopen when the scheme is allowed.
* remove claude artifacts from gitignore
Actionable comments posted: 2
♻️ Duplicate comments (3)
.claude/skills/adapter-ops/assets/templates/llm_adapter.py.template (1)
51-54: Icon path placeholder mismatch persists. The past review comment about icon path spacing is still valid. The template uses `${PROVIDER_NAME}` in the icon path (line 54), which may contain spaces (e.g., "New Provider"), while the init scripts save icons with spaces removed (e.g., "NewProvider.png"). This creates a mismatch between the generated adapter code and the actual icon filename. Consider updating the init scripts to use a separate `${ICON_NAME}` placeholder that matches the `to_icon_name()` transformation, or update the template to remove spaces from the provider name in the icon path.

.claude/skills/adapter-ops/assets/templates/embedding_adapter.py.template (1)
51-54: Icon path uses display name which may contain spaces. This issue was previously identified. The `${PROVIDER_NAME}` placeholder (e.g., "Azure AI Foundry") will produce paths with spaces that may fail to load. Consider using `${CLASS_NAME}` instead.

.claude/skills/adapter-ops/scripts/init_llm_adapter.py (1)
177-187: URL scheme validation still missing. This issue was previously flagged. The `urlopen` call accepts arbitrary URL schemes including `file://`, which could read local files if a malicious URL is provided via `--logo-url`.
🧹 Nitpick comments (3)
.claude/skills/adapter-ops/assets/templates/llm_schema.json.template (1)
29-29: Consider using "uri" format for consistency. Line 29 uses `"format": "url"` while the embedding template uses `"format": "uri"`. The JSON Schema specification recognizes "uri" as a standard format validator. Consider updating to "uri" for consistency across templates, though "url" may also be supported by some validators.

🔎 Proposed fix

```diff
 "api_base": {
     "type": "string",
-    "format": "url",
+    "format": "uri",
     "title": "API Base",
     "default": "",
     "description": "API endpoint URL (optional, uses default if not specified)."
 },
```

.claude/skills/adapter-ops/scripts/init_llm_adapter.py (2)
148-420: Consider extracting shared utilities to reduce duplication. The utility functions (`to_class_name`, `to_icon_name`, `fetch_url`, `search_potential_logo_sources`, `download_and_process_logo`, `copy_logo`) are duplicated verbatim in `init_embedding_adapter.py`. Extracting them to a shared module (e.g., `adapter_utils.py`) would improve maintainability.

Example structure

```python
# .claude/skills/adapter-ops/scripts/adapter_utils.py
def to_class_name(provider: str) -> str: ...
def to_icon_name(display_name: str) -> str: ...
def fetch_url(url: str, timeout: int = 10) -> bytes | None: ...
# ... other shared functions
```

Then import in both scripts:

```python
from adapter_utils import to_class_name, to_icon_name, fetch_url, ...
```
24-32: Path resolution relies on fixed directory structure. The `REPO_ROOT` calculation assumes this script is always at `.claude/skills/adapter-ops/scripts/`. If the skill directory is renamed or moved, paths will break silently.

Consider adding a validation check:

```python
if not (REPO_ROOT / "unstract").exists():
    print(f"Warning: Repository root detection may be incorrect: {REPO_ROOT}")
```
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to Reviews > Disable Cache setting
Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting
📒 Files selected for processing (15)
- .claude/skills/adapter-ops/SKILL.md
- .claude/skills/adapter-ops/assets/templates/embedding_adapter.py.template
- .claude/skills/adapter-ops/assets/templates/embedding_parameters.py.template
- .claude/skills/adapter-ops/assets/templates/embedding_schema.json.template
- .claude/skills/adapter-ops/assets/templates/llm_adapter.py.template
- .claude/skills/adapter-ops/assets/templates/llm_parameters.py.template
- .claude/skills/adapter-ops/assets/templates/llm_schema.json.template
- .claude/skills/adapter-ops/references/adapter_patterns.md
- .claude/skills/adapter-ops/references/json_schema_guide.md
- .claude/skills/adapter-ops/references/provider_capabilities.md
- .claude/skills/adapter-ops/scripts/check_adapter_updates.py
- .claude/skills/adapter-ops/scripts/init_embedding_adapter.py
- .claude/skills/adapter-ops/scripts/init_llm_adapter.py
- .claude/skills/adapter-ops/scripts/manage_models.py
- .gitignore
💤 Files with no reviewable changes (1)
- .gitignore
✅ Files skipped from review due to trivial changes (2)
- .claude/skills/adapter-ops/references/json_schema_guide.md
- .claude/skills/adapter-ops/references/adapter_patterns.md
🧰 Additional context used
🧬 Code graph analysis (1)
.claude/skills/adapter-ops/scripts/manage_models.py (2)
.claude/skills/adapter-ops/scripts/init_embedding_adapter.py (1)
main(532-639)
.claude/skills/adapter-ops/scripts/init_llm_adapter.py (1)
main(542-649)
🪛 LanguageTool
.claude/skills/adapter-ops/SKILL.md
[grammar] ~6-~6: Ensure spelling is correct
Context: ...and self-hosted models (Ollama). --- # Unstract Adapter Extension Skill This skill pro...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
🪛 markdownlint-cli2 (0.18.1)
.claude/skills/adapter-ops/references/provider_capabilities.md
45-45: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
51-51: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
57-57: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
63-63: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
70-70: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
76-76: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
82-82: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
88-88: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
138-138: Bare URL used
(MD034, no-bare-urls)
139-139: Bare URL used
(MD034, no-bare-urls)
140-140: Bare URL used
(MD034, no-bare-urls)
141-141: Bare URL used
(MD034, no-bare-urls)
142-142: Bare URL used
(MD034, no-bare-urls)
143-143: Bare URL used
(MD034, no-bare-urls)
144-144: Bare URL used
(MD034, no-bare-urls)
145-145: Bare URL used
(MD034, no-bare-urls)
.claude/skills/adapter-ops/SKILL.md
7-7: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🪛 Ruff (0.14.10)
.claude/skills/adapter-ops/scripts/init_embedding_adapter.py
177-177: Audit URL open for permitted schemes. Allowing use of file: or custom schemes is often unexpected.
(S310)
178-178: Audit URL open for permitted schemes. Allowing use of file: or custom schemes is often unexpected.
(S310)
209-209: Audit URL open for permitted schemes. Allowing use of file: or custom schemes is often unexpected.
(S310)
210-210: Audit URL open for permitted schemes. Allowing use of file: or custom schemes is often unexpected.
(S310)
229-229: Audit URL open for permitted schemes. Allowing use of file: or custom schemes is often unexpected.
(S310)
230-230: Audit URL open for permitted schemes. Allowing use of file: or custom schemes is often unexpected.
(S310)
280-280: subprocess call: check for execution of untrusted input
(S603)
281-293: Starting a process with a partial executable path
(S607)
300-300: Consider moving this statement to an else block
(TRY300)
331-331: Consider moving this statement to an else block
(TRY300)
339-339: Do not catch blind exception: Exception
(BLE001)
360-360: subprocess call: check for execution of untrusted input
(S603)
361-373: Starting a process with a partial executable path
(S607)
380-380: Consider moving this statement to an else block
(TRY300)
406-406: Consider moving this statement to an else block
(TRY300)
413-413: Do not catch blind exception: Exception
(BLE001)
.claude/skills/adapter-ops/scripts/init_llm_adapter.py
183-183: Audit URL open for permitted schemes. Allowing use of file: or custom schemes is often unexpected.
(S310)
184-184: Audit URL open for permitted schemes. Allowing use of file: or custom schemes is often unexpected.
(S310)
215-215: Audit URL open for permitted schemes. Allowing use of file: or custom schemes is often unexpected.
(S310)
216-216: Audit URL open for permitted schemes. Allowing use of file: or custom schemes is often unexpected.
(S310)
235-235: Audit URL open for permitted schemes. Allowing use of file: or custom schemes is often unexpected.
(S310)
236-236: Audit URL open for permitted schemes. Allowing use of file: or custom schemes is often unexpected.
(S310)
286-286: subprocess call: check for execution of untrusted input
(S603)
287-299: Starting a process with a partial executable path
(S607)
306-306: Consider moving this statement to an else block
(TRY300)
337-337: Consider moving this statement to an else block
(TRY300)
345-345: Do not catch blind exception: Exception
(BLE001)
366-366: subprocess call: check for execution of untrusted input
(S603)
367-379: Starting a process with a partial executable path
(S607)
386-386: Consider moving this statement to an else block
(TRY300)
412-412: Consider moving this statement to an else block
(TRY300)
419-419: Do not catch blind exception: Exception
(BLE001)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: build
🔇 Additional comments (27)
.claude/skills/adapter-ops/SKILL.md (1)
1-350: Excellent comprehensive documentation!This skill documentation is well-structured and provides clear, actionable guidance for adapter management. The workflows are detailed with concrete examples, and the inclusion of troubleshooting, validation checklists, and maintenance procedures makes this a valuable reference.
.claude/skills/adapter-ops/references/provider_capabilities.md (1)
1-145: Well-organized provider capability reference.This reference document provides a clear, tabulated overview of LiteLLM provider capabilities. The feature matrices, parameter listings, and authentication methods are comprehensive and align well with the adapter scaffolding system introduced in this PR.
.claude/skills/adapter-ops/scripts/manage_models.py (9)
27-37: Path resolution looks correct.The path navigation and schema path construction are appropriate for the repository structure.
40-50: Clean JSON I/O functions.The load and save functions are simple and correct. The trailing newline in save_schema is good for git-friendly diffs.
53-61: LGTM: Clear model information extraction.The function safely extracts model configuration with appropriate defaults.
64-89: Robust enum addition logic.The function correctly handles missing structures, avoids duplicates, and sets a sensible default.
92-112: Well-handled model removal with cleanup.The function correctly handles default updates and removes the enum property when empty.
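As a sketch of the add/default behavior the two comments above describe (the function name and exact structure here are assumptions, not the script's actual code):

```python
def add_enum_model(schema: dict, model: str) -> dict:
    """Add a model to the schema's enum, creating missing structure and skipping duplicates."""
    field = schema.setdefault("properties", {}).setdefault("model", {"type": "string"})
    enum = field.setdefault("enum", [])
    if model not in enum:
        enum.append(model)
    field.setdefault("default", enum[0])  # first entry becomes the default
    return schema

schema: dict = {}
add_enum_model(schema, "gpt-4o")
add_enum_model(schema, "gpt-4o")  # duplicate is ignored
```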
115-134: Simple and correct field updates.Both functions properly initialize the schema structure before updating fields.
137-151: `convert_to_enum` provides guidance rather than automatic conversion. This function intentionally directs users to use the `--action add-enum` command rather than automatically converting to an enum. This is reasonable since the script can't know which models should be in the enum list. The function name could be clearer about this behavior (e.g., `guide_to_enum`), but the current implementation is acceptable.
154-166: Correct enum-to-freetext conversion.The function properly preserves the default value (using the first enum value as fallback if no explicit default exists) and removes the enum constraint.
169-285: Well-structured CLI with proper action dispatch.The main function provides a clean CLI interface with appropriate error checking, action dispatch, and dry-run support. The diff-based update approach ensures changes are only written when necessary.
.claude/skills/adapter-ops/scripts/check_adapter_updates.py (6)
26-239: Comprehensive LiteLLM features catalog.The LITELLM_FEATURES dictionary provides a well-structured reference for known parameters and capabilities across providers. As noted in the maintenance workflow documentation, this catalog should be updated periodically when LiteLLM adds new features.
242-253: Flexible schema loading with pattern matching.The function correctly handles multiple filename patterns (with and without underscores) to accommodate different provider naming conventions.
256-269: Correctly extracts properties from JSON Schema. The function handles both standard properties and conditional properties defined in `allOf` blocks, which is appropriate for schemas with feature-gated fields.
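The extraction pattern being praised can be sketched like this (an assumed shape; the script's actual function may differ):

```python
def extract_properties(schema: dict) -> set:
    """Collect property names from top-level 'properties' plus conditional allOf/then blocks."""
    names = set(schema.get("properties", {}))
    for clause in schema.get("allOf", []):
        names |= set(clause.get("then", {}).get("properties", {}))
    return names
```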
272-364: Thorough adapter analysis with alias handling. The `analyze_adapter` function provides comprehensive checking for missing parameters, reasoning/thinking support, and outdated models. The alias filtering (lines 310-323) is particularly valuable for preventing false positives when parameters have alternative names (e.g., `api_base` vs `base_url`).
367-423: Clean adapter discovery and reporting.The list_adapters function correctly identifies Python files, and print_report provides a well-formatted, color-coded summary with actionable information.
426-469: Effective CLI with CI/CD-friendly exit codes.The main function provides flexible filtering options (adapter type, specific provider) and returns exit code 1 when updates are needed, which is useful for automated checks.
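A CI job can key off that exit code; a minimal shell pattern (the real invocation would be the script from this PR, e.g. `python .claude/skills/adapter-ops/scripts/check_adapter_updates.py`; a stand-in function is used here so the snippet is self-contained):

```shell
# Stand-in for the real checker, which returns exit code 1 when updates are needed.
check_adapters() { return 1; }

if check_adapters; then
  echo "adapters up to date"
else
  echo "adapter updates needed"
fi
```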
.claude/skills/adapter-ops/assets/templates/embedding_schema.json.template (1)
1-52: Well-structured embedding schema template. The template provides appropriate defaults and validation constraints. Good use of `format: "password"` for the API key and `format: "uri"` for the API base URL to guide UI rendering.
1-58: Solid LLM schema template with appropriate defaults.The template provides sensible defaults (15-minute timeout, 5 max retries) and proper validation constraints for LLM configuration parameters.
.claude/skills/adapter-ops/assets/templates/llm_adapter.py.template (1)
1-59: Clean adapter template structure.The template provides a well-organized adapter class skeleton with all required static methods. The placeholder system is clear and documented.
.claude/skills/adapter-ops/assets/templates/embedding_parameters.py.template (1)
1-57: Well-structured embedding parameter template.The template provides a clean parameter class with proper validation hooks. The comments about field mapping (lines 40-42) and the note that embedding models typically don't need prefixes (line 49) are helpful guidance for developers using this template.
.claude/skills/adapter-ops/assets/templates/llm_parameters.py.template (1)
1-61: LGTM! Template follows established patterns. The template correctly implements:
- Idempotent model prefixing in `validate_model`
- Pydantic validation flow via `model_dump()`
- Clear placeholder documentation
.claude/skills/adapter-ops/assets/templates/embedding_adapter.py.template (1)
1-50: Template structure looks good.The adapter template correctly implements the required static methods for adapter registration and metadata. The inheritance pattern and method signatures align with existing adapters.
Also applies to: 55-59
.claude/skills/adapter-ops/scripts/init_llm_adapter.py (2)
460-467: Verify that `AnyscaleLLMParameters` is the intended default. When neither `--add-param-class` nor `--use-param-class` is specified, the adapter defaults to `AnyscaleLLMParameters`. Confirm this is intentional, or consider using a more generic base class like `BaseChatCompletionParameters`.
34-145: Implementation looks solid.The script provides good developer ergonomics:
- Clear dry-run mode
- Structured result with errors/warnings separation
- Helpful next steps guidance
- Safe handling of existing files
Also applies to: 423-539, 542-649
.claude/skills/adapter-ops/scripts/init_embedding_adapter.py (2)
454-459: Default parameter class is appropriate. Using `OpenAIEmbeddingParameters` as the default is reasonable since OpenAI is a common embedding provider and likely has the most generic parameter set.
34-124: Embedding adapter scaffolding implementation looks correct. The script properly handles:
- Embedding-specific schema fields (`embed_batch_size`, `api_base`)
- Correct target directories (`embedding1/` vs `llm1/`)
- Appropriate default parameter class
Also applies to: 417-529, 532-643
```python
@staticmethod
def validate_model(adapter_metadata: dict[str, "Any"]) -> str:
    model = adapter_metadata.get("model", "")
    return model
```
🧩 Analysis chain
🏁 Script executed:

```bash
#!/bin/bash
# First, find the file
find . -name "init_embedding_adapter.py" -type f 2>/dev/null | head -20
```

Repository: Zipstack/unstract
Length of output: 123

🏁 Script executed:

```bash
cat -n ./.claude/skills/adapter-ops/scripts/init_embedding_adapter.py | head -200
```

Repository: Zipstack/unstract
Length of output: 7865

🏁 Script executed:

```bash
grep -n "LLM_ADAPTER_TEMPLATE\|validate_model" ./.claude/skills/adapter-ops/scripts/init_embedding_adapter.py | head -30
```

Repository: Zipstack/unstract
Length of output: 218

🏁 Script executed:

```bash
find . -name "*init*llm*adapter*" -o -name "*init*completion*" 2>/dev/null | grep -i script
```

Repository: Zipstack/unstract
Length of output: 117

🏁 Script executed:

```bash
grep -A 10 "def validate_model" ./.claude/skills/adapter-ops/scripts/init_llm_adapter.py
```

Repository: Zipstack/unstract
Length of output: 385

🏁 Script executed:

```bash
cat -n ./.claude/skills/adapter-ops/scripts/init_llm_adapter.py | sed -n '126,145p'
```

Repository: Zipstack/unstract
Length of output: 914
validate_model should add provider prefix to model name.
The embedding adapter returns the model unchanged, but LiteLLM embedding models require provider prefixes (e.g., openai/text-embedding-ada-002) just like completion models. Update validate_model to conditionally prepend the provider prefix, matching the LLM adapter's implementation.
🤖 Prompt for AI Agents
In .claude/skills/adapter-ops/scripts/init_embedding_adapter.py around lines
139-143, validate_model currently returns the raw model string; update it to
fetch the provider from adapter_metadata (e.g., adapter_metadata.get("provider",
"")) and, if a provider is present and the model does not already include a
provider prefix (no "/" in model), prepend the provider followed by "/" to the
model before returning it; otherwise return the model unchanged — mirror the
conditional prefixing logic used by the LLM adapter so embedding models become
e.g. "openai/text-embedding-ada-002".
```python
def fetch_url(url: str, timeout: int = 10) -> bytes | None:
    """Fetch content from URL with error handling."""
    try:
        headers = {
            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"
        }
        request = Request(url, headers=headers)
        with urlopen(request, timeout=timeout) as response:
            return response.read()
    except (URLError, HTTPError, TimeoutError):
        return None
```
URL scheme validation missing (same as LLM script).
The fetch_url function and search_potential_logo_sources use urlopen without validating URL schemes. Consider restricting to http:// and https:// to prevent file:// access.
🔎 Proposed fix
```diff
 def fetch_url(url: str, timeout: int = 10) -> bytes | None:
     """Fetch content from URL with error handling."""
+    from urllib.parse import urlparse
+
+    parsed = urlparse(url)
+    if parsed.scheme not in ("http", "https"):
+        print(f"  Error: Only http/https URLs are supported, got: {parsed.scheme}")
+        return None
+
     try:
         headers = {
```
🧰 Tools
🪛 Ruff (0.14.10)
177-177: Audit URL open for permitted schemes. Allowing use of file: or custom schemes is often unexpected.
(S310)
178-178: Audit URL open for permitted schemes. Allowing use of file: or custom schemes is often unexpected.
(S310)
🤖 Prompt for AI Agents
In .claude/skills/adapter-ops/scripts/init_embedding_adapter.py around lines 171
to 181, fetch_url currently calls urlopen without validating the URL scheme
which allows unsafe schemes like file://; update the function to parse and
validate the URL scheme (allow only 'http' and 'https') before creating the
Request and calling urlopen, returning None immediately for invalid or missing
schemes; use urllib.parse.urlparse to check scheme, and keep existing error
handling otherwise.
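The scheme check itself is small enough to show standalone (a sketch using only the standard library):

```python
from urllib.parse import urlparse

def is_fetchable(url: str) -> bool:
    """Allow only http/https URLs, rejecting file:// and custom schemes."""
    return urlparse(url).scheme in ("http", "https")
```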
Actionable comments posted: 1
🧹 Nitpick comments (2)
.claude/skills/adapter-ops/scripts/check_adapter_updates.py (2)
242-254: Add error handling for JSON parsing. If a schema file exists but contains malformed JSON, `json.load(f)` will raise a `JSONDecodeError`, causing the script to crash. Consider catching and handling this error to provide a clearer error message.

🔎 Proposed fix

```diff
 def load_json_schema(adapter_type: str, provider: str) -> dict | None:
     """Load JSON schema for an adapter."""
     schema_dir = SDK1_ADAPTERS / f"{adapter_type}1" / "static"
     # Try common filename patterns
     for filename in [f"{provider}.json", f"{provider.replace('_', '')}.json"]:
         schema_path = schema_dir / filename
         if schema_path.exists():
-            with open(schema_path) as f:
-                return json.load(f)
+            try:
+                with open(schema_path) as f:
+                    return json.load(f)
+            except json.JSONDecodeError as e:
+                print(f"Warning: Failed to parse {schema_path}: {e}", file=sys.stderr)
+                return None
     return None
```
305-307: Add clarifying comment for "adapter_name" exclusion. The exclusion of `"adapter_name"` from the missing parameters check is not explained. A brief comment would improve code clarity.

🔎 Proposed fix

```diff
 # Find missing parameters
 known_params = set(features.get("known_params", []))
+# adapter_name is always present in schemas but not a LiteLLM param
 missing = (
-    known_params - current_props - {"adapter_name"}
-)  # adapter_name is always present
+    known_params - current_props - {"adapter_name"}
+)
```
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to Reviews > Disable Cache setting
Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting
📒 Files selected for processing (1)
.claude/skills/adapter-ops/scripts/check_adapter_updates.py
🧰 Additional context used
🧬 Code graph analysis (1)
.claude/skills/adapter-ops/scripts/check_adapter_updates.py (2)
.claude/skills/adapter-ops/scripts/init_embedding_adapter.py (1)
main(532-639)
.claude/skills/adapter-ops/scripts/init_llm_adapter.py (1)
main(542-649)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: build
🔇 Additional comments (6)
.claude/skills/adapter-ops/scripts/check_adapter_updates.py (6)
1-24: LGTM!The imports and path resolution are clean and consistent with the related adapter initialization scripts.
256-270: LGTM! The function correctly extracts properties from both direct `properties` and conditional `allOf`/`then`/`properties` sections of JSON schemas.
310-324: LGTM!The alias filtering logic correctly handles common parameter name variations across providers, preventing false positives in the missing parameters report.
367-380: LGTM!The function correctly discovers adapters by scanning the filesystem and filtering out Python internal files.
382-424: LGTM!The report formatting is clear and user-friendly, with good visual organization using emoji indicators and grouped output.
426-469: LGTM!The CLI interface is well-designed with appropriate options for filtering and output formatting. The exit code behavior (returning 1 if updates are needed) enables integration with CI/CD pipelines.
| "vertex_ai": { | ||
| "known_params": [ | ||
| "json_credentials", | ||
| "vertex_credentials", | ||
| "project", | ||
| "vertex_project", | ||
| "model", | ||
| "max_tokens", | ||
| "max_retries", | ||
| "timeout", | ||
| "temperature", | ||
| "safety_settings", | ||
| "enable_thinking", | ||
| "budget_tokens", | ||
| "thinking", | ||
| "reasoning_effort", | ||
| "tools", | ||
| "googleSearch", | ||
| ], | ||
| "thinking_models": ["gemini-2.5-flash-preview", "gemini-2.5-pro"], | ||
| "docs_url": "https://docs.litellm.ai/docs/providers/vertex", | ||
| }, |
Fix naming inconsistency for Vertex AI provider.
The provider is named "vertex_ai" in the LLM section (line 108) but "vertexai" in the embedding section (line 214). This inconsistency could cause lookup failures in analyze_adapter when the function retrieves features using LITELLM_FEATURES.get(adapter_type, {}).get(provider, {}) (line 296).
🔎 Proposed fix
Standardize on one naming convention. Based on the typical provider naming pattern, use "vertex_ai":
- "vertexai": {
+ "vertex_ai": {
"known_params": [Also applies to: 214-227
🤖 Prompt for AI Agents
.claude/skills/adapter-ops/scripts/check_adapter_updates.py around lines 108-129
and 214-227: there is a naming mismatch where the LLM section uses "vertex_ai"
but the embedding section uses "vertexai", causing lookup failures in
analyze_adapter; change the embedding section provider name from "vertexai" to
"vertex_ai" (and ensure any other occurrences in this file use the same
"vertex_ai" key) so LITELLM_FEATURES lookups are consistent.
chandrasekharan-zipstack
left a comment
LGTM - check the code rabbit comments in case they're valid
```
.prompting/

# Claude
CLAUDE.md
```
@hari-kuriakose we should commit a common CLAUDE.md that all of us can use. This might cause our local versions to be tracked in git
gaya3-zipstack
left a comment
looks good
Test Results

Summary
Runner Tests - Full Report
SDK1 Tests - Full Report
What
adapter-ops)

Why
How
Skill Structure:
- SKILL.md: Comprehensive documentation with workflows and quick reference
- scripts/init_llm_adapter.py: Automated LLM adapter creation with logo support
- scripts/init_embedding_adapter.py: Automated embedding adapter creation
- scripts/manage_models.py: Add/remove models from existing adapters
- scripts/check_adapter_updates.py: Compare adapters against LiteLLM features
- templates/: Jinja2 templates for adapter files
- references/: Pattern documentation and JSON schema guide

Azure AI Foundry Adapter:
- llm1/azure_ai_foundry.py
- llm1/static/azure_ai_foundry.json
- AzureAIFoundryLLMParameters in base1.py
- frontend/public/icons/adapter-icons/AzureAIFoundry.png

Existing Adapter Updates:
Can this PR break any existing features. If yes, please list possible items. If no, please explain why. (PS: Admins do not merge the PR without this section filled)
No breaking changes expected:
.claude/skills/directory)Database Migrations
Env Config
Relevant Docs
Related Issues or PRs
Dependencies Versions
Notes on Testing
Screenshots
N/A (backend changes only)
Checklist
I have read and understood the Contribution Guidelines.