AI Providers Guide — Accellens

Version: 1.3 Update Date: December 26, 2025


1. Overview

The platform uses multiple AI providers for:

  • LLM-generated explanations and technical problem descriptions, including code fix examples.
  • Embeddings for similarity search across findings.
  • Speech-to-Text and Text-to-Speech.

All providers implement a common interface in apps/services/ai/src/providers (Strategy pattern). Configuration is set via environment variables and Secrets Manager/Vault.
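A minimal sketch of what a Strategy-style provider interface can look like. The names AIProvider, generate, EchoProvider, and explain are illustrative assumptions; the actual interface in base.py may use different names and signatures.

```python
import asyncio
from abc import ABC, abstractmethod
from typing import Optional


class AIProvider(ABC):
    """Hypothetical shared interface: each provider is one interchangeable strategy."""

    name = "base"

    @abstractmethod
    async def generate(self, prompt: str, system_prompt: Optional[str] = None) -> str:
        """Return the model's text completion for `prompt`."""


class EchoProvider(AIProvider):
    """Toy strategy that echoes its input instead of calling a model."""

    name = "echo"

    async def generate(self, prompt: str, system_prompt: Optional[str] = None) -> str:
        prefix = f"[{system_prompt}] " if system_prompt else ""
        return prefix + prompt


async def explain(provider: AIProvider, finding: str) -> str:
    # The caller depends only on AIProvider, so any strategy can be passed in.
    return await provider.generate(f"Explain: {finding}", system_prompt="a11y")


print(asyncio.run(explain(EchoProvider(), "missing alt text")))  # [a11y] Explain: missing alt text
```

Because services only see the abstract interface, adding a provider means adding one subclass rather than touching every caller.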

AI Component Structure

apps/services/ai/src/
├── providers/
│   ├── base.py                    # AIProvider interface (Strategy pattern)
│   ├── openai_provider.py         # OpenAI GPT-4o/GPT-4o mini provider
│   ├── anthropic_provider.py      # Anthropic Claude 3.5 provider
│   ├── ollama_provider.py         # Ollama local LLM provider
│   └── __init__.py
├── services/
│   ├── explanation_service.py     # AI explanation generation for findings
│   ├── effort_estimation_service.py # Repair complexity estimation
│   ├── impacted_users_service.py  # Affected user category analysis
│   ├── fix_suggestions_service.py # Fix suggestion generation (Strategy pattern)
│   └── __init__.py
├── generators/                     # Platform-specific fix suggestion generators
│   ├── base.py                     # BaseFixGenerator abstract class
│   ├── html_aria.py                # HTML/ARIA fix generator
│   ├── react_native.py             # React Native fix generator
│   ├── flutter.py                  # Flutter fix generator
│   ├── angular.py                  # Angular fix generator
│   ├── vue.py                      # Vue.js fix generator
│   ├── svelte.py                   # Svelte fix generator
│   ├── generic_framework.py       # Generic framework fallback
│   └── __init__.py
├── router.py                      # AIProviderRouter with fallback mechanism
├── langchain_router.py            # LangChain router for task-based routing
└── main.py                        # FastAPI application for AI service

libs/ai/src/                        # TypeScript library (future)
├── providers/                     # TypeScript providers (TODO)
├── prompts/                       # YAML prompt templates (TODO)
└── embeddings/                    # Embeddings utilities (TODO)

Note: The TypeScript library libs/ai is under development. The current implementation is located in apps/services/ai/src/.


2. Large Language Models (LLM)

| Provider | Role | Benefits | Limitations |
| --- | --- | --- | --- |
| Qwen2.5-Coder-3B-Instruct | Primary provider | Optimized for code; free; local | Less context; requires GPU |
| Phi-4-mini | First fallback provider | Fast; free; local | Less context; requires GPU |
| Llama-3.2-3B-Instruct | Second fallback provider | Fast; free; local | Less context; requires GPU |

Important: All three providers run via the Ollama API and are local (they do not require API keys). After resetting Docker to factory settings or upon first run, you must download all three models into Ollama. See the Ollama Model Initialization section in the setup documentation.

Fallback Order: Qwen (primary) → Phi (first fallback) → Llama (second fallback)

Configuration

  • AI_PRIMARY_PROVIDER=qwen (Qwen as primary provider).
  • AI_FALLBACK_PROVIDER=phi (Phi as first fallback provider).
  • Responses are cached in Redis, keyed by a fingerprint of the prompt and its parameters.
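One way to build such a fingerprint is to hash a canonical JSON encoding of the prompt and parameters. This is a sketch under assumptions: the cache_key helper and the ai:cache: key prefix are hypothetical, and the real key format is not documented here.

```python
import hashlib
import json
from typing import Any, Dict


def cache_key(prompt: str, params: Dict[str, Any]) -> str:
    """Deterministic Redis key for an LLM call.

    sort_keys=True makes the fingerprint independent of parameter order, so
    identical requests hit the same cache entry. Include the model name in
    `params`, since the same prompt yields different output per model.
    """
    canonical = json.dumps(
        {"prompt": prompt, "params": params},
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    )
    return "ai:cache:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()


k1 = cache_key("Explain rule image-alt", {"model": "qwen", "temperature": 0.2})
k2 = cache_key("Explain rule image-alt", {"temperature": 0.2, "model": "qwen"})
print(k1 == k2)  # True
```

With redis-py, the key would then be used with `r.get(key)` and `r.set(key, value, ex=ttl_seconds)`; the TTL is a deployment choice not specified in this guide.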

Usage

  • Prompt templates in libs/ai/src/prompts/*.yaml (explanation, effort_estimation, impacted_users).
  • LangChain router (apps/services/ai/src/langchain_router.py) determines the provider by task type (explanation, fix, summary, aas).
    • Uses langchain-community for Ollama-based providers (Qwen, Phi, Llama).
    • Initializes Ollama LLM for each provider (qwen, phi, llama).
    • Has a fallback to AIProviderRouter if LangChain is unavailable.
  • All LLM calls are wrapped in an OpenTelemetry span llm.request and logged with redaction.
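Task-based routing reduces to a lookup from task type to an ordered provider chain. The real langchain_router.py builds a LangChain Ollama LLM per provider and falls back to AIProviderRouter; this sketch leaves LangChain out, and the TASK_ROUTES assignments are hypothetical (only the task type and provider names come from the list above).

```python
from typing import Dict, List, Set

# Hypothetical routing table: task types come from the list above, but which
# provider leads each chain is an illustrative assumption.
TASK_ROUTES: Dict[str, List[str]] = {
    "explanation": ["qwen", "phi", "llama"],
    "fix": ["qwen", "phi", "llama"],
    "summary": ["phi", "qwen", "llama"],
    "aas": ["qwen", "phi", "llama"],
}
DEFAULT_ROUTE: List[str] = ["qwen", "phi", "llama"]


def route(task_type: str, available: Set[str]) -> List[str]:
    """Ordered providers to try for a task, skipping any that failed to initialize."""
    chain = TASK_ROUTES.get(task_type, DEFAULT_ROUTE)
    return [name for name in chain if name in available]


print(route("summary", {"qwen", "llama"}))  # ['qwen', 'llama']
```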

Structured Output (JSON Schema)

Starting from version 2.9.8, the platform supports structured output to guarantee valid JSON with a defined structure. This removes the fragile step of extracting JSON from free-form model responses.

Support by Provider:

  • Qwen: Uses the format parameter with a JSON schema via the Ollama API.
  • Phi: Uses the format parameter with a JSON schema via the Ollama API.
  • Llama: Uses the format parameter with a JSON schema via the Ollama API.

Passing a JSON schema (rather than the string "json") in the format parameter requires Ollama 0.5.0 or later.
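At the HTTP level, format is a field of the request body for Ollama's /api/generate endpoint. The sketch below uses only the standard library; the helper names and the localhost:11434 URL (Ollama's default port) are assumptions, and inside Docker the host would be the Ollama container instead.

```python
import json
import urllib.request
from typing import Any, Dict, Optional

OLLAMA_URL = "http://localhost:11434/api/generate"  # assumed default; adjust per deployment


def build_generate_request(
    model: str, prompt: str, schema: Dict[str, Any], system: Optional[str] = None
) -> Dict[str, Any]:
    """Request body for /api/generate with schema-constrained output."""
    body: Dict[str, Any] = {
        "model": model,
        "prompt": prompt,
        "format": schema,  # a JSON schema object; the string "json" gives plain JSON mode
        "stream": False,   # return one JSON object instead of a token stream
        "options": {"temperature": 0},
    }
    if system:
        body["system"] = system
    return body


def generate_structured(body: Dict[str, Any], url: str = OLLAMA_URL) -> Dict[str, Any]:
    """POST the request and parse the model's JSON answer from the "response" field."""
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        envelope = json.load(resp)
    return json.loads(envelope["response"])
```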

Usage:

import asyncio

from ai.src.router import AIProviderRouter
from ai.src.utils.json_schema import get_fix_suggestions_schema


async def main() -> None:
    router = AIProviderRouter()
    schema = get_fix_suggestions_schema()

    # Structured output generation (generate_structured is a coroutine)
    result = await router.generate_structured(
        prompt="Generate fix suggestion...",
        json_schema=schema,
        system_prompt="You are an accessibility expert...",
        service_name="fix_suggestions",
    )

    # result is guaranteed to be a valid dict matching the schema
    assert "technical_description" in result
    assert "code_before" in result
    assert "code_after" in result


asyncio.run(main())

Fallback Mechanism:

If a provider does not support native structured output, the router automatically uses a fallback:

  1. Calls the regular generate() method.
  2. Parses the JSON response using parse_structured_response().
  3. Validates the result against the schema.
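Steps 2 and 3 can be approximated as follows. This is not the platform's parse_structured_response(); parse_json_fallback and check_required are hypothetical stand-ins, and check_required only checks required string fields, whereas full validation would use a JSON Schema library such as jsonschema (`jsonschema.validate(instance, schema)`).

```python
import json
from typing import Any, Dict, Iterable


def parse_json_fallback(text: str) -> Dict[str, Any]:
    """Extract the first decodable JSON object from free-form model output.

    Tries each "{" in turn with JSONDecoder.raw_decode, which ignores any text
    after the object, so prose or Markdown fences around the JSON are tolerated.
    """
    decoder = json.JSONDecoder()
    idx = text.find("{")
    while idx != -1:
        try:
            obj, _end = decoder.raw_decode(text, idx)
            return obj
        except json.JSONDecodeError:
            idx = text.find("{", idx + 1)
    raise ValueError("no JSON object found in model output")


def check_required(obj: Dict[str, Any], required: Iterable[str]) -> Dict[str, Any]:
    """Minimal stand-in for schema validation: required keys must be non-empty strings."""
    missing = [k for k in required if not isinstance(obj.get(k), str) or not obj[k]]
    if missing:
        raise ValueError(f"missing or invalid fields: {missing}")
    return obj
```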

JSON Schemas:

Schemas are defined in apps/services/ai/src/utils/json_schema.py:

  • FIX_SUGGESTIONS_SCHEMA — schema for fix suggestions.
  • get_fix_suggestions_schema() — function for retrieving the schema.
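For orientation, a schema of roughly this shape would satisfy the assertions in the usage example earlier in this section. Only the three required field names come from this guide; FIX_SUGGESTIONS_SCHEMA_SKETCH, get_schema_sketch, and the additionalProperties setting are illustrative assumptions, not the contents of json_schema.py.

```python
import copy
from typing import Any, Dict

# Illustrative only: the real FIX_SUGGESTIONS_SCHEMA may define more fields.
FIX_SUGGESTIONS_SCHEMA_SKETCH: Dict[str, Any] = {
    "type": "object",
    "properties": {
        "technical_description": {"type": "string"},
        "code_before": {"type": "string"},
        "code_after": {"type": "string"},
    },
    "required": ["technical_description", "code_before", "code_after"],
    "additionalProperties": False,
}


def get_schema_sketch() -> Dict[str, Any]:
    # Return a deep copy so callers cannot mutate the shared module-level schema.
    return copy.deepcopy(FIX_SUGGESTIONS_SCHEMA_SKETCH)
```

Returning a copy from the getter is one reason to expose a function alongside the constant: a caller that tweaks the schema for one request cannot affect other requests.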

Benefits:

  • Valid JSON by construction, instead of best-effort parsing of free-form text.
  • Guaranteed response structure.
  • Eliminates the need for regex parsing.
  • Automatic schema validation.

3. Embeddings

| Provider | Model | Application |
| --- | --- | --- |
| SentenceTransformers (local) | all-MiniLM-L12-v2 or custom | Fast local embeddings |
| OpenAI | text-embedding-3-large | High-accuracy clustering; used for premium plans |

  • Vector storage: Milvus (default) or Pinecone.
  • Vectors are upserted in batches; metadata includes finding_id, rule_id, and severity.
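The batching and similarity ideas can be sketched without committing to a specific vector-store client. Here embed and upsert are hypothetical callables standing in for the embedding model (for example SentenceTransformer.encode) and the Milvus or Pinecone client, and the record layout (id, vector, metadata) is an assumption.

```python
import math
from typing import Any, Callable, Dict, Iterable, Iterator, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def batched(items: Iterable[Any], size: int) -> Iterator[List[Any]]:
    """Yield lists of at most `size` items (itertools.batched needs Python 3.12+)."""
    batch: List[Any] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def upsert_findings(
    findings: Iterable[Dict[str, Any]],
    embed: Callable[[List[str]], List[List[float]]],
    upsert: Callable[[List[Dict[str, Any]]], None],
    batch_size: int = 256,
) -> int:
    """Embed findings batch by batch and upsert them with their metadata."""
    count = 0
    for batch in batched(findings, batch_size):
        vectors = embed([f["text"] for f in batch])
        records = [
            {
                "id": f["finding_id"],
                "vector": vec,
                "metadata": {
                    "finding_id": f["finding_id"],
                    "rule_id": f["rule_id"],
                    "severity": f["severity"],
                },
            }
            for f, vec in zip(batch, vectors)
        ]
        upsert(records)
        count += len(records)
    return count
```

Embedding a whole batch in one call is usually much faster than one call per finding, and the metadata lets similarity queries be filtered by rule or severity.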

4. Speech Services

Text-to-Speech (TTS)

Important: TTS is used only to demonstrate to developers what a screen reader user would hear. It is not used for accessibility analysis. Announcement clarity analysis is performed via AI analysis of the Accessibility Tree.

Providers:

  • OpenAI Audio
  • ElevenLabs
  • AWS Polly (fallback)
  • Coqui XTTS v2 (planned; a free alternative)

Configuration:

  • The provider is selected via the TTS_PROVIDER environment variable.
  • Audio is saved in S3 with the audio/ prefix.
  • Used to generate demonstration audio files for findings.
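A sketch of provider selection and object naming, assuming hypothetical TTS_PROVIDER values ("openai", "elevenlabs", "polly") and a hypothetical key layout. Only the TTS_PROVIDER variable, the audio/ prefix, and Polly's role as fallback come from this guide.

```python
import os
from typing import Mapping, Optional

# Hypothetical identifiers; the values TTS_PROVIDER actually accepts may differ.
TTS_PROVIDERS = ("openai", "elevenlabs", "polly")
FALLBACK_PROVIDER = "polly"  # AWS Polly is listed as the fallback above


def select_tts_provider(env: Optional[Mapping[str, str]] = None) -> str:
    """Read TTS_PROVIDER, using the fallback when it is unset or unrecognized."""
    env = os.environ if env is None else env
    value = env.get("TTS_PROVIDER", "").strip().lower()
    return value if value in TTS_PROVIDERS else FALLBACK_PROVIDER


def audio_object_key(finding_id: str, provider: str, ext: str = "mp3") -> str:
    """S3 object key under the audio/ prefix (layout after the prefix is assumed)."""
    return f"audio/{finding_id}/{provider}.{ext}"
```

With boto3, the generated audio would then be stored with `s3.put_object(Bucket=bucket, Key=key, Body=audio_bytes, ContentType="audio/mpeg")`.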

Speech-to-Text (STT)

  • Whisper API (OpenAI)
  • Vosk (self-hosted fallback)

Used for analyzing user audio notes and feedback.


5. Prompt Governance

  • Each prompt carries a prompt_id, a version, and an owner.
  • Prompt changes undergo code review and snapshot tests.
  • A safety filter (toxicity, PII leaks) is enabled in production.
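A minimal sketch of prompt metadata plus a snapshot check, assuming snapshots are stored as rendered text keyed by "<prompt_id>@<version>". PromptTemplate, check_snapshot, and the example owner are hypothetical; the platform's snapshot tests may work differently.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class PromptTemplate:
    """A prompt plus the governance metadata named above."""

    prompt_id: str
    version: str
    owner: str
    template: str

    def render(self, **variables: str) -> str:
        return self.template.format(**variables)


def check_snapshot(prompt: PromptTemplate, snapshots: Dict[str, str], **variables: str) -> None:
    """Fail if the rendered prompt differs from the approved snapshot.

    Snapshots would normally live in files under version control, so a prompt
    edit shows up as a reviewable diff rather than a silent behavior change.
    """
    key = f"{prompt.prompt_id}@{prompt.version}"
    rendered = prompt.render(**variables)
    if snapshots.get(key) != rendered:
        raise AssertionError(f"prompt {key} changed; update its snapshot in review")


EXPLANATION = PromptTemplate(
    prompt_id="explanation",
    version="3",
    owner="a11y-platform-team",  # hypothetical owner
    template="Explain accessibility finding {rule_id} to a developer.",
)
```

Storing rendered text rather than a hash keeps the snapshot readable in code review.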

6. Limits and Budgeting

  • Rate limiting via Redis (token bucket).
  • Cost monitoring: ai_usage metrics (Prometheus + Grafana dashboard).
  • Contingency plan for provider unavailability: switch to the fallback provider; if no provider is available, degrade features (no audio, no technical problem descriptions).
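The token-bucket arithmetic is shown below as a single-process sketch. In production the state lives in Redis so all workers share one bucket; a common way to keep the refill-and-spend step atomic there is a Lua script run with EVAL. The TokenBucket class and its parameters are illustrative, not the platform's implementation.

```python
import time
from typing import Callable


class TokenBucket:
    """In-memory token bucket illustrating the rate-limiting algorithm."""

    def __init__(self, capacity: float, refill_per_sec: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.clock = clock  # injectable so tests can control time
        self.tokens = capacity
        self.updated = clock()

    def allow(self, cost: float = 1.0) -> bool:
        """Refill by elapsed time (capped at capacity), then spend `cost` if available."""
        now = self.clock()
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

Capacity sets the allowed burst size and refill_per_sec sets the sustained rate; passing a larger cost lets expensive LLM calls consume more of the budget.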

7. Security

  • API keys are never placed in application containers; a sidecar proxies requests to external providers.
  • LLM request logs are anonymized before they are saved.
  • Prompts are not stored when external providers are used (controlled by data privacy toggles).
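A sketch of regex-based anonymization applied before a log line is saved. The REDACTION_RULES below are hypothetical examples, not the platform's actual policy, and pattern-based redaction is best-effort: it will not catch every form of PII.

```python
import re
from typing import List, Tuple

# Hypothetical redaction rules; extend them to match the platform's PII policy.
REDACTION_RULES: List[Tuple[re.Pattern, str]] = [
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"), "[EMAIL]"),
    (re.compile(r"\b(?:sk|pk)-[A-Za-z0-9_-]{16,}\b"), "[API_KEY]"),
    (re.compile(r"(?i)\b(bearer)\s+[A-Za-z0-9._~+/=-]+"), r"\1 [TOKEN]"),
    (re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"), "[IP]"),
]


def anonymize(text: str) -> str:
    """Apply every redaction rule, in order, before an LLM request is logged."""
    for pattern, replacement in REDACTION_RULES:
        text = pattern.sub(replacement, text)
    return text
```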

8. Development and Testing

  • Mocks in unit tests: libs/ai/testing/mock_providers.py.
  • Integration tests use sandbox API keys.
  • For local development, make run-ollama (dockerized) is available.
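The contents of libs/ai/testing/mock_providers.py are not described here, so the following is a hypothetical sketch of a deterministic mock: it returns canned responses, can simulate outages, and records prompts so tests can assert on what was sent.

```python
import asyncio
from typing import List, Optional


class MockProvider:
    """Deterministic stand-in for an LLM provider in unit tests."""

    def __init__(self, responses: List[str], fail_times: int = 0) -> None:
        self.responses = list(responses)  # returned in order, one per call
        self.fail_times = fail_times      # number of initial calls that raise
        self.prompts: List[str] = []      # every prompt received, for assertions

    async def generate(self, prompt: str, system_prompt: Optional[str] = None) -> str:
        self.prompts.append(prompt)
        if self.fail_times > 0:
            self.fail_times -= 1
            raise ConnectionError("simulated provider outage")
        return self.responses.pop(0)


mock = MockProvider(["ok"])
print(asyncio.run(mock.generate("explain image-alt")))  # ok
```

Simulated failures make it possible to unit-test fallback and retry paths without a running Ollama instance.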

8.1 Initializing Ollama after Docker Reset

Important: After resetting Docker to factory settings, the Ollama models must be downloaded again, either by the ollama-init container or manually:

# Check current models in configuration
docker exec accellens-services python -c "from config import settings; print(f'Qwen model: {settings.qwen_model}'); print(f'Phi model: {settings.phi_model}'); print(f'Llama model: {settings.llama_model}')"

# Load models (automatically via ollama-init container)
# Or manually:
docker exec accellens-ollama ollama pull qwen2.5-coder:3b-instruct
docker exec accellens-ollama ollama pull phi3:mini
docker exec accellens-ollama ollama pull llama3.2:3b

# Verify models are loaded
docker exec accellens-ollama ollama list

Automation: To load the models automatically when the container starts, configure model pulling in docker-compose.dev.yml or add an init script. See Ollama Model Initialization for details.

Diagnostics: If AI recommendations are not being generated, check the Celery worker logs. A 404 Not Found error from /api/generate means the requested model has not been pulled into Ollama.
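Since a 404 usually means a missing model, an init script can compare the required tags with Ollama's /api/tags listing and pull only what is absent. This is a hypothetical sketch: the helper names and the localhost:11434 URL are assumptions, while the model tags and the container name accellens-ollama come from the commands above.

```python
import json
import subprocess
import urllib.request
from typing import Any, Dict, Iterable, List

# Model tags from the pull commands above.
REQUIRED_MODELS = ["qwen2.5-coder:3b-instruct", "phi3:mini", "llama3.2:3b"]


def missing_models(required: Iterable[str], tags_response: Dict[str, Any]) -> List[str]:
    """Return required tags absent from an Ollama /api/tags response."""
    installed = {m.get("name") for m in tags_response.get("models", [])}
    return [tag for tag in required if tag not in installed]


def fetch_tags(base_url: str = "http://localhost:11434") -> Dict[str, Any]:
    """GET /api/tags, which lists the models available locally in Ollama."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=10) as resp:
        return json.load(resp)


def pull(model: str, container: str = "accellens-ollama") -> None:
    """Pull one model inside the Ollama container (same command as above)."""
    subprocess.run(["docker", "exec", container, "ollama", "pull", model], check=True)


def ensure_models(base_url: str = "http://localhost:11434") -> List[str]:
    """Pull every missing model and return the tags that were pulled."""
    todo = missing_models(REQUIRED_MODELS, fetch_tags(base_url))
    for tag in todo:
        pull(tag)
    return todo
```

Calling ensure_models() from a startup script makes model loading idempotent: models already present are skipped.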


9. Troubleshooting

  • LLM errors → Celery retries with exponential backoff (at most 3 retries).
  • TTS timeouts → fallback to another voice.
  • Technical problem descriptions with low confidence (<0.6) → marked as draft.
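Celery provides this retry policy through task options such as autoretry_for, retry_backoff, and max_retries, and by default it also adds random jitter to the delays. The standard-library sketch below only illustrates the policy itself (delays of 1 s, 2 s, 4 s, then give up). call_with_retries is a hypothetical helper, and which exceptions count as retryable is an assumption.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retries(
    fn: Callable[[], T],
    max_retries: int = 3,
    base_delay: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying up to max_retries times with delays base, 2*base, 4*base, ...

    After the final retry fails, the last exception propagates to the caller.
    """
    attempt = 0
    while True:
        try:
            return fn()
        except retry_on:
            if attempt >= max_retries:
                raise
            sleep(base_delay * 2 ** attempt)
            attempt += 1
```

Injecting `sleep` keeps the helper testable without real waiting.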

10. AI Roadmap

  • Implement fine-tuning on user feedback (v2).
  • Add explainability metrics (faithfulness, helpfulness).
  • Research multi-language models for internationalization.