Version: 1.3 Update Date: December 26, 2025
The platform uses multiple AI providers for:
- LLM explanations and technical problem descriptions with code fix examples.
- Embeddings and finding similarity.
- Speech-to-Text and Text-to-Speech.
All providers implement a common interface in apps/services/ai/src/providers (Strategy pattern). Configuration is set via environment variables and Secrets Manager/Vault.
apps/services/ai/src/
├── providers/
│ ├── base.py # AIProvider interface (Strategy pattern)
│ ├── openai_provider.py # OpenAI GPT-4o/GPT-4o mini provider
│ ├── anthropic_provider.py # Anthropic Claude 3.5 provider
│ ├── ollama_provider.py # Ollama local LLM provider
│ └── __init__.py
├── services/
│ ├── explanation_service.py # AI explanation generation for findings
│ ├── effort_estimation_service.py # Repair complexity estimation
│ ├── impacted_users_service.py # Affected user category analysis
│ ├── fix_suggestions_service.py # Fix suggestion generation (Strategy pattern)
│ └── __init__.py
├── generators/ # Platform-specific fix suggestion generators
│ ├── base.py # BaseFixGenerator abstract class
│ ├── html_aria.py # HTML/ARIA fix generator
│ ├── react_native.py # React Native fix generator
│ ├── flutter.py # Flutter fix generator
│ ├── angular.py # Angular fix generator
│ ├── vue.py # Vue.js fix generator
│ ├── svelte.py # Svelte fix generator
│ ├── generic_framework.py # Generic framework fallback
│ └── __init__.py
├── router.py # AIProviderRouter with fallback mechanism
├── langchain_router.py # LangChain router for task-based routing
└── main.py # FastAPI application for AI service
libs/ai/src/ # TypeScript library (future)
├── providers/ # TypeScript providers (TODO)
├── prompts/ # YAML prompt templates (TODO)
└── embeddings/ # Embeddings utilities (TODO)
Note: The TypeScript library libs/ai is under development. The current implementation is located in apps/services/ai/src/.
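As a rough illustration of the Strategy pattern mentioned above, the sketch below shows what a common provider interface could look like. The method signature and the `EchoProvider` stand-in are assumptions made for this example; the actual interface is defined in `providers/base.py` and may differ.

```python
import asyncio
from abc import ABC, abstractmethod
from typing import Optional


class AIProvider(ABC):
    """Hypothetical shape of the common provider interface."""

    name: str = "base"

    @abstractmethod
    async def generate(self, prompt: str, system_prompt: Optional[str] = None) -> str:
        """Return the model's text completion for the prompt."""


class EchoProvider(AIProvider):
    """Stand-in provider used only for illustration; it calls no model."""

    name = "echo"

    async def generate(self, prompt: str, system_prompt: Optional[str] = None) -> str:
        prefix = f"[{system_prompt}] " if system_prompt else ""
        return f"{prefix}{prompt}"


async def explain(provider: AIProvider, finding: str) -> str:
    # Callers depend only on the interface, so providers are interchangeable.
    return await provider.generate(f"Explain: {finding}")


demo = asyncio.run(explain(EchoProvider(), "missing alt text"))
```

Because services accept any `AIProvider`, swapping one model for another should not require changes in the calling code.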
| Provider | Role | Benefits | Limitations |
|---|---|---|---|
| Qwen2.5-Coder-3B-Instruct | Primary provider | Optimized for code, free, local | Requires GPU, smaller context window |
| Phi-4-mini | First fallback provider | Fast, free, local | Requires GPU, smaller context window |
| Llama-3.2-3B-Instruct | Second fallback provider | Fast, free, local | Requires GPU, smaller context window |
Important: All three providers run via the Ollama API and are local (they do not require API keys). After resetting Docker to factory settings or upon first run, you must download all three models into Ollama. See the Ollama Model Initialization section in the setup documentation.
Fallback Order: Qwen (primary) → Phi (first fallback) → Llama (second fallback)
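The fallback order can be pictured as a loop over providers that stops at the first success. This is a minimal sketch, not the actual `AIProviderRouter`: the function name `generate_with_fallback`, the `AllProvidersFailed` exception, and the three stub coroutines are hypothetical and stand in for real Ollama calls.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List, Tuple

GenerateFn = Callable[[str], Awaitable[str]]


class AllProvidersFailed(RuntimeError):
    """Raised when every provider in the chain has failed."""


async def generate_with_fallback(
    prompt: str,
    providers: Dict[str, GenerateFn],
    order: List[str],
) -> Tuple[str, str]:
    """Try providers in order; return (provider_name, response)."""
    errors = []
    for name in order:
        try:
            return name, await providers[name](prompt)
        except Exception as exc:  # a real router would catch narrower errors
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))


async def _qwen(prompt: str) -> str:
    raise ConnectionError("model not loaded")  # simulated outage


async def _phi(prompt: str) -> str:
    return f"phi answer to: {prompt}"


async def _llama(prompt: str) -> str:
    return f"llama answer to: {prompt}"


used, answer = asyncio.run(
    generate_with_fallback(
        "Explain WCAG 1.1.1",
        {"qwen": _qwen, "phi": _phi, "llama": _llama},
        order=["qwen", "phi", "llama"],
    )
)
```

In this run the simulated Qwen outage causes the request to be served by Phi; Llama is never called.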
- `AI_PRIMARY_PROVIDER=qwen` (Qwen as primary provider).
- `AI_FALLBACK_PROVIDER=phi` (Phi as first fallback provider).
- Response caching in Redis (fingerprint prompt + parameters).
- Prompt templates in `libs/ai/src/prompts/*.yaml` (explanation, effort_estimation, impacted_users).
- LangChain router (`apps/services/ai/src/langchain_router.py`) determines the provider by task type (explanation, fix, summary, aas).
  - Uses `langchain-community` for Ollama-based providers (Qwen, Phi, Llama).
  - Initializes Ollama LLM for each provider (qwen, phi, llama).
  - Has a fallback to `AIProviderRouter` if LangChain is unavailable.
- All LLM calls are wrapped in an OpenTelemetry span `llm.request` and logged with redaction.
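One way to fingerprint "prompt + parameters" for the Redis cache is to hash a canonical JSON encoding of the request. The sketch below assumes SHA-256 and an `ai:cache:` key prefix; both are illustrative choices, and the function name `cache_key` is hypothetical.

```python
import hashlib
import json
from typing import Any, Dict


def cache_key(provider: str, prompt: str, params: Dict[str, Any]) -> str:
    """Build a deterministic cache key from the prompt and its parameters."""
    payload = json.dumps(
        {"provider": provider, "prompt": prompt, "params": params},
        sort_keys=True,          # key order must not change the fingerprint
        separators=(",", ":"),
        ensure_ascii=False,
    )
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return f"ai:cache:{digest}"  # hypothetical key prefix


key_a = cache_key("qwen", "Explain the finding", {"temperature": 0.2, "max_tokens": 512})
key_b = cache_key("qwen", "Explain the finding", {"max_tokens": 512, "temperature": 0.2})
key_c = cache_key("qwen", "Explain the finding", {"temperature": 0.7, "max_tokens": 512})
```

Sorting the keys means `key_a` and `key_b` collide on purpose, while a changed temperature (`key_c`) yields a different entry.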
Starting from version 2.9.8, the platform supports structured output to guarantee valid JSON with a defined structure. This solves the problem of parsing JSON responses from AI models.
Support by Provider:
- Qwen: Uses the `format` parameter with a JSON schema via the Ollama API.
- Phi: Uses the `format` parameter with a JSON schema via the Ollama API.
- Llama: Uses the `format` parameter with a JSON schema via the Ollama API (supported since Ollama 0.1.0+).
Usage:
from ai.src.router import AIProviderRouter
from ai.src.utils.json_schema import get_fix_suggestions_schema
router = AIProviderRouter()
schema = get_fix_suggestions_schema()
# Structured output generation (must be called from inside an async function)
result = await router.generate_structured(
    prompt="Generate fix suggestion...",
    json_schema=schema,
    system_prompt="You are an accessibility expert...",
    service_name="fix_suggestions",
)
# result is guaranteed to be a valid dict matching the schema
assert "technical_description" in result
assert "code_before" in result
assert "code_after" in result
Fallback Mechanism:
If a provider does not support native structured output, the router automatically uses a fallback:
- Calls the regular `generate()` method.
- Parses the JSON response using `parse_structured_response()`.
- Validates the result against the schema.
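The parse-then-validate fallback can be sketched as follows. This is not the project's `parse_structured_response()`; the helpers `extract_json` and `validate` and the simplified `SCHEMA` are hypothetical. The validation here only checks required keys and string types, not full JSON Schema.

```python
import json
from typing import Any, Dict

# Hypothetical, simplified schema; the real one lives in json_schema.py.
SCHEMA = {
    "type": "object",
    "required": ["technical_description", "code_before", "code_after"],
    "properties": {
        "technical_description": {"type": "string"},
        "code_before": {"type": "string"},
        "code_after": {"type": "string"},
    },
}


def extract_json(text: str) -> Dict[str, Any]:
    """Parse the first JSON object that starts at the first '{' in the text."""
    start = text.find("{")
    if start == -1:
        raise ValueError("no JSON object in response")
    # raw_decode stops at the end of the object and ignores trailing text.
    obj, _ = json.JSONDecoder().raw_decode(text[start:])
    if not isinstance(obj, dict):
        raise ValueError("expected a JSON object")
    return obj


def validate(obj: Dict[str, Any], schema: Dict[str, Any]) -> None:
    """Check required keys and string types only (not full JSON Schema)."""
    for key in schema["required"]:
        if key not in obj:
            raise ValueError(f"missing required key: {key}")
    for key, spec in schema["properties"].items():
        if key in obj and spec["type"] == "string" and not isinstance(obj[key], str):
            raise ValueError(f"{key} must be a string")


# A model reply with chatter around the JSON payload.
raw = (
    "Here is the fix: "
    + json.dumps(
        {
            "technical_description": "Image lacks alt text",
            "code_before": '<img src="a.png">',
            "code_after": '<img src="a.png" alt="Logo">',
        }
    )
    + " Let me know if you need more."
)
parsed = extract_json(raw)
validate(parsed, SCHEMA)
```

A response that fails either step raises `ValueError`, which a caller could treat as a signal to retry or move to the next provider.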
JSON Schemas:
Schemas are defined in apps/services/ai/src/utils/json_schema.py:
- `FIX_SUGGESTIONS_SCHEMA`: schema for fix suggestions.
- `get_fix_suggestions_schema()`: function for retrieving the schema.
Benefits:
- 100% success rate for valid JSON responses.
- Guaranteed response structure.
- Eliminates the need for regex parsing.
- Automatic schema validation.
| Provider | Model | Application |
|---|---|---|
| SentenceTransformers (local) | `all-MiniLM-L12-v2` or custom | Fast local embeddings |
| OpenAI | `text-embedding-3-large` | High-accuracy clustering; used for premium plans |
- Vector Storage: Milvus (default) or Pinecone.
- Batch upserts; metadata includes `finding_id`, `rule_id`, `severity`.
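To make "finding similarity" concrete, the sketch below ranks stored findings by cosine similarity to a query vector. It uses plain Python lists and toy three-dimensional vectors as assumptions; in the platform the vectors would come from the embedding model and the search would run inside the vector store. The helper names are hypothetical.

```python
import math
from typing import Any, Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(
    query: Sequence[float], records: List[Dict[str, Any]], top_k: int = 2
) -> List[Tuple[str, float]]:
    """Return the top_k (finding_id, score) pairs, best match first."""
    scored = [(r["finding_id"], cosine_similarity(query, r["vector"])) for r in records]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]


# Toy records carrying the metadata fields named above.
records = [
    {"finding_id": "f-1", "rule_id": "image-alt", "severity": "serious", "vector": [1.0, 0.0, 0.0]},
    {"finding_id": "f-2", "rule_id": "image-alt", "severity": "serious", "vector": [0.9, 0.1, 0.0]},
    {"finding_id": "f-3", "rule_id": "color-contrast", "severity": "moderate", "vector": [0.0, 0.0, 1.0]},
]
ranked = most_similar([1.0, 0.0, 0.0], records)
```

Here the two `image-alt` findings rank highest, while the unrelated `color-contrast` finding is excluded by `top_k`.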
Important: TTS is used only to demonstrate to developers what a screen reader user would hear. It is not used for accessibility analysis. Announcement clarity analysis is performed via AI analysis of the Accessibility Tree.
Providers:
- OpenAI Audio
- ElevenLabs
- AWS Polly (fallback)
- Coqui XTTS v2 (planned; a free option)
Configuration:
- Via `TTS_PROVIDER` environment variable.
- Audio is saved in S3 with the `audio/` prefix.
- Used to generate demonstration audio files for findings.
- Whisper API (OpenAI)
- Vosk (self-hosted fallback)
Used for analyzing user audio notes and feedback.
- Each prompt version has a `prompt_id`, `version`, and `owner`.
- Prompt changes undergo code review and snapshot tests.
- A safety filter (toxicity, PII leaks) is enabled in production.
- Rate limiting via Redis (token bucket).
- Cost monitoring: `ai_usage` metrics (Prometheus + Grafana dashboard).
- Rollback plan upon unavailability: switch to fallback provider, feature degradation (no audio/technical problem descriptions).
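The token bucket used for rate limiting can be illustrated with an in-memory version. This is only a sketch of the algorithm: the class and field names are hypothetical, and a Redis-backed implementation would need to perform the refill and spend atomically (for example in a server-side script), which is not shown here.

```python
from dataclasses import dataclass


@dataclass
class TokenBucket:
    capacity: float      # maximum burst size
    refill_rate: float   # tokens added per second
    tokens: float        # tokens currently available
    updated_at: float    # timestamp of the last refill, in seconds

    def allow(self, now: float, cost: float = 1.0) -> bool:
        """Refill based on elapsed time, then try to spend `cost` tokens."""
        elapsed = max(0.0, now - self.updated_at)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.updated_at = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


bucket = TokenBucket(capacity=2, refill_rate=1.0, tokens=2, updated_at=0.0)
results = [
    bucket.allow(now=0.0),  # first request in the burst
    bucket.allow(now=0.0),  # second request uses the last token
    bucket.allow(now=0.0),  # bucket is empty, request is rejected
    bucket.allow(now=1.5),  # 1.5 tokens refilled, request passes
]
```

Passing `now` explicitly keeps the example deterministic; production code would read a clock instead.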
- Keys do not enter containers directly; a sidecar proxies requests.
- LLM request logs are anonymized before saving.
- Prompt saving is disabled with external providers (data privacy toggles).
- Mocks in unit tests: `libs/ai/testing/mock_providers.py`.
- Integration tests use sandbox API keys.
- For local development, `make run-ollama` (dockerized) is available.
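Important: After resetting Docker to factory settings, the Ollama models must be loaded again. The ollama-init container does this automatically; the commands below show how to check the configuration and load the models manually: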
# Check current models in configuration
docker exec accellens-services python -c "from config import settings; print(f'Qwen model: {settings.qwen_model}'); print(f'Phi model: {settings.phi_model}'); print(f'Llama model: {settings.llama_model}')"
# Load models (automatically via ollama-init container)
# Or manually:
docker exec accellens-ollama ollama pull qwen2.5-coder:3b-instruct
docker exec accellens-ollama ollama pull phi3:mini
docker exec accellens-ollama ollama pull llama3.2:3b
# Verify models are loaded
docker exec accellens-ollama ollama list
Automation: For automatic model loading on container start, you can configure it in docker-compose.dev.yml or add an init script. See Ollama Model Initialization for details.
Diagnostics: If AI recommendations are not being generated, check the Celery worker logs. A 404 Not Found error for /api/generate indicates the model is missing in Ollama.
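A missing model can also be detected before any request fails. Ollama's `GET /api/tags` endpoint returns the locally available models as a JSON object with a `models` list; the sketch below checks such a response against the three model tags used in the pull commands. The helper name `missing_models` and the sample response are illustrative assumptions, and the HTTP call itself is left out.

```python
from typing import Any, Dict, Iterable, List

# Tags taken from the pull commands in this section.
REQUIRED_MODELS = ["qwen2.5-coder:3b-instruct", "phi3:mini", "llama3.2:3b"]


def missing_models(tags_response: Dict[str, Any], required: Iterable[str]) -> List[str]:
    """Return the required model tags absent from an /api/tags response."""
    installed = {m.get("name", "") for m in tags_response.get("models", [])}
    return [name for name in required if name not in installed]


# Sample response shape (trimmed to the field used here).
sample = {"models": [{"name": "qwen2.5-coder:3b-instruct"}, {"name": "llama3.2:3b"}]}
missing = missing_models(sample, REQUIRED_MODELS)
```

In this sample only `phi3:mini` is reported as missing, so a request routed to Phi would be the one to fail.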
- LLM errors → Celery retries (exponential, max 3).
- TTS timeouts → fallback to another voice.
- Technical problem descriptions with low confidence (<0.6) → marked as draft.
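The retry and draft rules above can be sketched in plain Python. This does not use Celery's own retry machinery; the 2-second base delay, the `published` status name, and the helper names are assumptions for illustration. Only the limit of 3 retries and the 0.6 threshold come from the rules above.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")
MAX_RETRIES = 3        # from the policy above
DRAFT_THRESHOLD = 0.6  # from the policy above


def backoff_delays(base_seconds: float = 2.0, retries: int = MAX_RETRIES) -> List[float]:
    """Exponential delays: base, 2*base, 4*base, ..."""
    return [base_seconds * (2 ** attempt) for attempt in range(retries)]


def call_with_retries(
    fn: Callable[[], T], sleep: Callable[[float], None], base_seconds: float = 2.0
) -> T:
    """Call fn; on failure retry up to MAX_RETRIES times with growing delays."""
    delays = backoff_delays(base_seconds)
    for attempt in range(MAX_RETRIES + 1):
        try:
            return fn()
        except Exception:
            if attempt == MAX_RETRIES:
                raise  # retries exhausted, surface the last error
            sleep(delays[attempt])
    raise RuntimeError("unreachable")


def status_for(confidence: float) -> str:
    """Mark low-confidence descriptions as drafts."""
    return "draft" if confidence < DRAFT_THRESHOLD else "published"


calls = {"n": 0}


def flaky() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("LLM timeout")  # simulated transient failure
    return "ok"


slept: List[float] = []
outcome = call_with_retries(flaky, sleep=slept.append)  # record delays instead of sleeping
```

Injecting `sleep` keeps the example fast; the recorded delays show the doubling pattern after each failed attempt.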
- Implement fine-tuning on user feedback (v2).
- Add explainability metrics (faithfulness, helpfulness).
- Research multi-language models for internationalization.