Open Source LLMA Framework for Contextual Product Matching in LLM Applications
A new approach to unobtrusive advertising in LLMs: VKRA Protocol provides a modular, open-source framework for contextual product matching that enables developers to build monetization into their LLM applications while maintaining user trust and conversation quality.
VKRA Protocol is a modular LLMA (Large Language Model Advertising) framework that implements the "Polite Ad Network" approach. It provides pluggable modules for:
- Presentation: Creating product cards without modifying LLM responses
- Commission: Extracting affiliate commission rates from product data
- Prediction: Calculating conversion probability using personalization
- Ranking: Balancing revenue (commission × conversion) with user satisfaction
The framework is database-agnostic and infrastructure-independent, making it easy to integrate into any application.
```
┌─────────────────────────────────────────────────────────┐
│                    LLMAOrchestrator                     │
│       Coordinates the complete matching pipeline        │
└─────────────────────────────────────────────────────────┘
                            │
            ┌───────────────┼───────────────┐
            │               │               │
            ▼               ▼               ▼
   ┌──────────────┐ ┌──────────────┐ ┌──────────────┐
   │ Presentation │ │  Commission  │ │  Prediction  │
   │    Module    │ │    Module    │ │    Module    │
   └──────────────┘ └──────────────┘ └──────────────┘
            │               │               │
            └───────────────┼───────────────┘
                            │
                            ▼
                    ┌──────────────┐
                    │   Ranking    │
                    │    Module    │
                    └──────────────┘
```
- LLMAOrchestrator: Coordinates all modules in the pipeline
- VectorDatabase Interface: Abstract interface for vector search
- UserProfileStore Interface: Abstract interface for user profile management
- Module Base Classes: Pluggable interfaces for all modules
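How these pieces could fit together can be sketched in a few lines. The `run_pipeline` helper and the lambda modules below are hypothetical illustrations of the data flow in the diagram, not the orchestrator's real API:

```python
from typing import Any, Callable

def run_pipeline(products, presentation, commission, prediction, ranking):
    """Hypothetical sketch of the four-stage pipeline from the diagram."""
    cards = [presentation(p) for p in products]   # Presentation module
    rates = [commission(p) for p in products]     # Commission module
    convs = [prediction(p) for p in products]     # Prediction module
    return ranking(cards, rates, convs)           # Ranking module

# Toy module implementations, for illustration only
products = [
    {"title": "A", "commission_rate": 0.10, "conv": 0.2},
    {"title": "B", "commission_rate": 0.05, "conv": 0.8},
]
ranked = run_pipeline(
    products,
    presentation=lambda p: {"title": p["title"]},
    commission=lambda p: p["commission_rate"],
    prediction=lambda p: p["conv"],
    ranking=lambda cards, rates, convs: sorted(
        zip(cards, rates, convs), key=lambda t: t[1] * t[2], reverse=True
    ),
)
# B ranks first: 0.05 * 0.8 = 0.04 beats 0.10 * 0.2 = 0.02
```

Because each stage is just a pluggable callable here, swapping in a custom module is a one-line change, which is the point of the module base classes.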
- 🎯 Contextual Matching: Intent-based product recommendations using embeddings
- 🔄 Modular Design: Pluggable modules for easy customization
- 🛡️ Privacy-First: GDPR-compliant with pseudonymization and differential privacy
- ⚡ High Performance: Optimized for sub-200ms latency
- 🔌 Database Agnostic: Works with any vector database implementation
- 📊 Polite Ad Network: Balances revenue with user trust (SR_query metric)
```shell
pip install vkra-protocol
```

Or install from source:

```shell
git clone https://github.com/vkra/vkra-protocol.git
cd vkra-protocol
pip install -e .
```

First, implement the abstract interfaces for your database:
```python
from vkra_protocol import VectorDatabase, UserProfileStore
from typing import Any
from uuid import UUID

class MyVectorDatabase(VectorDatabase):
    async def search(
        self,
        query_embedding: list[float],
        limit: int = 10,
        filters: dict[str, Any] | None = None,
        user_profile_embedding: list[float] | None = None,
        query_weight: float = 0.7,
        profile_weight: float = 0.3,
    ) -> tuple[list[dict[str, Any]], int]:
        # Your vector search implementation
        # Returns (products list, total_matches count)
        pass

class MyUserProfileStore(UserProfileStore):
    async def get_user_profile(
        self,
        user_id: UUID,
        privacy_level: str = "standard",
    ) -> dict[str, Any] | None:
        # Your user profile retrieval
        # Returns profile dict or None
        pass

    async def update_profile_from_context(
        self,
        user_id: UUID,
        active_context: str,
        background: bool = True,
    ) -> None:
        # Your profile update logic
        pass
```

Default: Local Ollama (Zero-Friction Setup)
```shell
# Install Ollama (if not already installed)
# Visit https://ollama.ai for installation instructions

# Pull the Qwen3 embedding model
ollama pull qwen3-embedding
```

```python
from vkra_protocol import LLMAOrchestrator, EmbeddingService, LLMService

# Initialize services (defaults to local Ollama)
embedding_service = EmbeddingService()  # Uses local Ollama by default
llm_service = LLMService()  # Uses local Ollama by default

# Or use OpenAI (optional) by setting environment variables:
# EMBED_PROVIDER=openai
# LLM_PROVIDER=openai
# OPENAI_API_KEY=your-key
```

```python
from vkra_protocol import LLMAOrchestrator, EmbeddingService, LLMService

# Initialize your implementations
vector_db = MyVectorDatabase()
profile_store = MyUserProfileStore()
embedding_service = EmbeddingService()  # Default: local Ollama
llm_service = LLMService()  # Default: local Ollama

# Create orchestrator
orchestrator = LLMAOrchestrator(
    vector_db=vector_db,
    profile_store=profile_store,
    embedding_service=embedding_service,
    llm_service=llm_service,
)
```

```python
from vkra_protocol.schemas import SearchRequest, UserContext
from uuid import uuid4

# Create search request
request = SearchRequest(
    query="wireless headphones for running",
    context=UserContext(
        user_id=uuid4(),  # Optional: for personalization
        session_id=uuid4(),
        privacy_level="standard",
    ),
    limit=3,
)

# Execute search
response, embedding_time, search_time, cache_hit = await orchestrator.execute_search(request)

# Access results
for result in response.results:
    print(f"{result['title']} - {result['price']}")
    print(f"Relevance: {result['relevance_score']:.2f}")
    print(f"Commission: {result.get('commission_rate', 0):.1%}")
```

Default (Local Ollama):
- `EMBED_PROVIDER=local` (default)
- `LOCAL_EMBED_URL=http://localhost:11434/api/embeddings` (default)
- `EMBED_MODEL=qwen3-embedding` (default)
- `LLM_PROVIDER=local` (default)
- `LOCAL_LLM_URL=http://localhost:11434/api/chat` (default)
- `LLM_MODEL=qwen3:8b` (default)
Optional (OpenAI):
- `EMBED_PROVIDER=openai` (to use OpenAI)
- `OPENAI_API_KEY`: required if using OpenAI
- `OPENAI_EMBED_MODEL=text-embedding-3-small` (default)
- `OPENAI_LLM_MODEL=gpt-4o-mini` (default)
Customize module behavior per request:
```python
from vkra_protocol.schemas import ModuleConfig, SearchRequest

module_config = ModuleConfig(
    presentation_module="product_card",
    commission_module="extract",
    prediction_module="conversion",
    ranking_module="affiliate",
    module_params={
        "prediction_sr_query_threshold": 0.7,  # Polite Ad filter
        "prediction_sr_query_weight": 0.3,
        "prediction_sr_history_weight": 0.5,
        "ranking_sr_query_weight": 0.6,
        "ranking_sr_history_weight": 0.4,
    },
)

request = SearchRequest(
    query="laptop for programming",
    module_config=module_config,
    limit=5,
)
```

The Presentation Module creates product cards from database results. It does NOT modify the LLM response text, preserving the agent's voice.
Interface: PresentationModule.create_product_cards()
Implementation: ProductCardPresentationModule
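A minimal sketch of what such a presentation step might do, assuming the result fields shown in the quick-start example. `create_product_card` is illustrative, not the library's implementation:

```python
from typing import Any

def create_product_card(product: dict[str, Any]) -> dict[str, Any]:
    """Sketch: build a display card from a database row. The LLM's
    response text is never touched; the card travels alongside it."""
    return {
        "title": product["title"],
        "price": product["price"],
        "url": product.get("url"),  # hypothetical optional field
        "relevance_score": product.get("relevance_score", 0.0),
    }

card = create_product_card(
    {"title": "Trail Buds", "price": "$59", "relevance_score": 0.91}
)
# card["title"] == "Trail Buds"; card["url"] is None
```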
The Commission Module extracts commission rates from product data. Rates are stored in product metadata (no external lookup).
Interface: CommissionModule.extract_commission_rates()
Implementation: CommissionExtractionModule
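A sketch of metadata-based extraction under that assumption. `extract_commission_rates` below is illustrative; the shipped `CommissionExtractionModule` may differ:

```python
from typing import Any

def extract_commission_rates(
    products: list[dict[str, Any]], default_rate: float = 0.0
) -> list[float]:
    """Sketch: read commission_rate straight from product metadata;
    no external affiliate-network lookup is involved."""
    return [float(p.get("commission_rate", default_rate)) for p in products]

rates = extract_commission_rates([{"commission_rate": 0.08}, {}])
# rates == [0.08, 0.0]
```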
The Prediction Module calculates:
- SR_query: Query-based satisfaction rate (Polite Ad metric - measures interruption)
- SR_history: History-based satisfaction rate (personalization)
- Conversion probability: Combined prediction
Interface: PredictionModule.predict_conversion()
Implementation: ConversionPredictionModule
Polite Ad Filter: Products with SR_query < threshold (default 0.7) are filtered out to maintain conversation quality.
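The filter itself can be sketched in a few lines. This is a hypothetical helper; the threshold and field name follow the `module_params` shown above:

```python
from typing import Any

def polite_ad_filter(
    products: list[dict[str, Any]], sr_query_threshold: float = 0.7
) -> list[dict[str, Any]]:
    """Sketch: drop products whose query-based satisfaction rate falls
    below the threshold, so low-relevance ads never interrupt the chat."""
    return [p for p in products if p.get("sr_query", 0.0) >= sr_query_threshold]

kept = polite_ad_filter([{"sr_query": 0.9}, {"sr_query": 0.4}])
# only the 0.9 product survives the default 0.7 threshold
```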
The Ranking Module ranks products using:
Score = Commission × Conversion × (SR_query × w1 + SR_history × w2)
This balances revenue with user trust.
Interface: RankingModule.rank_products()
Implementation: AffiliateRankingModule
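The scoring formula, written out with the default `ranking_sr_*` weights from the configuration example above (a sketch, not the module's actual code):

```python
def affiliate_score(
    commission: float,
    conversion: float,
    sr_query: float,
    sr_history: float,
    w1: float = 0.6,  # ranking_sr_query_weight
    w2: float = 0.4,  # ranking_sr_history_weight
) -> float:
    """Score = Commission x Conversion x (SR_query * w1 + SR_history * w2)."""
    return commission * conversion * (sr_query * w1 + sr_history * w2)

# High commission alone does not win if satisfaction is low:
a = affiliate_score(0.10, 0.5, sr_query=0.4, sr_history=0.3)  # 0.10*0.5*0.36 = 0.0180
b = affiliate_score(0.05, 0.6, sr_query=0.9, sr_history=0.8)  # 0.05*0.6*0.86 = 0.0258
```

Here the lower-commission product `b` outranks `a`, which is exactly the revenue-vs-trust trade the formula is designed to make.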
The User Preference Module handles user profile management with GDPR compliance:
- Profile retrieval and summarization
- Embedding generation
- Privacy safeguards (pseudonymization, differential privacy)
Interface: UserPreferenceModule.get_user_profile(), update_profile_from_context()
Implementation: UserPreferenceMaintenanceModule
VKRA Protocol supports three privacy levels:
- minimal: Query-only, no user profile
- standard: Pseudonymized user IDs, profile embeddings
- full: Differential privacy noise added to embeddings
```python
context = UserContext(
    user_id=user_id,
    privacy_level="standard",  # or "minimal", "full"
    include_history=True,
)
```

See the examples/ directory for complete working examples:
- Basic search
- Personalized search with user profiles
- Custom module configuration
- Integration with FastAPI
We welcome contributions! Please see CONTRIBUTING.md for guidelines.
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests
- Submit a pull request
- Interface First: All modules use abstract base classes
- Database Agnostic: No hardcoded database dependencies
- Async Everything: All I/O operations are async
- Privacy by Design: GDPR compliance built-in
- Performance First: Optimized for low latency
- End-to-end latency: < 200ms
- Profile loading: < 50ms (cached)
- Vector search: < 100ms
- Module pipeline: < 50ms
This project is licensed under the MIT License - see the LICENSE file for details.
For a managed service with:
- Pre-configured databases (LanceDB Cloud, Supabase)
- API key management
- Analytics dashboard
- MCP server integration
Visit: https://www.vkra.org
- Documentation: https://docs.vkra.org
- Issues: GitHub Issues
- Email: hello@vkra.org
Made for developers building the future of AI agents 🤖
Get started today: https://www.vkra.org