A research middleware for LLM trustworthiness that intercepts harmful image generation prompts and transforms them into safe, cartoon-themed alternatives using Positive Prompt Injection.
┌─────────────────────────────────────────────────────────────────────┐
│ KazokuShield Workflow │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ User Prompt │
│ │ │
│ ▼ │
│ ┌─────────────┐ │
│ │ MODERATOR │ Detection: OpenAI GPT-4o │
│ │ (Detection) │ Classifies prompt as "Harmful" or "Safe" │
│ └──────┬──────┘ │
│ │ │
│ ┌────┴────┐ │
│ ▼ ▼ │
│ Harmful Safe │
│ │ │ │
│ ▼ ▼ │
│ ┌─────────────┐ │
│ │ SHIELD │ Injection: OpenAI GPT-4o │
│ │(Prompt Inj) │ Rewrites harmful → safe cartoon theme │
│ └──────┬──────┘ │
│ │ │
│ ▼ │
│ ┌─────────────┐ │
│ │ GENERATOR │ Image Gen: Venice.AI (qwen-image-2-pro) │
│ │(Image Gen) │ Receives injected (safe) prompt │
│ └─────────────┘ │
│ │ │
│ ▼ │
│ Safe Image │
│ │
└─────────────────────────────────────────────────────────────────────┘
- Direct to Venice AI
- User Prompt Sent to Image Generator
- Result: Harmful Content Generated
- Detection: OpenAI GPT-4O Classifies Prompt
- Injection: Harmful Prompts Rewritten to Safe Pokemon Themes
- Generation: Venice.AI Receives Safe Prompt
- Result: Safe Content Generated
# Install dependencies
pip install -r requirements.txtAdd your API keys:
# Detection & Injection: OpenAI
OPENAI_API_KEY=your_openai_key
OPENAI_BASE_URL=https://api.openai.com/v1
# Image Generation: Venice.AI
VENICE_API_KEY=your_venice_key
VENICE_BASE_URL=https://api.venice.ai/api/v1
VENICE_MODEL=qwen-image-2-propython3 kazokushield.pyFollow the Menu:
- [1] Option A: Unprotected - Direct to Venice.AI
- [2] Option B: KazokuShield (Protected) - With Protection
- [3] Exit
python3 -m uvicorn main:app --reload --port 8000curl -X POST http://localhost:8000/generate \
-H "Content-Type: application/json" \
-d '{"prompt": "Generate a Violent Scene"}'| Variable | Role | Description |
|---|---|---|
OPENAI_API_KEY |
Detection/Injection | Moderation & Shielding |
OPENAI_BASE_URL |
Detection/Injection | OpenAI Endpoint |
VENICE_API_KEY |
Image Gen | Image Generation |
VENICE_BASE_URL |
Image Gen | Venice.AI Endpoint |
VENICE_MODEL |
Image Gen | Venice.AI Model (Default: qwen-image-2-pro) |
MODERATOR_MODEL |
Detection | Model for Detection (Default: gpt-4o-mini) |
SHIELD_MODEL |
Injection | Model for Prompt Injection (Default: gpt-4o-mini) |
HOST, PORT |
Server | Server Configuration |
| File | Role | Description |
|---|---|---|
main.py |
API | FastAPI Entrypoint |
src/moderator.py |
Detection | OpenAI GPT-4o Classifies Prompts as Harmful/Safe |
src/shield.py |
Injection | Rewrites Harmful Prompts to Safe Cartoon Themes |
api/generator.py |
Image Gen | Venice.AI Image Generation |
kazokushield.py |
CLI | Main Entry Point with Option A/B Workflow |
requirements.txt |
Dependencies | Python Dependencies |
Designed for educational and research purposes as a proof-of-concept for LLM Trustworthiness middleware using Positive Prompt Injection. Not intended for production use. Penn State University (PSU), CE 597-004 LLM Foundations and Trustworthiness, Spring 2026.