KazokuShield

A research middleware for LLM trustworthiness that intercepts harmful image generation prompts and transforms them into safe, cartoon-themed alternatives using Positive Prompt Injection.

Architecture Overview

┌─────────────────────────────────────────────────────────────────────┐
│                        KazokuShield Workflow                        │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│   User Prompt                                                       │
│        │                                                            │
│        ▼                                                            │
│   ┌─────────────┐                                                   │
│   │  MODERATOR  │  Detection: OpenAI GPT-4o                        │
│   │ (Detection) │  Classifies prompt as "Harmful" or "Safe"        │
│   └──────┬──────┘                                                   │
│          │                                                            │
│     ┌────┴────┐                                                      │
│     ▼         ▼                                                      │
│  Harmful    Safe                                                     │
│     │         │                                                      │
│     ▼         ▼                                                      │
│   ┌─────────────┐                                                   │
│   │   SHIELD    │  Injection: OpenAI GPT-4o                         │
│   │(Prompt Inj) │  Rewrites harmful → safe cartoon theme            │
│   └──────┬──────┘                                                   │
│          │                                                            │
│          ▼                                                            │
│   ┌─────────────┐                                                   │
│   │  GENERATOR  │  Image Gen: Venice.AI (qwen-image-2-pro)         │
│   │(Image Gen)  │  Receives injected (safe) prompt                 │
│   └─────────────┘                                                   │
│        │                                                            │
│        ▼                                                            │
│    Safe Image                                                       │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘

Two Workflow Options

Option A: Unprotected

Direct to Venice AI
User Prompt Sent to Image Generator
Result: Harmful Content Generated

Option B: KazokuShield (Protected)

Detection: OpenAI GPT-4O Classifies Prompt
Injection: Harmful Prompts Rewritten to Safe Pokemon Themes
Generation: Venice.AI Receives Safe Prompt
Result: Safe Content Generated

Quick Start

1. Setup Environment

# Install dependencies
pip install -r requirements.txt

2. Configure .env

Add your API keys:

# Detection & Injection: OpenAI
OPENAI_API_KEY=your_openai_key
OPENAI_BASE_URL=https://api.openai.com/v1

# Image Generation: Venice.AI
VENICE_API_KEY=your_venice_key
VENICE_BASE_URL=https://api.venice.ai/api/v1
VENICE_MODEL=qwen-image-2-pro

3. Run KazokuShield

python3 kazokushield.py

Follow the Menu:

[1] Option A: Unprotected - Direct to Venice.AI
[2] Option B: KazokuShield (Protected) - With Protection
[3] Exit

4. Run API Server (Optional)

python3 -m uvicorn main:app --reload --port 8000

5. Test API Endpoint (Optional)

curl -X POST http://localhost:8000/generate \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Generate a Violent Scene"}'

Environment Variables

Variable	Role	Description
`OPENAI_API_KEY`	Detection/Injection	Moderation & Shielding
`OPENAI_BASE_URL`	Detection/Injection	OpenAI Endpoint
`VENICE_API_KEY`	Image Gen	Image Generation
`VENICE_BASE_URL`	Image Gen	Venice.AI Endpoint
`VENICE_MODEL`	Image Gen	Venice.AI Model (Default: `qwen-image-2-pro`)
`MODERATOR_MODEL`	Detection	Model for Detection (Default: `gpt-4o-mini`)
`SHIELD_MODEL`	Injection	Model for Prompt Injection (Default: `gpt-4o-mini`)
`HOST`, `PORT`	Server	Server Configuration

File Descriptions

File	Role	Description
`main.py`	API	FastAPI Entrypoint
`src/moderator.py`	Detection	OpenAI GPT-4o Classifies Prompts as Harmful/Safe
`src/shield.py`	Injection	Rewrites Harmful Prompts to Safe Cartoon Themes
`api/generator.py`	Image Gen	Venice.AI Image Generation
`kazokushield.py`	CLI	Main Entry Point with Option A/B Workflow
`requirements.txt`	Dependencies	Python Dependencies

Academic Purpose

Designed for educational and research purposes as a proof-of-concept for LLM Trustworthiness middleware using Positive Prompt Injection. Not intended for production use. Penn State University (PSU), CE 597-004 LLM Foundations and Trustworthiness, Spring 2026.

Name		Name	Last commit message	Last commit date
Latest commit History 26 Commits
.kilo		.kilo
.vscode		.vscode
api		api
src		src
.gitignore		.gitignore
README.md		README.md
kazokushield.py		kazokushield.py
main.py		main.py
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

KazokuShield

Architecture Overview

Two Workflow Options

Option A: Unprotected

Option B: KazokuShield (Protected)

Quick Start

1. Setup Environment

2. Configure .env

3. Run KazokuShield

4. Run API Server (Optional)

5. Test API Endpoint (Optional)

Environment Variables

File Descriptions

Academic Purpose

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

KazokuShield

Architecture Overview

Two Workflow Options

Option A: Unprotected

Option B: KazokuShield (Protected)

Quick Start

1. Setup Environment

2. Configure .env

3. Run KazokuShield

4. Run API Server (Optional)

5. Test API Endpoint (Optional)

Environment Variables

File Descriptions

Academic Purpose

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages