A GPU-accelerated, Dockerized microservice that enables text-to-image generation using state-of-the-art Stable Diffusion models. The system exposes a gRPC API with an optional REST wrapper, plus a Streamlit-based web interface.
- Local GPU inference using Hugging Face's `diffusers` pipeline
- gRPC-based communication with protobuf definitions
- Optional REST API wrapper for testing with Postman
- Images returned as base64 and saved as PNG files
- Streamlit frontend with an intuitive prompt UI
- Concurrent request handling with async processing
- Comprehensive error handling and status codes
- Dockerized for easy deployment and reproducibility
```text
.
├── server.py               # gRPC server using diffusers and Stable Diffusion
├── rest_api.py             # REST wrapper around the gRPC service (optional fallback interface)
├── FrontEnd/
│   └── streamlit_app.py    # Streamlit UI client that calls the gRPC backend
├── text2image.proto        # Protocol Buffers definition
├── text2image_pb2.py       # Generated by protoc (DO NOT EDIT)
├── text2image_pb2_grpc.py  # Generated by protoc (DO NOT EDIT)
├── Dockerfile              # Full container setup
├── requirements.txt        # Python dependencies
├── tests/                  # Test suite for performance evaluation
│   ├── test_concurrent.py      # Concurrent request tests
│   └── performance_tests.py    # Performance evaluation scripts
└── generated_images/       # Directory where generated images are saved
```
```bash
git clone https://github.com/yourusername/text2image-grpc-app.git
cd text2image-grpc-app
```

Make sure `protoc` is installed, then generate the gRPC stubs:

```bash
python -m grpc_tools.protoc -I. --python_out=. --grpc_python_out=. text2image.proto
```

Create and activate a virtual environment, then install the dependencies:

```bash
python -m venv venv
source venv/bin/activate  # or venv\Scripts\activate on Windows
pip install -r requirements.txt
```

Ensure you have:
- Docker installed
- NVIDIA Container Toolkit set up correctly
```bash
# Build the container
docker build -t text2image-grpc .

# Run with GPU support
docker run --gpus all -p 50051:50051 -p 8501:8501 text2image-grpc
```

Make sure you have a GPU with CUDA installed.
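A quick way to verify that PyTorch can actually see the GPU before starting the server (assuming PyTorch is already installed in your environment):

```python
# Sanity check: confirms PyTorch was built with CUDA support
# and can see at least one GPU.
import torch

print(torch.cuda.is_available())      # True if a CUDA GPU is usable
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # e.g. the name of your GPU
```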
- Run the gRPC server:

```bash
python server.py
```

- Run the Streamlit UI (in another terminal):

```bash
streamlit run FrontEnd/streamlit_app.py
```

- Open http://localhost:8501 in your browser
- Enter a text prompt (e.g., "a wizard casting fire in the sky")
- Optionally add context for better results (e.g., "fantasy art")
- Click Generate Image
- View the result and find the saved PNG in the `generated_images/` folder
Example Python client code:

```python
import grpc
import text2image_pb2
import text2image_pb2_grpc
import base64
from PIL import Image
import io

# Set up the gRPC channel
channel = grpc.insecure_channel('localhost:50051')
stub = text2image_pb2_grpc.Text2ImageServiceStub(channel)

# Create the request
request = text2image_pb2.ImageRequest(
    context="fantasy art",
    text="a dragon flying over a medieval castle"
)

# Make the call
response = stub.GenerateImage(request)

# Process the response
if response.status_code == 200:
    # Convert base64 to an image and display it
    image_data = base64.b64decode(response.image_base64)
    image = Image.open(io.BytesIO(image_data))
    image.show()
    print(f"Image also saved to: {response.image_path}")
else:
    print(f"Error: {response.message}")
```

If you prefer REST over gRPC:
```bash
# First run the REST wrapper
python rest_api.py

# Then make REST requests
curl -X POST http://localhost:8000/generate \
  -H "Content-Type: application/json" \
  -d '{"context": "cyberpunk", "text": "a futuristic city at night"}'
```

Response:
```json
{
  "status_code": 200,
  "message": "Image generated successfully",
  "image_base64": "base64_encoded_string_here...",
  "image_path": "generated_images/cyberpunk_futuristic_city_20250504_123456.png"
}
```
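The same call from Python using the `requests` library (a minimal sketch; it assumes the REST wrapper is running on port 8000 and uses the endpoint and field names from the curl example above):

```python
# Minimal REST client sketch for the wrapper above.
# Endpoint and field names are taken from the curl example.
import base64
import requests

resp = requests.post(
    "http://localhost:8000/generate",
    json={"context": "cyberpunk", "text": "a futuristic city at night"},
)
data = resp.json()
if data["status_code"] == 200:
    # Decode the base64 payload and save it locally
    with open("output.png", "wb") as f:
        f.write(base64.b64decode(data["image_base64"]))
else:
    print(data["message"])
```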
```text
┌─────────────┐      ┌─────────────┐      ┌──────────────┐      ┌─────────────────────────┐
│  Streamlit  │─────▶│  gRPC API   │─────▶│   REST API   │─────▶│  Diffusers + PyTorch    │
└─────────────┘      └─────────────┘      └──────────────┘      └─────────────────────────┘
       ▲                    ▲                     ▲                           ▲
       │                    │                     │                           │
       ▼                    ▼                     ▼                           ▼
  Web Browser          gRPC Client           REST Client               Dockerized GPU
```
- Frontend: Streamlit web UI for interactive image generation
- API Layer: gRPC Service with Protocol Buffers (primary) and REST API wrapper (secondary)
- Core Service: Asynchronous request handler with concurrent processing support
- Model: Hugging Face's diffusers.StableDiffusionPipeline
- Inference: PyTorch with CUDA GPU acceleration
- Deployment: Containerized with Docker for consistent environment
The service exposes a GenerateImage RPC method defined in the Protocol Buffer:

```proto
syntax = "proto3";

service Text2ImageService {
  rpc GenerateImage (ImageRequest) returns (ImageResponse);
}

message ImageRequest {
  string context = 1;  // Optional context for better image generation
  string text = 2;     // The main prompt for image generation
}

message ImageResponse {
  int32 status_code = 1;    // HTTP-like status code (200 = success)
  string message = 2;       // Status message or error description
  string image_base64 = 3;  // Base64-encoded image data
  string image_path = 4;    // Path where the image was saved
}
```

The microservice relies on the following key components:
```text
torch>=2.0.0
diffusers>=0.20.0
transformers>=4.30.0
protobuf>=4.23.0
grpcio>=1.54.0
grpcio-tools>=1.54.0
streamlit>=1.23.0
Pillow>=9.5.0
numpy>=1.24.0
```

(`asyncio` ships with the Python standard library, so it is not listed in requirements.txt.)

The application uses:
- StableDiffusionPipeline from 🤗 Diffusers
- transformers for the CLIP tokenizer/model
- DPMSolverMultistepScheduler for faster inference
The server implementation uses asynchronous processing to handle concurrent requests:

```python
from diffusers import StableDiffusionPipeline, DPMSolverMultistepScheduler
import torch
import grpc
import asyncio
import concurrent.futures

import text2image_pb2_grpc


class Text2ImageService(text2image_pb2_grpc.Text2ImageServiceServicer):
    def __init__(self):
        # Load the model with GPU acceleration
        self.pipe = StableDiffusionPipeline.from_pretrained(
            "runwayml/stable-diffusion-v1-5",
            torch_dtype=torch.float16
        )
        self.pipe.scheduler = DPMSolverMultistepScheduler.from_config(
            self.pipe.scheduler.config
        )
        self.pipe = self.pipe.to("cuda")
        # Thread pool for handling concurrent requests
        self.executor = concurrent.futures.ThreadPoolExecutor(max_workers=3)

    async def GenerateImage(self, request, context):
        # Run the blocking generation in the thread pool so the
        # asyncio event loop stays free for other requests
        loop = asyncio.get_running_loop()
        return await loop.run_in_executor(
            self.executor,
            self._generate_image_sync,
            request,
            context
        )

    def _generate_image_sync(self, request, context):
        # Actual image generation logic
        # [...]
        ...
```
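The snippet above omits the server bootstrap. One possible way to serve the asynchronous servicer with grpcio's asyncio API (a sketch; server.py may do this differently):

```python
# Possible bootstrap for the async servicer above (a sketch; the actual
# server.py may differ). Uses grpcio's asyncio server API.
import asyncio
import grpc
import text2image_pb2_grpc


async def serve():
    server = grpc.aio.server()
    # Register the Text2ImageService defined above
    text2image_pb2_grpc.add_Text2ImageServiceServicer_to_server(
        Text2ImageService(), server
    )
    server.add_insecure_port("[::]:50051")
    await server.start()
    await server.wait_for_termination()


if __name__ == "__main__":
    asyncio.run(serve())
```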
The project includes comprehensive test suites to evaluate performance and reliability. Tests verify the service can handle multiple simultaneous requests:
```bash
# Run concurrent request tests
python tests/test_concurrent.py
```
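A minimal version of such a test might look like this (a sketch; the actual tests/test_concurrent.py may differ):

```python
# Sketch of a concurrent-load test: fires N requests at the server in
# parallel and reports success rate and average latency.
import time
from concurrent.futures import ThreadPoolExecutor

import grpc
import text2image_pb2
import text2image_pb2_grpc


def one_request(i):
    channel = grpc.insecure_channel("localhost:50051")
    stub = text2image_pb2_grpc.Text2ImageServiceStub(channel)
    start = time.time()
    response = stub.GenerateImage(
        text2image_pb2.ImageRequest(context="test", text=f"a red cube #{i}")
    )
    return response.status_code, time.time() - start


with ThreadPoolExecutor(max_workers=10) as pool:
    results = list(pool.map(one_request, range(10)))

ok = sum(1 for code, _ in results if code == 200)
print(f"success rate: {ok}/{len(results)}")
print(f"avg latency: {sum(t for _, t in results) / len(results):.2f}s")
```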
Performance metrics measuring response times under various loads:

| Concurrent Requests | Avg. Response Time (s) | Success Rate (%) |
|---|---|---|
| 1 | 2.3 | 100 |
| 5 | 3.8 | 100 |
| 10 | 6.5 | 98 |
| 20 | 12.1 | 95 |
The service implements robust error handling (see the sketch after this list):
- Invalid input validation
- GPU memory monitoring
- Timeouts for long-running requests
- Graceful degradation under heavy load
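For illustration, a sketch of the input-validation path (hypothetical: the specific checks and limits in server.py may differ; the 1000-character cap here is an assumption):

```python
# Illustrative input validation for the servicer. Invalid prompts are
# rejected with an HTTP-like 400 in the ImageResponse rather than a
# transport-level gRPC error.
import text2image_pb2


def validate_request(request):
    """Return an error ImageResponse, or None if the request is valid."""
    if not request.text.strip():
        return text2image_pb2.ImageResponse(
            status_code=400,
            message="Prompt text must not be empty",
        )
    if len(request.text) > 1000:  # assumed limit, for illustration
        return text2image_pb2.ImageResponse(
            status_code=400,
            message="Prompt text too long (max 1000 characters)",
        )
    return None  # valid: proceed with generation
```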
- Memory intensive: requires >= 6 GB VRAM for Stable Diffusion 1.5
- No user authentication or rate limiting implemented
- No image caching (each prompt triggers new inference)
- Cold start time when loading large models in a container (~30 s)
- Limited to single image generation (no batch processing)
Problem: gRPC connection refused
Cause: Server not running or port conflict
Fix: Verify the server is running and port 50051 is available

Problem: module 'torch' has no attribute 'compiler'
Cause: Version mismatch between PyTorch and transformers
Fix: Use compatible versions:

```bash
pip install torch==2.1.0 transformers==4.36.2 diffusers==0.25.0
```

Problem: CUDA out of memory error
Cause: GPU memory exhaustion from concurrent requests
Fix: Reduce the concurrent request limit in server.py or increase available VRAM
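One way to bound concurrency, and therefore peak VRAM, is a semaphore around the generation call (illustrative only; the server shown above achieves the same effect with `max_workers` on its thread pool):

```python
# Caps how many generations run on the GPU at once. Fewer slots means
# lower peak VRAM at the cost of throughput. The value is illustrative.
import threading

MAX_CONCURRENT_GENERATIONS = 1  # tune to available VRAM
gpu_slots = threading.Semaphore(MAX_CONCURRENT_GENERATIONS)


def generate_with_limit(pipe, prompt):
    # Blocks until a GPU slot is free, keeping peak memory bounded
    with gpu_slots:
        return pipe(prompt).images[0]
```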
- Add user authentication and API keys
- Implement request rate limiting
- Add image caching for duplicate prompts (see the sketch after this list)
- Support batch processing of multiple prompts
- Add more Stable Diffusion model options
- Implement model quantization for reduced memory usage
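As a starting point for the caching item, a minimal in-memory cache keyed on the prompt (illustrative only; a real implementation would bound the cache size and persist entries):

```python
# Naive prompt cache sketch for the "image caching" improvement above.
# Repeated (context, text) pairs reuse the previously generated file.
image_cache: dict[tuple[str, str], str] = {}


def cached_image_path(context: str, text: str, generate_fn) -> str:
    """Return a saved image path, invoking generate_fn only on a miss."""
    key = (context, text)
    if key not in image_cache:
        # generate_fn is assumed to run the pipeline and return a file path
        image_cache[key] = generate_fn(context, text)
    return image_cache[key]
```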