This project is a Cloud Function-based image analysis app that:
- Uploads an image via a frontend React UI.
- Detects objects/labels in the image using Google Cloud Vision API.
- Generates a natural caption for the image using Google Gemini LLM.
- Stores results (filename, labels, caption, timestamp) in Firestore.
- Displays uploaded images, labels, and captions on the frontend.
## Table of Contents

- Demo
- Features
- Tech Stack
- Project Structure
- Setup and Deployment
- Frontend
- Backend
- Environment Variables
- Firestore Structure
- Usage
- Troubleshooting
- Future Improvements
## Demo

*Example of an uploaded image with detected labels and a generated caption.*
## Features

- Image Upload: Drag & drop or select images.
- Label Detection: Uses the Google Cloud Vision API to detect objects.
- LLM Caption Generation: Uses the Gemini API (`gemini-1.5-flash`) to generate a natural caption.
- Result Storage: Stores results in Firestore for future reference.
- CORS Support: Handles requests from any frontend origin.
## Tech Stack

- Frontend: React, TailwindCSS, Axios, Lucide icons
- Backend: Python, Flask, Google Cloud Functions (Gen2)
- APIs: Google Cloud Vision, Google Firestore, Google Gemini (Generative AI)
- Deployment: Google Cloud Functions (Gen2), Artifact Registry
## Project Structure

```
image-function/
│
├── frontend/
│   ├── src/
│   │   ├── components/
│   │   │   └── UploadImage.jsx
│   │   └── main.jsx
│   └── package.json
│
├── backend/
│   ├── main.py
│   └── requirements.txt
│
└── README.md
```
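Based on the stack above, a minimal `backend/requirements.txt` might look like the following (package names inferred from the APIs used in this project; version pins omitted):

```text
functions-framework
flask
google-cloud-vision
google-cloud-firestore
google-generativeai
```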
## Setup and Deployment

Check and update the gcloud CLI:

```bash
# Check version
gcloud --version

# Update components
gcloud components update
```

Enable the required Google Cloud APIs:

```bash
gcloud services enable cloudfunctions.googleapis.com
gcloud services enable artifactregistry.googleapis.com
gcloud services enable vision.googleapis.com
gcloud services enable firestore.googleapis.com
```

Create an Artifact Registry repository for Cloud Functions Gen2:

```bash
gcloud artifacts repositories create gcf-artifacts \
  --repository-format=docker \
  --location=us-central1 \
  --description="Artifact Registry for Cloud Functions Gen2"
```

Set the Gemini API key:

```bash
# Replace YOUR_GEMINI_API_KEY with your key
export GEMINI_API_KEY="YOUR_GEMINI_API_KEY"
```

## Frontend

UploadImage.jsx handles file selection, upload, and result display.
Axios POST request to the Cloud Function URL:
```js
const res = await axios.post(FUNCTION_URL, formData, {
  headers: { "Content-Type": "multipart/form-data" },
});
```

Displays:

```jsx
<p>Caption: {newImage.caption}</p>
<p>Labels: {newImage.labels.join(', ')}</p>
```

## Backend

main.py uses Flask and runs on Google Cloud Functions (Gen2). It:
- Calls Google Vision API for labels.
- Calls Gemini API for caption.
- Saves results in Firestore.
- Handles CORS and preflight requests.
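The CORS and preflight handling mentioned above can be sketched with plain Flask (a minimal illustration, not the project's actual main.py; the route path and form field name are assumptions):

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

# Headers returned on every response so any frontend origin can call us.
CORS_HEADERS = {
    "Access-Control-Allow-Origin": "*",
    "Access-Control-Allow-Methods": "POST, OPTIONS",
    "Access-Control-Allow-Headers": "Content-Type",
}

@app.route("/", methods=["POST", "OPTIONS"])
def analyze_image():
    # Answer the CORS preflight before doing any real work.
    if request.method == "OPTIONS":
        return ("", 204, CORS_HEADERS)
    if "file" not in request.files:
        return (jsonify({"error": "no file uploaded"}), 400, CORS_HEADERS)
    # ... here: Vision labels, Gemini caption, Firestore write ...
    return (jsonify({"filename": request.files["file"].filename}), 200, CORS_HEADERS)
```

The key point is that the `OPTIONS` branch returns early with the CORS headers, so the browser's preflight succeeds without touching Vision, Gemini, or Firestore.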
Firestore document example:
```python
doc = {
    "filename": filename,
    "labels": labels,
    "caption": caption,
    "timestamp": datetime.datetime.utcnow().isoformat(),
}
db.collection("image_results").add(doc)
```

## Environment Variables

- `GEMINI_API_KEY`: API key for Google Gemini (Generative AI), used for caption generation.

Other credentials (Vision, Firestore) should be configured via Google Cloud IAM and service account bindings for the Cloud Function runtime.
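A small guard in main.py can fail fast when the key is missing instead of producing confusing Gemini errors later (a sketch; the helper name is hypothetical):

```python
import os

def require_gemini_key() -> str:
    """Return GEMINI_API_KEY from the environment, or raise immediately."""
    key = os.environ.get("GEMINI_API_KEY")
    if not key:
        raise RuntimeError("GEMINI_API_KEY is not set")
    return key

# Typical use inside main.py (requires the google-generativeai package):
#   import google.generativeai as genai
#   genai.configure(api_key=require_gemini_key())
#   model = genai.GenerativeModel("gemini-1.5-flash")
```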
## Firestore Structure

```
image_results (collection)
└── <auto-generated-doc-id>
    ├── filename: "example.jpg"
    ├── labels: ["Mountain", "Sky", ...]
    ├── caption: "A beautiful snowy mountain under a blue sky."
    └── timestamp: "2025-10-01T14:03:42.964959"
```
## Usage

Run the frontend locally:

```bash
# In the frontend directory
npm install
npm run dev
```

Select an image → Upload → Wait for the result. Labels and the caption appear below the image.
## Troubleshooting

- CORS errors: Ensure the Cloud Function sends appropriate CORS headers for `OPTIONS` and `POST` requests.
- Authentication/permission issues: Verify the function's service account has access to the Vision API and Firestore.
- Gemini errors: Confirm `GEMINI_API_KEY` is set and the model name is correct.
- Firestore writes failing: Check Firestore rules and project/collection names.
## Future Improvements

- Add image thumbnails and gallery view with pagination.
- Allow multiple file uploads and batch processing.
- Add confidence scores and top-N label filtering.
- Improve prompt engineering for richer captions.
- Add retry/backoff for transient API errors.
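The retry/backoff item could be sketched like this (a generic helper, not part of the current codebase; names are illustrative):

```python
import random
import time

def with_retry(call, attempts=4, base_delay=0.5):
    """Retry a callable with exponential backoff plus jitter.

    Intended for transient API errors (e.g. rate limits or timeouts
    from Vision or Gemini); the last failure is re-raised.
    """
    for attempt in range(attempts):
        try:
            return call()
        except Exception:
            if attempt == attempts - 1:
                raise
            # Delays grow as base_delay, 2*base_delay, 4*base_delay, ...
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))
```

For example, `with_retry(lambda: model.generate_content(prompt))` would retry a flaky Gemini call up to four times before giving up.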
---

Project Author: Snehala A
Date: October 2025