Image Captioning and Label Detection Cloud Function

This project is a Cloud Function-based image analysis app that:

Accepts an image uploaded through a React frontend.
Detects objects/labels in the image using the Google Cloud Vision API.
Generates a natural-language caption for the image using the Google Gemini LLM.
Stores results (filename, labels, caption, timestamp) in Firestore.
Displays uploaded images, labels, and captions on the frontend.

Demo

Example of uploaded image with labels and caption.

Features

Image Upload: Drag & drop or select images.
Label Detection: Uses Google Cloud Vision API to detect objects.
LLM Caption Generation: Uses Gemini API (gemini-1.5-flash) to generate a natural caption.
Result Storage: Stores results in Firestore for future reference.
CORS Support: Sends CORS headers so that a frontend on any origin can call the function.

Tech Stack

Frontend: React, TailwindCSS, Axios, Lucide icons
Backend: Python, Flask, Google Cloud Functions (Gen2)
APIs: Google Cloud Vision, Google Firestore, Google Gemini (Generative AI)
Deployment: Google Cloud Functions (Gen2), Artifact Registry

Project Structure

image-function/
│
├── frontend/
│   ├── src/
│   │   ├── components/
│   │   │   └── UploadImage.jsx
│   │   └── main.jsx
│   └── package.json
│
├── backend/
│   ├── main.py
│   └── requirements.txt
│
└── README.md

Setup and Deployment

1) Install gcloud CLI

# Check version
gcloud --version

# Update components
gcloud components update

2) Enable required APIs

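gcloud services enable cloudfunctions.googleapis.com
gcloud services enable artifactregistry.googleapis.com
gcloud services enable vision.googleapis.com
gcloud services enable firestore.googleapis.com
# Gen2 functions are built with Cloud Build and run on Cloud Run
gcloud services enable cloudbuild.googleapis.com
gcloud services enable run.googleapis.com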

3) Create Artifact Registry (skip if already exists)

gcloud artifacts repositories create gcf-artifacts \
  --repository-format=docker \
  --location=us-central1 \
  --description="Artifact Registry for Cloud Functions Gen2"

4) Set environment variable

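# Replace YOUR_GEMINI_API_KEY with your key.
# This sets the variable in the current shell only; the deployed function needs it too
# (for example via the --set-env-vars flag of gcloud functions deploy).
export GEMINI_API_KEY="YOUR_GEMINI_API_KEY"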

Frontend

UploadImage.jsx handles file selection, upload, and displays results.

Axios POST request to the Cloud Function URL:

const res = await axios.post(FUNCTION_URL, formData, {
  headers: { "Content-Type": "multipart/form-data" },
});

Displays:

<p>Caption: {newImage.caption}</p>
<p>Labels: {newImage.labels.join(', ')}</p>

Backend

main.py uses Flask and Google Cloud Functions (Gen2).

Calls Google Vision API for labels.
Calls Gemini API for caption.
Saves results in Firestore.
Handles CORS and preflight requests.
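
The label and caption steps above could look roughly like the following. This is a minimal sketch, not the project's actual main.py: the function names detect_labels, build_caption_prompt, and generate_caption and the prompt wording are hypothetical, and it assumes the google-cloud-vision and google-generativeai packages are installed. Whether the real function sends the image, the labels, or both to Gemini is not stated here; the sketch sends both.

```python
import os


def detect_labels(image_bytes):
    """Return label descriptions for an image using the Cloud Vision API."""
    # Imported lazily so this module still loads without the client library.
    from google.cloud import vision

    client = vision.ImageAnnotatorClient()
    image = vision.Image(content=image_bytes)
    response = client.label_detection(image=image)
    return [label.description for label in response.label_annotations]


def build_caption_prompt(labels):
    """Build a text prompt asking for one caption, seeded with the labels."""
    label_text = ", ".join(labels) if labels else "none detected"
    return (
        "Write one short, natural caption for this image. "
        f"Detected labels: {label_text}."
    )


def generate_caption(image_bytes, labels, mime_type="image/jpeg"):
    """Ask Gemini for a caption, passing the prompt and the raw image bytes."""
    import google.generativeai as genai

    genai.configure(api_key=os.environ["GEMINI_API_KEY"])
    model = genai.GenerativeModel("gemini-1.5-flash")
    response = model.generate_content(
        [build_caption_prompt(labels), {"mime_type": mime_type, "data": image_bytes}]
    )
    return response.text.strip()
```

Keeping the prompt construction in its own function makes it easy to adjust the wording without touching the API calls.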

Firestore document example:

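import datetime

# filename, labels, and caption come from the upload and the two API calls;
# db is assumed to be a google.cloud.firestore.Client instance.
doc = {
    "filename": filename,
    "labels": labels,
    "caption": caption,
    # Naive UTC timestamp in ISO 8601 form (utcnow() is deprecated since Python 3.12)
    "timestamp": datetime.datetime.utcnow().isoformat()
}
# add() creates the document with an auto-generated ID
db.collection("image_results").add(doc)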

Environment Variables

GEMINI_API_KEY: API key for Google Gemini (Generative AI) used for caption generation.

Other credentials (Vision, Firestore) should be configured via Google Cloud IAM and service account bindings for the Cloud Function runtime.
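
One way to surface a missing key early is to check it when the function starts handling a request. The helper below is a hedged sketch; the name get_gemini_api_key and the error message are hypothetical and not taken from main.py.

```python
import os


def get_gemini_api_key(environ=None):
    """Return GEMINI_API_KEY, or raise a clear error if it is missing or empty."""
    # Accept a mapping for testing; default to the real process environment.
    env = os.environ if environ is None else environ
    key = env.get("GEMINI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "GEMINI_API_KEY is not set; configure it on the Cloud Function runtime."
        )
    return key
```

Failing with an explicit message tends to be easier to diagnose in the function logs than an authentication error returned later by the Gemini API.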

Firestore Structure

image_results (collection)
└── <auto-generated-doc-id>
    ├── filename: "example.jpg"
    ├── labels: ["Mountain", "Sky", ...]
    ├── caption: "A beautiful snowy mountain under a blue sky."
    └── timestamp: "2025-10-01T14:03:42.964959"
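
When reading documents back, the layout above can be mirrored by a small typed container. The ImageResult class below is a hypothetical sketch, not part of the project; it assumes every document has a filename and treats the other fields as optional.

```python
import datetime
from dataclasses import dataclass, field


@dataclass
class ImageResult:
    """In-memory form of one image_results document."""

    filename: str
    caption: str = ""
    labels: list = field(default_factory=list)
    timestamp: str = ""

    @classmethod
    def from_dict(cls, data):
        """Build an ImageResult from a Firestore document dict."""
        return cls(
            filename=data["filename"],
            caption=data.get("caption", ""),
            labels=list(data.get("labels", [])),
            timestamp=data.get("timestamp", ""),
        )

    def to_dict(self):
        """Return a dict with the same four fields as the stored document."""
        return {
            "filename": self.filename,
            "labels": list(self.labels),
            "caption": self.caption,
            "timestamp": self.timestamp,
        }

    def created_at(self):
        """Parse the ISO 8601 timestamp string into a datetime."""
        return datetime.datetime.fromisoformat(self.timestamp)
```

With the Firestore client, a document snapshot's to_dict() result could be passed to ImageResult.from_dict.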

Usage

# In the frontend directory
npm install
npm run dev

Select an image → Upload → Wait for result.

Labels and caption appear below the image.

Troubleshooting

CORS errors: Ensure the Cloud Function sends appropriate CORS headers for OPTIONS and POST.
Authentication/permission issues: Verify the function's service account has access to Vision API and Firestore.
Gemini errors: Confirm GEMINI_API_KEY is set and the model name is correct.
Firestore writes failing: Check Firestore rules and project/collection names.
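
For the CORS item above, a handler needs to answer the browser's preflight OPTIONS request and also attach the allow-origin header to the real POST response. The sketch below shows that pattern using Flask-style (body, status, headers) return tuples. The names handle_request and process are hypothetical, and the wildcard origin is an assumption based on the "any frontend" description.

```python
# Header sent on every response so browsers accept cross-origin replies.
CORS_HEADERS = {"Access-Control-Allow-Origin": "*"}

# Extra headers that answer the browser's preflight check.
PREFLIGHT_HEADERS = {
    **CORS_HEADERS,
    "Access-Control-Allow-Methods": "POST, OPTIONS",
    "Access-Control-Allow-Headers": "Content-Type",
    "Access-Control-Max-Age": "3600",
}


def handle_request(request, process=None):
    """HTTP entry point sketch returning (body, status, headers)."""
    if request.method == "OPTIONS":
        # Preflight: no body, just the permission headers.
        return ("", 204, PREFLIGHT_HEADERS)
    if request.method != "POST":
        return ({"error": "Use POST"}, 405, CORS_HEADERS)
    # process stands in for the label, caption, and Firestore steps.
    result = process(request) if process else {}
    return (result, 200, CORS_HEADERS)
```

If the browser console still reports a CORS error, checking that error responses also carry Access-Control-Allow-Origin is a reasonable next step, since a failing POST without the header is reported the same way.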

Future Improvements

Add image thumbnails and gallery view with pagination.
Allow multiple file uploads and batch processing.
Add confidence scores and top-N label filtering.
Improve prompt engineering for richer captions.
Add retry/backoff for transient API errors.
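
Two of the items above, top-N label filtering and retry/backoff, could be prototyped with small helpers like these. This is only a sketch of one possible approach: the names top_labels and call_with_retry and the default thresholds are hypothetical, and the (description, score) pairs are assumed to be extracted from the Vision API label annotations.

```python
import random
import time


def top_labels(scored_labels, n=5, min_score=0.7):
    """Keep the n highest-scoring (description, score) pairs at or above min_score."""
    kept = [pair for pair in scored_labels if pair[1] >= min_score]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:n]


def call_with_retry(func, attempts=3, base_delay=0.5,
                    retry_on=(Exception,), sleep=time.sleep):
    """Call func(), retrying with exponential backoff plus jitter on failure."""
    for attempt in range(attempts):
        try:
            return func()
        except retry_on:
            if attempt == attempts - 1:
                # Out of attempts: let the caller see the original error.
                raise
            # Delay doubles each attempt; jitter spreads out concurrent retries.
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            sleep(delay)
```

In practice retry_on would be narrowed to the transient error types raised by the API client in use, so that permanent failures such as invalid credentials are not retried.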

Project Author: Snehala A
Date: October 2025
