An AI-powered web application that analyzes uploaded images to detect objects and generate natural, human-like captions using Google Cloud Vision API and Gemini LLM. Users can quickly get insights and descriptive summaries for any image in real time.

snehala24/CaptionCloud

Image Captioning and Label Detection Cloud Function

This project is a Cloud Function-based image analysis app that:

  • Accepts an image uploaded from a React frontend.
  • Detects objects/labels in the image using Google Cloud Vision API.
  • Generates a natural caption for the image using Google Gemini LLM.
  • Stores results (filename, labels, caption, timestamp) in Firestore.
  • Displays uploaded images, labels, and captions on the frontend.
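The flow above can be sketched as a single request handler. This is a minimal sketch, not the repository's main.py: `analyze_image`, `detect_labels`, `generate_caption` and `save_result` are hypothetical names, and the Vision, Gemini and Firestore calls are injected as plain functions so the orchestration can be read and tested without Google Cloud credentials. Passing the labels to the caption step is also an assumption.

```python
import datetime
from typing import Callable, Dict, List

# Hypothetical signatures for the three external calls (Vision, Gemini, Firestore).
DetectLabels = Callable[[bytes], List[str]]
GenerateCaption = Callable[[bytes, List[str]], str]
SaveResult = Callable[[Dict[str, object]], None]


def analyze_image(
    filename: str,
    image_bytes: bytes,
    detect_labels: DetectLabels,
    generate_caption: GenerateCaption,
    save_result: SaveResult,
) -> Dict[str, object]:
    """Run one uploaded image through labels -> caption -> storage."""
    if not image_bytes:
        raise ValueError("empty upload")

    # 1. Label detection (Cloud Vision in the real app).
    labels = detect_labels(image_bytes)

    # 2. Caption generation (Gemini in the real app); this sketch assumes the
    #    labels are passed along as extra context for the prompt.
    caption = generate_caption(image_bytes, labels)

    # 3. Persist the same fields the README lists for Firestore.
    #    Naive UTC ISO-8601 string, e.g. "2025-10-01T14:03:42.964959".
    now = datetime.datetime.now(datetime.timezone.utc).replace(tzinfo=None)
    doc = {
        "filename": filename,
        "labels": labels,
        "caption": caption,
        "timestamp": now.isoformat(),
    }
    save_result(doc)

    # 4. Return the result so the frontend can display it.
    return doc
```

Injecting the three calls keeps the handler free of client setup; in the deployed function they would wrap the Vision client, the Gemini SDK and `db.collection("image_results").add`.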

Demo

[Screenshot: UI demo showing an uploaded image with its detected labels and generated caption.]


Features

  • Image Upload: Drag & drop or select images.
  • Label Detection: Uses Google Cloud Vision API to detect objects.
  • LLM Caption Generation: Uses Gemini API (gemini-1.5-flash) to generate a natural caption.
  • Persistence: Stores each result in Firestore for future reference.
  • CORS Support: Answers CORS preflight (OPTIONS) and POST requests so a browser frontend served from another origin can call the function.

Tech Stack

  • Frontend: React, TailwindCSS, Axios, Lucide icons
  • Backend: Python, Flask, Google Cloud Functions (Gen2)
  • APIs: Google Cloud Vision, Google Firestore, Google Gemini (Generative AI)
  • Deployment: Google Cloud Functions (Gen2), Artifact Registry

Project Structure

image-function/
│
├── frontend/
│   ├── src/
│   │   ├── components/
│   │   │   └── UploadImage.jsx
│   │   └── main.jsx
│   └── package.json
│
├── backend/
│   ├── main.py
│   └── requirements.txt
│
└── README.md

Setup and Deployment

1) Install and update the gcloud CLI

# Check version
gcloud --version

# Update components
gcloud components update

2) Enable required APIs

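gcloud services enable cloudfunctions.googleapis.com
gcloud services enable artifactregistry.googleapis.com
gcloud services enable vision.googleapis.com
gcloud services enable firestore.googleapis.com
# Gen2 functions are built with Cloud Build and run on Cloud Run
gcloud services enable cloudbuild.googleapis.com run.googleapis.com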

3) Create an Artifact Registry repository (skip if it already exists)

gcloud artifacts repositories create gcf-artifacts \
  --repository-format=docker \
  --location=us-central1 \
  --description="Artifact Registry for Cloud Functions Gen2"

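4) Set the Gemini API key

# Replace YOUR_GEMINI_API_KEY with your key
export GEMINI_API_KEY="YOUR_GEMINI_API_KEY"

This only sets the key in your local shell. The deployed function reads it from its own runtime environment, so it must also be passed at deploy time, for example with the --set-env-vars flag of gcloud functions deploy.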
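Inside the function, the key comes from the process environment. A minimal sketch, assuming the backend reads it through `os.environ` (the helper name `get_gemini_api_key` is hypothetical): failing fast with a clear message makes a missing key easier to diagnose than an opaque authentication error from Gemini.

```python
import os
from typing import Mapping, Optional


def get_gemini_api_key(environ: Optional[Mapping[str, str]] = None) -> str:
    """Return GEMINI_API_KEY, or raise if it is missing or still the placeholder."""
    env = os.environ if environ is None else environ
    key = env.get("GEMINI_API_KEY", "").strip()
    # Reject both an empty value and the README's placeholder text.
    if not key or key == "YOUR_GEMINI_API_KEY":
        raise RuntimeError(
            "GEMINI_API_KEY is not set in the function's runtime environment"
        )
    return key
```

Calling it at the top of the request handler, rather than at import time, keeps a misconfigured deployment from crashing on cold start and instead returns a readable error per request.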

Frontend

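UploadImage.jsx handles file selection and upload, and displays the results.

It sends the image to the Cloud Function URL with an Axios POST request:

// formData is a FormData object holding the selected image file;
// FUNCTION_URL is the deployed function's HTTPS trigger URL.
const res = await axios.post(FUNCTION_URL, formData, {
  headers: { "Content-Type": "multipart/form-data" },
});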

UploadImage.jsx then renders the returned caption and labels:

<p>Caption: {newImage.caption}</p>
<p>Labels: {newImage.labels.join(', ')}</p>
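To exercise the function without the React UI, for example from a script, the same kind of `multipart/form-data` body that `FormData` produces can be built with the Python standard library. A sketch under assumptions: the helper `encode_multipart` is hypothetical, and so is the form field name `file`; use whatever field name UploadImage.jsx and main.py actually agree on.

```python
import uuid
from typing import Tuple


def encode_multipart(field: str, filename: str, data: bytes,
                     content_type: str = "image/jpeg") -> Tuple[bytes, str]:
    """Encode one file as multipart/form-data; return (body, Content-Type header)."""
    # A fresh random boundary per request; it separates the parts in the body.
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    # The boundary must also appear in the Content-Type header so the
    # server knows where each part starts and ends.
    return head + data + tail, f"multipart/form-data; boundary={boundary}"
```

The body can then be sent with `urllib.request.Request(url, data=body, headers={"Content-Type": ctype}, method="POST")` and `urllib.request.urlopen`, where `url` is the deployed function's URL.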

Backend

main.py implements the HTTP Cloud Function (Gen2); Python HTTP functions receive the incoming request as a Flask request object. The function:

  • Calls Google Vision API for labels.
  • Calls Gemini API for caption.
  • Saves results in Firestore.
  • Handles CORS and preflight requests.
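The Gemini step needs a prompt. One way to build it from the Vision labels is sketched below, assuming labels arrive as (description, score) pairs (Vision's label annotations do carry a `description` and a `score`). The names `top_labels` and `build_caption_prompt`, the prompt wording and the default thresholds are hypothetical, not the repository's actual prompt.

```python
from typing import List, Sequence, Tuple


def top_labels(labels: Sequence[Tuple[str, float]], max_labels: int = 5,
               min_score: float = 0.6) -> List[str]:
    """Keep the highest-scoring labels at or above a confidence threshold."""
    kept = [(desc, score) for desc, score in labels if score >= min_score]
    # Highest confidence first; sort is stable, so ties keep Vision's order.
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return [desc for desc, _ in kept[:max_labels]]


def build_caption_prompt(labels: Sequence[Tuple[str, float]], **kwargs) -> str:
    """Turn detected labels into an instruction for the caption model."""
    names = top_labels(labels, **kwargs)
    hint = ", ".join(names) if names else "no reliable labels"
    return (
        "Write one natural, human-like sentence describing this image. "
        f"Objects detected in it: {hint}. "
        "Do not list the labels; describe the scene."
    )
```

Dropping low-score labels keeps weak guesses out of the caption. With the `google-generativeai` package, for example, the prompt and the image could be passed together to `GenerativeModel("gemini-1.5-flash").generate_content(...)`; which SDK main.py actually uses is not stated here.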

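Firestore document example:

import datetime
from google.cloud import firestore

db = firestore.Client()  # authenticates as the function's service account

# filename, labels and caption are computed earlier in the request handler
now = datetime.datetime.now(datetime.timezone.utc).replace(tzinfo=None)
doc = {
    "filename": filename,
    "labels": labels,
    "caption": caption,
    "timestamp": now.isoformat(),  # naive UTC ISO-8601 string
}
db.collection("image_results").add(doc)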

Environment Variables

  • GEMINI_API_KEY: API key for Google Gemini (Generative AI) used for caption generation.

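Vision and Firestore need no API key: their client libraries authenticate with Application Default Credentials, i.e. the Cloud Function's runtime service account. Grant that service account the IAM roles it needs (for example roles/datastore.user for Firestore).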


Firestore Structure

image_results (collection)
├── <auto-generated-doc-id>
    ├── filename: "example.jpg"
    ├── labels: ["Mountain", "Sky", ...]
    ├── caption: "A beautiful snowy mountain under a blue sky."
    └── timestamp: "2025-10-01T14:03:42.964959"
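A typed mirror of this document shape makes the schema explicit on the backend side. This is only a sketch: the `ImageResult` dataclass is hypothetical, and the timestamp keeps the naive-UTC ISO-8601 string format shown above rather than a native Firestore timestamp.

```python
import datetime
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List


def _utc_now_iso() -> str:
    # Naive UTC ISO-8601 string, same shape as "2025-10-01T14:03:42.964959".
    now = datetime.datetime.now(datetime.timezone.utc)
    return now.replace(tzinfo=None).isoformat()


@dataclass
class ImageResult:
    filename: str
    labels: List[str]
    caption: str
    timestamp: str = field(default_factory=_utc_now_iso)

    def to_dict(self) -> Dict[str, Any]:
        """Plain dict suitable for db.collection("image_results").add(...)."""
        return asdict(self)

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "ImageResult":
        """Rebuild a result from a stored document's fields."""
        return cls(
            filename=str(data["filename"]),
            labels=list(data.get("labels", [])),
            caption=str(data.get("caption", "")),
            timestamp=str(data["timestamp"]),
        )
```

Keeping the field names in one place means the writer (the function) and any reader (for example a future gallery view) cannot drift apart on spelling.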

Usage

# In the frontend directory (first point FUNCTION_URL at your deployed function)
npm install
npm run dev

Select an image → Upload → Wait for result.

Labels and caption appear below the image.


Troubleshooting

  • CORS errors: Ensure the Cloud Function sends appropriate CORS headers for OPTIONS and POST.
  • Authentication/permission issues: Verify the function's service account has access to Vision API and Firestore.
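  • Gemini errors: Confirm GEMINI_API_KEY is set in the deployed function's runtime environment and the model name is correct.
  • Firestore writes failing: Check that a Firestore database exists in the project, that the function's service account has Firestore IAM access (server client libraries are governed by IAM, not by Firestore security rules), and that the collection name is correct.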
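A minimal sketch of the CORS handling described above, written framework-agnostically so it can be tested without Flask. Assumptions: `ALLOWED_ORIGIN = "*"` mirrors "handles requests from any frontend", and `cors_headers` and `handle_preflight` are hypothetical helper names. If the browser sends an OPTIONS preflight, it must receive the Allow-* headers; every actual response, errors included, needs Access-Control-Allow-Origin, or the browser reports a CORS error that hides the real failure.

```python
from typing import Dict, Optional, Tuple

# Assumption: "*" allows any origin; restrict it to your frontend's
# exact origin if the function should not be callable from other sites.
ALLOWED_ORIGIN = "*"


def cors_headers(preflight: bool = False) -> Dict[str, str]:
    """Headers to attach to a response; a preflight adds the Allow-* set."""
    headers = {"Access-Control-Allow-Origin": ALLOWED_ORIGIN}
    if preflight:
        headers.update({
            "Access-Control-Allow-Methods": "POST, OPTIONS",
            # Permit the Content-Type header the frontend sends.
            "Access-Control-Allow-Headers": "Content-Type",
            # Let the browser cache the preflight result for an hour.
            "Access-Control-Max-Age": "3600",
        })
    return headers


def handle_preflight(method: str) -> Optional[Tuple[str, int, Dict[str, str]]]:
    """Return a complete (body, status, headers) response for OPTIONS, else None."""
    if method.upper() == "OPTIONS":
        return ("", 204, cors_headers(preflight=True))
    return None
```

In a Flask-style handler, return the result of `handle_preflight(request.method)` when it is not `None`, and attach `cors_headers()` to every other response, e.g. `return (jsonify(result), 200, cors_headers())`; Flask accepts a (body, status, headers) tuple.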

Future Improvements

  • Add image thumbnails and gallery view with pagination.
  • Allow multiple file uploads and batch processing.
  • Add confidence scores and top-N label filtering.
  • Improve prompt engineering for richer captions.
  • Add retry/backoff for transient API errors.
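The retry/backoff item could look like the sketch below: capped exponential backoff with "full jitter" around any zero-argument call. The helper `retry_with_backoff` and its defaults are hypothetical. Which exceptions count as transient is an assumption too: Google client libraries raise `google.api_core.exceptions` types such as `ServiceUnavailable` and `TooManyRequests`, which could be passed via `retry_on`, and those libraries also ship their own retry helpers (`google.api_core.retry.Retry`) that may be preferable.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    call: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    max_attempts: int = 4,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `call()`; on a transient error, wait and retry with capped backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error unchanged
            # Full jitter: random wait in [0, min(max_delay, base * 2^(attempt-1))].
            cap = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, cap))
    raise RuntimeError("unreachable")
```

Errors outside `retry_on` (for example a bad request) propagate immediately, since retrying them cannot help. Jitter spreads retries from concurrent invocations so they do not hit the API in lockstep.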

Project Author: Snehala A
Date: October 2025
