Image Semantics

A Next.js + MongoDB + OpenAI project for semantically grouping image assets into categories, using two complementary approaches:

  1. Image Embedding Clustering: Generate vector embeddings directly from images using OpenAI CLIP and group them by similarity.
  2. Metadata Embedding Clustering: Use OpenAI GPT-4.1-nano to generate textual metadata (title, description, tags) for each image, embed that metadata, and then group by similarity.

Both approaches use k‑means on unit‑normalized vectors to approximate cosine‑based clustering.
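
The equivalence holds because, on unit vectors, squared Euclidean distance and cosine similarity rank pairs identically, so ordinary k‑means on normalized embeddings behaves like cosine clustering. Below is a minimal TypeScript sketch of the two helpers; the file path, function names, and the simple centroid initialization are illustrative rather than the repository's actual implementation.

```ts
// lib/clustering.ts (illustrative path; the repository's layout may differ)

// Scale a vector to unit length. On unit vectors, Euclidean distance and
// cosine similarity rank pairs identically, so plain k-means approximates
// cosine-based clustering.
export function normalize(v: number[]): number[] {
  const length = Math.sqrt(v.reduce((sum, x) => sum + x * x, 0)) || 1;
  return v.map((x) => x / length);
}

function squaredDistance(a: number[], b: number[]): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += (a[i] - b[i]) ** 2;
  return sum;
}

// Minimal Lloyd's k-means; returns one cluster index per input vector.
export function kmeans(vectors: number[][], k: number, iterations = 50): number[] {
  // Seed centroids with the first k vectors (k-means++ would be more robust).
  let centroids = vectors.slice(0, k).map((v) => [...v]);
  let assignments: number[] = new Array(vectors.length).fill(0);

  for (let iter = 0; iter < iterations; iter++) {
    // Assignment step: nearest centroid by squared Euclidean distance.
    assignments = vectors.map((v) => {
      let best = 0;
      let bestDist = Infinity;
      centroids.forEach((c, i) => {
        const d = squaredDistance(v, c);
        if (d < bestDist) {
          bestDist = d;
          best = i;
        }
      });
      return best;
    });

    // Update step: move each centroid to the mean of its assigned vectors.
    centroids = centroids.map((c, i) => {
      const members = vectors.filter((_, idx) => assignments[idx] === i);
      if (members.length === 0) return c;
      return c.map((_, dim) => members.reduce((sum, m) => sum + m[dim], 0) / members.length);
    });
  }

  return assignments;
}
```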


Table of Contents

  • Features
  • Prerequisites
  • Installation
  • Configuration
  • Image Embedding Approach
  • Metadata Embedding Approach
  • Frontend Integration
  • Available Scripts

Features

  • Two semantic grouping pipelines: raw image embeddings vs. LLM‑assisted metadata embeddings
  • Clustering via k‑means on normalized vectors for cosine similarity
  • Next.js App Router API routes for dynamic grouping
  • Prisma ORM with MongoDB for asset management
  • Simple scripts for metadata generation and embedding

Prerequisites

  • Node.js 18 or higher
  • MongoDB Atlas (or local MongoDB instance)
  • OpenAI API access (GPT‑4.1‑nano for metadata generation and text-embedding-3-small for embeddings)

Installation

  1. Clone the repository
  2. Install dependencies with your package manager (e.g. npm install)
  3. Generate the Prisma client and sync the schema with MongoDB (e.g. npx prisma generate and npx prisma db push; Prisma uses db push rather than migrations with MongoDB)

Configuration

Create a .env file in the project root to specify your MongoDB connection string and OpenAI API key.
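
A typical .env might look like the snippet below. DATABASE_URL is Prisma's conventional variable name and OPENAI_API_KEY is what the OpenAI SDK reads by default; confirm both against the project's schema and code.

```
# MongoDB connection string used by Prisma
DATABASE_URL="mongodb+srv://<user>:<password>@<cluster>/<database>"

# API key read by the OpenAI Node SDK
OPENAI_API_KEY="sk-..."
```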


Image Embedding Approach

  1. Generate CLIP embeddings for each image and store them in the database.
  2. Use an API route to fetch all image embeddings, normalize them, and run k‑means (a route sketch follows this list).
  3. Return groups of assets based on their cluster assignments.
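
A minimal App Router sketch of such a route, assuming an Asset model with an imageEmbedding list field and the clustering helpers sketched earlier (the actual model, field names, and paths in the repository may differ):

```ts
// app/api/cluster/images/route.ts (illustrative path)
import { NextResponse } from "next/server";
import { PrismaClient } from "@prisma/client";
import { normalize, kmeans } from "@/lib/clustering"; // helpers sketched above

const prisma = new PrismaClient();

export async function GET() {
  // Assumes an Asset model with an `imageEmbedding Float[]` field.
  const assets = await prisma.asset.findMany({
    where: { imageEmbedding: { isEmpty: false } },
  });

  const vectors = assets.map((a) => normalize(a.imageEmbedding));
  const k = Math.min(5, vectors.length); // cluster count is a tunable choice
  const labels = kmeans(vectors, k);

  // Group asset ids by cluster label.
  const groups: Record<number, string[]> = {};
  labels.forEach((label, i) => {
    (groups[label] ??= []).push(assets[i].id);
  });

  return NextResponse.json({ groups });
}
```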

Metadata Embedding Approach

  1. Generate descriptive metadata for each image using OpenAI GPT-4.1-nano (title, description, tags).
  2. Store the metadata alongside each asset and create text embeddings for that metadata using OpenAI text-embedding-3-small (a generation-and-embedding sketch follows this list).
  3. Use an API route to fetch metadata embeddings, normalize them, and run k‑means.
  4. Return asset groups based on metadata similarity.
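
A minimal sketch of steps 1 and 2, assuming an Asset model with url, metadata, and metadataEmbedding fields (field names, prompt, and file path are illustrative):

```ts
// scripts/generate-and-embed-metadata.ts (illustrative path)
import OpenAI from "openai";
import { PrismaClient } from "@prisma/client";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment
const prisma = new PrismaClient();

async function describeImage(imageUrl: string): Promise<string> {
  // Step 1: ask GPT-4.1-nano for a title, description, and tags.
  const completion = await openai.chat.completions.create({
    model: "gpt-4.1-nano",
    messages: [
      {
        role: "user",
        content: [
          { type: "text", text: "Return a short title, a one-sentence description, and 5 tags for this image as JSON." },
          { type: "image_url", image_url: { url: imageUrl } },
        ],
      },
    ],
  });
  return completion.choices[0].message.content ?? "";
}

async function embedText(text: string): Promise<number[]> {
  // Step 2: embed the metadata text with text-embedding-3-small.
  const response = await openai.embeddings.create({
    model: "text-embedding-3-small",
    input: text,
  });
  return response.data[0].embedding;
}

async function main() {
  // Assumes an Asset model with `url`, `metadata`, and `metadataEmbedding` fields.
  const assets = await prisma.asset.findMany();
  for (const asset of assets) {
    const metadata = await describeImage(asset.url);
    const embedding = await embedText(metadata);
    await prisma.asset.update({
      where: { id: asset.id },
      data: { metadata, metadataEmbedding: embedding },
    });
  }
}

main().finally(() => prisma.$disconnect());
```

Steps 3 and 4 mirror the image‑embedding route above, reading metadataEmbedding instead of imageEmbedding.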

Frontend Integration

On the frontend, fetch the grouping API endpoint and iterate over each group to render sections or carousels of assets.
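
A minimal client component sketch, assuming the clustering route above returns a groups object keyed by cluster label (the endpoint path and response shape are assumptions carried over from the earlier sketch):

```tsx
// app/groups/page.tsx (illustrative path)
"use client";

import { useEffect, useState } from "react";

type Groups = Record<string, string[]>; // cluster label -> asset ids

export default function GroupsPage() {
  const [groups, setGroups] = useState<Groups>({});

  useEffect(() => {
    // Fetch the grouping endpoint (path assumed from the route sketch above).
    fetch("/api/cluster/images")
      .then((res) => res.json())
      .then((data) => setGroups(data.groups));
  }, []);

  return (
    <main>
      {Object.entries(groups).map(([label, assetIds]) => (
        <section key={label}>
          <h2>Group {label}</h2>
          <ul>
            {assetIds.map((id) => (
              <li key={id}>{id}</li>
            ))}
          </ul>
        </section>
      ))}
    </main>
  );
}
```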


Available Scripts

  • Generate Metadata: Run the metadata generation script to populate image metadata via LLM.
  • Embed Metadata: Run the embedding script to create text embeddings from metadata.
  • Cluster Images: Use the built‑in API route to cluster by image embeddings.
  • Cluster Metadata: Use the built‑in API route to cluster by metadata embeddings.