Image Semantics

A Next.js + MongoDB + OpenAI project for semantically grouping image assets into categories, using two complementary approaches:

  1. Image Embedding Clustering: Generate vector embeddings directly from images using OpenAI CLIP and group them by similarity.
  2. Metadata Embedding Clustering: Use OpenAI GPT-4.1-nano to generate textual metadata (title, description, tags) for each image, embed that metadata, and then group by similarity.

Both approaches use k‑means on unit‑normalized vectors to approximate cosine‑based clustering.
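
The equivalence holds because, on unit vectors, squared Euclidean distance and cosine similarity rank pairs identically, so ordinary k‑means on normalized embeddings behaves like cosine clustering. Below is a minimal TypeScript sketch of the two helpers; the file path, function names, and the simple centroid initialization are illustrative rather than the repository's actual implementation.

```ts
// lib/clustering.ts (illustrative path; the repository's layout may differ)

// Scale a vector to unit length. On unit vectors, Euclidean distance and
// cosine similarity rank pairs identically, so plain k-means approximates
// cosine-based clustering.
export function normalize(v: number[]): number[] {
  const length = Math.sqrt(v.reduce((sum, x) => sum + x * x, 0)) || 1;
  return v.map((x) => x / length);
}

function squaredDistance(a: number[], b: number[]): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += (a[i] - b[i]) ** 2;
  return sum;
}

// Minimal Lloyd's k-means; returns one cluster index per input vector.
export function kmeans(vectors: number[][], k: number, iterations = 50): number[] {
  // Seed centroids with the first k vectors (k-means++ would be more robust).
  let centroids = vectors.slice(0, k).map((v) => [...v]);
  let assignments: number[] = new Array(vectors.length).fill(0);

  for (let iter = 0; iter < iterations; iter++) {
    // Assignment step: nearest centroid by squared Euclidean distance.
    assignments = vectors.map((v) => {
      let best = 0;
      let bestDist = Infinity;
      centroids.forEach((c, i) => {
        const d = squaredDistance(v, c);
        if (d < bestDist) {
          bestDist = d;
          best = i;
        }
      });
      return best;
    });

    // Update step: move each centroid to the mean of its assigned vectors.
    centroids = centroids.map((c, i) => {
      const members = vectors.filter((_, idx) => assignments[idx] === i);
      if (members.length === 0) return c;
      return c.map((_, dim) => members.reduce((sum, m) => sum + m[dim], 0) / members.length);
    });
  }

  return assignments;
}
```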


Table of Contents

  • Features
  • Prerequisites
  • Installation
  • Configuration
  • Image Embedding Approach
  • Metadata Embedding Approach
  • Frontend Integration
  • Available Scripts

Features

  • Two semantic grouping pipelines: raw image embeddings vs. LLM‑assisted metadata embeddings
  • Clustering via k‑means on normalized vectors for cosine similarity
  • Next.js App Router API routes for dynamic grouping
  • Prisma ORM with MongoDB for asset management
  • Simple scripts for metadata generation and embedding

Prerequisites

  • Node.js 18 or higher
  • MongoDB Atlas (or local MongoDB instance)
  • OpenAI API access (GPT‑4.1‑nano for metadata generation and text-embedding-3-small for embeddings)

Installation

  1. Clone the repository
  2. Install dependencies with your package manager (e.g. npm install)
  3. Generate the Prisma client and sync the schema with MongoDB (e.g. npx prisma generate and npx prisma db push; Prisma uses db push rather than migrations with MongoDB)

Configuration

Create a .env file in the project root to specify your MongoDB connection string and OpenAI API key.
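
A typical .env might look like the snippet below. DATABASE_URL is Prisma's conventional variable name and OPENAI_API_KEY is what the OpenAI SDK reads by default; confirm both against the project's schema and code.

```
# MongoDB connection string used by Prisma
DATABASE_URL="mongodb+srv://<user>:<password>@<cluster>/<database>"

# API key read by the OpenAI Node SDK
OPENAI_API_KEY="sk-..."
```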


Image Embedding Approach

  1. Generate CLIP embeddings for each image and store them in the database.
  2. Use an API route to fetch all image embeddings, normalize them, and run k‑means (a route sketch follows this list).
  3. Return groups of assets based on their cluster assignments.
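
A minimal App Router sketch of such a route, assuming an Asset model with an imageEmbedding list field and the clustering helpers sketched earlier (the actual model, field names, and paths in the repository may differ):

```ts
// app/api/cluster/images/route.ts (illustrative path)
import { NextResponse } from "next/server";
import { PrismaClient } from "@prisma/client";
import { normalize, kmeans } from "@/lib/clustering"; // helpers sketched above

const prisma = new PrismaClient();

export async function GET() {
  // Assumes an Asset model with an `imageEmbedding Float[]` field.
  const assets = await prisma.asset.findMany({
    where: { imageEmbedding: { isEmpty: false } },
  });

  const vectors = assets.map((a) => normalize(a.imageEmbedding));
  const k = Math.min(5, vectors.length); // cluster count is a tunable choice
  const labels = kmeans(vectors, k);

  // Group asset ids by cluster label.
  const groups: Record<number, string[]> = {};
  labels.forEach((label, i) => {
    (groups[label] ??= []).push(assets[i].id);
  });

  return NextResponse.json({ groups });
}
```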

Metadata Embedding Approach

  1. Generate descriptive metadata for each image using OpenAI GPT-4.1-nano (title, description, tags).
  2. Store the metadata alongside each asset and create text embeddings for that metadata using OpenAI text-embedding-3-small (a generation-and-embedding sketch follows this list).
  3. Use an API route to fetch metadata embeddings, normalize them, and run k‑means.
  4. Return asset groups based on metadata similarity.
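
A minimal sketch of steps 1 and 2, assuming an Asset model with url, metadata, and metadataEmbedding fields (field names, prompt, and file path are illustrative):

```ts
// scripts/generate-and-embed-metadata.ts (illustrative path)
import OpenAI from "openai";
import { PrismaClient } from "@prisma/client";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment
const prisma = new PrismaClient();

async function describeImage(imageUrl: string): Promise<string> {
  // Step 1: ask GPT-4.1-nano for a title, description, and tags.
  const completion = await openai.chat.completions.create({
    model: "gpt-4.1-nano",
    messages: [
      {
        role: "user",
        content: [
          { type: "text", text: "Return a short title, a one-sentence description, and 5 tags for this image as JSON." },
          { type: "image_url", image_url: { url: imageUrl } },
        ],
      },
    ],
  });
  return completion.choices[0].message.content ?? "";
}

async function embedText(text: string): Promise<number[]> {
  // Step 2: embed the metadata text with text-embedding-3-small.
  const response = await openai.embeddings.create({
    model: "text-embedding-3-small",
    input: text,
  });
  return response.data[0].embedding;
}

async function main() {
  // Assumes an Asset model with `url`, `metadata`, and `metadataEmbedding` fields.
  const assets = await prisma.asset.findMany();
  for (const asset of assets) {
    const metadata = await describeImage(asset.url);
    const embedding = await embedText(metadata);
    await prisma.asset.update({
      where: { id: asset.id },
      data: { metadata, metadataEmbedding: embedding },
    });
  }
}

main().finally(() => prisma.$disconnect());
```

Steps 3 and 4 mirror the image‑embedding route above, reading metadataEmbedding instead of imageEmbedding.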

Frontend Integration

On the frontend, fetch the grouping API endpoint and iterate over each group to render sections or carousels of assets.
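
A minimal client component sketch, assuming the clustering route above returns a groups object keyed by cluster label (the endpoint path and response shape are assumptions carried over from the earlier sketch):

```tsx
// app/groups/page.tsx (illustrative path)
"use client";

import { useEffect, useState } from "react";

type Groups = Record<string, string[]>; // cluster label -> asset ids

export default function GroupsPage() {
  const [groups, setGroups] = useState<Groups>({});

  useEffect(() => {
    // Fetch the grouping endpoint (path assumed from the route sketch above).
    fetch("/api/cluster/images")
      .then((res) => res.json())
      .then((data) => setGroups(data.groups));
  }, []);

  return (
    <main>
      {Object.entries(groups).map(([label, assetIds]) => (
        <section key={label}>
          <h2>Group {label}</h2>
          <ul>
            {assetIds.map((id) => (
              <li key={id}>{id}</li>
            ))}
          </ul>
        </section>
      ))}
    </main>
  );
}
```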


Available Scripts

  • Generate Metadata: Run the metadata generation script to populate image metadata via LLM.
  • Embed Metadata: Run the embedding script to create text embeddings from metadata.
  • Cluster Images: Use the built‑in API route to cluster by image embeddings.
  • Cluster Metadata: Use the built‑in API route to cluster by metadata embeddings.