A Next.js + MongoDB + OpenAI project for semantically grouping image assets into categories, using two complementary approaches:
- Image Embedding Clustering: Generate vector embeddings directly from images using OpenAI CLIP and group them by similarity.
- Metadata Embedding Clustering: Use OpenAI GPT-4.1-nano to generate textual metadata (title, description, tags) for each image, embed that metadata, and then group by similarity.
Both approaches use k‑means on unit‑normalized vectors to approximate cosine‑based clustering.
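Why unit normalization makes k‑means behave like cosine-based clustering follows from a standard identity. For unit vectors a and b separated by angle θ:

```latex
\|a - b\|^2 = \|a\|^2 + \|b\|^2 - 2\,a \cdot b = 2 - 2\cos\theta
```

Squared Euclidean distance therefore decreases as cosine similarity increases, so nearest-centroid assignment ranks points the same way under both measures. It remains an approximation because a cluster mean is generally not unit length.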
- Features
- Prerequisites
- Installation
- Configuration
- Image Embedding Approach
- Metadata Embedding Approach
- Frontend Integration
- Available Scripts
- Two semantic grouping pipelines: raw image embeddings vs. LLM‑assisted metadata embeddings
- k‑means clustering on unit‑normalized vectors, which approximates clustering by cosine similarity
- Next.js App Router API routes for dynamic grouping
- Prisma ORM with MongoDB for asset management
- Simple scripts for metadata generation and embedding
- Node.js 18 or higher
- MongoDB Atlas (or local MongoDB instance)
- OpenAI API access (the metadata pipeline calls GPT-4.1-nano and text-embedding-3-small)
- Clone the repository
- Install dependencies via your package manager
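- Sync the Prisma schema with the database (Prisma's MongoDB connector does not support Prisma Migrate, so use prisma db push rather than migrations)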
Create a .env file in the project root to specify your MongoDB connection string and OpenAI API key.
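As a minimal sketch of how those two settings might be validated at startup: the variable names `DATABASE_URL` and `OPENAI_API_KEY` are assumptions (the first is Prisma's usual convention, the second is the name the OpenAI SDK reads by default), and `loadConfig` is a hypothetical helper, not part of this project.

```typescript
// Hypothetical config loader. The variable names are assumptions;
// use whatever names the project's .env file actually defines.
interface AppConfig {
  databaseUrl: string;
  openaiApiKey: string;
}

function loadConfig(env: Record<string, string | undefined>): AppConfig {
  // Collect every missing key so the error reports all of them at once.
  const required = ["DATABASE_URL", "OPENAI_API_KEY"];
  const missing = required.filter((key) => !env[key]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  return {
    databaseUrl: env["DATABASE_URL"] as string,
    openaiApiKey: env["OPENAI_API_KEY"] as string,
  };
}
```

In a Node.js process this would typically be called once as `loadConfig(process.env)`, so a missing key fails fast instead of surfacing later as a failed database or API call.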
- Generate CLIP embeddings for each image and store them in the database.
- Use an API route to fetch all image embeddings, normalize them, and run k‑means.
- Return groups of assets based on their cluster assignments.
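The normalize, cluster, and group steps above can be sketched in plain TypeScript. This is an illustrative implementation of Lloyd's k‑means with deterministic seeding from the first k points, not necessarily what the project's API route does; `normalize`, `kMeans`, and `groupByCluster` are hypothetical names, and a production route would more likely use a library with better seeding such as k‑means++.

```typescript
type Vector = number[];

// Scale a vector to unit length; a zero vector is returned unchanged.
function normalize(v: Vector): Vector {
  const norm = Math.sqrt(v.reduce((sum, x) => sum + x * x, 0));
  return norm === 0 ? v.slice() : v.map((x) => x / norm);
}

function squaredDistance(a: Vector, b: Vector): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i];
    sum += d * d;
  }
  return sum;
}

// Lloyd's algorithm. Seeds centroids with the first k points (an assumption
// made for reproducibility) and returns one cluster index per input point.
function kMeans(points: Vector[], k: number, maxIterations = 50): number[] {
  if (k < 1 || k > points.length) {
    throw new Error("k must be between 1 and points.length");
  }
  let centroids = points.slice(0, k).map((p) => p.slice());
  let assignments = new Array<number>(points.length).fill(-1);

  for (let iter = 0; iter < maxIterations; iter++) {
    // Assignment step: attach each point to its nearest centroid.
    const next = points.map((p) => {
      let best = 0;
      for (let c = 1; c < k; c++) {
        if (squaredDistance(p, centroids[c]) < squaredDistance(p, centroids[best])) {
          best = c;
        }
      }
      return best;
    });
    const changed = next.some((c, i) => c !== assignments[i]);
    assignments = next;
    if (!changed) break;

    // Update step: move each centroid to the mean of its members.
    // An empty cluster keeps its previous centroid.
    centroids = centroids.map((old, c) => {
      const members = points.filter((_, i) => assignments[i] === c);
      if (members.length === 0) return old;
      return old.map((_, d) => members.reduce((s, m) => s + m[d], 0) / members.length);
    });
  }
  return assignments;
}

// Bucket items by their cluster index, preserving input order within a group.
function groupByCluster<T>(items: T[], assignments: number[]): T[][] {
  const groups = new Map<number, T[]>();
  items.forEach((item, i) => {
    const bucket = groups.get(assignments[i]);
    if (bucket) bucket.push(item);
    else groups.set(assignments[i], [item]);
  });
  return Array.from(groups.values());
}
```

A route handler would call `kMeans(embeddings.map(normalize), k)` and pass the result to `groupByCluster` together with the asset records. The choice of k is left to the caller.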
- Generate descriptive metadata for each image using OpenAI GPT-4.1-nano (title, description, tags).
- Store the metadata alongside each asset and create text embeddings for that metadata using OpenAI text-embedding-3-small.
- Use an API route to fetch metadata embeddings, normalize them, and run k‑means.
- Return asset groups based on metadata similarity.
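Before the metadata can be embedded, the title, description, and tags have to be combined into a single input string. One way to do that is sketched below; the `ImageMetadata` shape and the `metadataToEmbeddingInput` helper are assumptions for illustration, and the project's script may format the text differently.

```typescript
// Hypothetical metadata shape, mirroring the fields named above.
interface ImageMetadata {
  title: string;
  description: string;
  tags: string[];
}

// Build one labelled text block per image so the embedding model sees
// all three fields. Tags are trimmed, lowercased, and de-duplicated.
function metadataToEmbeddingInput(meta: ImageMetadata): string {
  const cleaned = meta.tags
    .map((tag) => tag.trim().toLowerCase())
    .filter((tag) => tag.length > 0);
  const uniqueTags = cleaned.filter((tag, i) => cleaned.indexOf(tag) === i);
  return [
    `Title: ${meta.title.trim()}`,
    `Description: ${meta.description.trim()}`,
    `Tags: ${uniqueTags.join(", ")}`,
  ].join("\n");
}
```

With the official openai Node SDK, the resulting string would typically be passed as `input` to `client.embeddings.create` with `model: "text-embedding-3-small"`, and the returned vector stored alongside the asset.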
On the frontend, fetch the grouping API endpoint and iterate over each group to render sections or carousels of assets.
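A rough sketch of that fetch-and-iterate step follows. The response shape (`{ groups: Asset[][] }`), the `Asset` fields, and the helper names are all assumptions, since the API contract is not specified here. The fetcher is passed in as a parameter so the same code runs in the browser or in tests.

```typescript
// Hypothetical response shape; the real API route may return something different.
interface Asset {
  id: string;
  url: string;
}
interface GroupingResponse {
  groups: Asset[][];
}
interface Section {
  heading: string;
  assets: Asset[];
}

// Turn raw groups into renderable sections, skipping empty groups.
function toSections(response: GroupingResponse): Section[] {
  return response.groups
    .filter((group) => group.length > 0)
    .map((assets, index) => ({ heading: `Group ${index + 1}`, assets }));
}

// Minimal structural type that the global fetch function satisfies.
type Fetcher = (url: string) => Promise<{
  ok: boolean;
  status: number;
  json(): Promise<unknown>;
}>;

async function loadSections(endpoint: string, fetcher: Fetcher): Promise<Section[]> {
  const res = await fetcher(endpoint);
  if (!res.ok) {
    throw new Error(`Grouping request failed with status ${res.status}`);
  }
  return toSections((await res.json()) as GroupingResponse);
}
```

A component could call `loadSections(endpoint, fetch)` with the project's grouping route as `endpoint`, then map each returned section to a heading plus a row or carousel of its assets.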
- Generate Metadata: Run the metadata generation script to populate image metadata via LLM.
- Embed Metadata: Run the embedding script to create text embeddings from metadata.
- Cluster Images: Use the built‑in API route to cluster by image embeddings.
- Cluster Metadata: Use the built‑in API route to cluster by metadata embeddings.