ClipSearch

ClipSearch is an asynchronous document search engine that indexes PDF and TXT files and generates AI summaries of them. Built on a cloud-native architecture, it processes uploads through a distributed pipeline (S3, SQS, and background workers) and makes the extracted text and summaries searchable through Elasticsearch.

Key Features

Asynchronous Processing: Upload events are queued in SQS and handled by dedicated workers.
AI Summarization: Automated generation of summaries of 2-3 bullet points using a local LLM (TinyLlama served by Ollama).
Full-Text Search: High-performance indexing and retrieval powered by Elasticsearch.
Enterprise-Ready: Fully containerized and optimized for OpenShift/Kubernetes deployment.
Clean UI: Modern Angular-based frontend for seamless file uploads and search.

How It Works

Ingestion: Users upload PDF/TXT files via the API, which stores them in S3.
Messaging: The API pushes an event to an SQS queue to trigger background processing.
Extraction: A worker pulls the event, extracts the text with Apache Tika, and requests a summary from the AI engine.
Indexing: The metadata, extracted text, and AI summary are indexed into Elasticsearch, where they become searchable in near real time.
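
The flow above can be sketched with in-memory stand-ins. This is a minimal illustration, not the project's code: the class `PipelineSketch`, its methods, and the first-sentence "summary" are all hypothetical placeholders for S3, SQS, Apache Tika, the LLM, and Elasticsearch.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Queue;
import java.util.UUID;

// In-memory stand-ins for the real services; every name here is hypothetical.
class PipelineSketch {
    // What the worker stores in the index for each upload.
    record IndexedDoc(String uploadId, String fileName, String text, String summary) {}

    private final Map<String, String> objectStore = new LinkedHashMap<>(); // stands in for S3
    private final Map<String, String> fileNames = new LinkedHashMap<>();   // upload metadata
    private final Queue<String> queue = new ArrayDeque<>();                // stands in for SQS
    private final Map<String, IndexedDoc> index = new LinkedHashMap<>();   // stands in for Elasticsearch

    // Ingestion and messaging: store the file, then enqueue an event carrying its uploadId.
    String upload(String fileName, String content) {
        String uploadId = UUID.randomUUID().toString();
        objectStore.put(uploadId, content);
        fileNames.put(uploadId, fileName);
        queue.add(uploadId);
        return uploadId;
    }

    // Extraction and indexing: take one event, extract text, summarize, index.
    // Returns false when the queue is empty.
    boolean runWorkerOnce() {
        String uploadId = queue.poll();
        if (uploadId == null) {
            return false;
        }
        String text = objectStore.get(uploadId).strip(); // placeholder for Tika extraction
        String summary = summarize(text);                // placeholder for the LLM call
        index.put(uploadId, new IndexedDoc(uploadId, fileNames.get(uploadId), text, summary));
        return true;
    }

    // Placeholder summarizer: returns the first sentence as a single bullet.
    static String summarize(String text) {
        int end = text.indexOf('.');
        return "- " + (end >= 0 ? text.substring(0, end + 1) : text);
    }

    // Case-insensitive substring match over the indexed text.
    List<IndexedDoc> search(String term) {
        String needle = term.toLowerCase(Locale.ROOT);
        List<IndexedDoc> hits = new ArrayList<>();
        for (IndexedDoc doc : index.values()) {
            if (doc.text().toLowerCase(Locale.ROOT).contains(needle)) {
                hits.add(doc);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        PipelineSketch pipeline = new PipelineSketch();
        pipeline.upload("notes.txt", "Quarkus workers index documents. They also summarize them.");
        System.out.println(pipeline.search("quarkus").size()); // searched before the worker has run
        pipeline.runWorkerOnce();
        System.out.println(pipeline.search("quarkus").get(0).summary());
    }
}
```

The point of the sketch is the ordering: a document is stored and queued at upload time, but it only appears in search results after a worker has processed its event.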

Architecture

graph LR
    User((User)) -->|Upload/Search| FE[Angular Frontend]
    FE -->|REST API| API[Quarkus API]
    API -->|1. Store| S3[(S3 Storage)]
    API -->|2. Notify| SQS[SQS Queue]
    SQS -->|3. Trigger| Worker[Quarkus Worker]
    Worker -->|4. Summarize| LLM[Ollama/TinyLlama]
    Worker -->|5. Index| ES[(Elasticsearch)]
    API -->|Search| ES

Tech Stack

Backend: Java 17, Quarkus, LangChain4j, Apache Tika.
Frontend: Angular 19, Tailwind CSS.
Infrastructure: Elasticsearch, LocalStack (S3/SQS).
AI Engine: Ollama / Red Hat OpenShift AI.
Deployment: Docker Compose, OpenShift (Kustomize).

How to Run

Local Development

Prerequisites: Docker and Docker Compose installed.
Start Services:
```
docker-compose up -d
```
Access UI: Open http://localhost:4200 in your browser.

Build from Source

    # Backend
    mvn -f backend/pom.xml clean package -pl api,worker -am

    # Frontend
    cd frontend && npm install && npm run build

Use Cases

Knowledge Management: Quick indexing of internal documentation and research papers.
Automated Summarization: Fast-tracking document review with AI-generated snippets.
Searchable Archives: Converting large volumes of static files into a searchable database.

🚀 CI/CD Pipeline

The project uses GitHub Actions (.github/workflows/ci.yml) to:

Build Java components with Maven.
Build Docker images for API, Worker, and Frontend.
Push images to ghcr.io/dawidbera/clipsearch-*.

🛠 Troubleshooting

Common Issues

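405 (Method Not Allowed) on Uploads: Keep all upload-related logic in UploadResource so that upload requests always reach a handler that accepts them.
Single Search Result: If searches only ever return one document, check that each document is indexed with its uploadId as the Elasticsearch _id. A shared or constant _id makes every newly indexed document overwrite the previous one.
S3 Connectivity: Use path-style access and make sure the S3 endpoint resolves from where it is called (the internal service address from inside containers, the external address from the host or browser).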
Ollama Connection: Ensure Ollama is running and accessible (check llm-standalone logs on OpenShift).
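
Two of the issues above can be illustrated with a small sketch. It is an assumption-laden model, not the project's code: the class `TroubleshootingSketch`, the constant id `"document"`, the bucket name `uploads`, and the endpoint `http://localhost:4566` are all hypothetical. The first method models an index as a map keyed by _id, where indexing under an existing _id replaces the stored document. The other two contrast path-style S3 URLs (bucket in the path) with virtual-hosted-style URLs (bucket as a subdomain), assuming simple keys that need no URL encoding.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class; none of these names come from the ClipSearch code base.
class TroubleshootingSketch {

    // Models an index as a map keyed by _id and returns how many documents survive.
    static int indexAll(List<String> uploadIds, boolean useUploadIdAsId) {
        Map<String, String> index = new HashMap<>();
        for (String uploadId : uploadIds) {
            // A constant _id reproduces the "single search result" symptom.
            String id = useUploadIdAsId ? uploadId : "document";
            index.put(id, "content of " + uploadId);
        }
        return index.size();
    }

    // Path-style addressing: the bucket is part of the path, so only the
    // endpoint host itself has to resolve.
    static URI pathStyle(URI endpoint, String bucket, String key) {
        return URI.create(endpoint + "/" + bucket + "/" + key);
    }

    // Virtual-hosted-style addressing: the bucket becomes a subdomain, which
    // requires that "<bucket>.<host>" resolves as well.
    static URI virtualHostedStyle(URI endpoint, String bucket, String key) {
        return URI.create(endpoint.getScheme() + "://" + bucket + "."
                + endpoint.getAuthority() + "/" + key);
    }

    public static void main(String[] args) {
        List<String> ids = List.of("a1", "b2", "c3");
        System.out.println(indexAll(ids, false)); // constant _id
        System.out.println(indexAll(ids, true));  // uploadId as _id

        URI endpoint = URI.create("http://localhost:4566"); // assumed local S3 endpoint
        System.out.println(pathStyle(endpoint, "uploads", "report.pdf"));
        System.out.println(virtualHostedStyle(endpoint, "uploads", "report.pdf"));
    }
}
```

With a constant _id only one document remains no matter how many are indexed, which matches the symptom described above. The virtual-hosted form depends on the bucket subdomain being resolvable, which is a common reason to prefer path-style access against a local or in-cluster endpoint.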
