Semantica lets you upload any PDF, convert it into searchable chunks, and retrieve answers with semantic precision.

oguzsh/semantica

🧠 Semantica

Semantica is a lightweight semantic search engine for PDF documents. It processes PDF files, converts them into vectorized text chunks, and enables intelligent retrieval using embedding-based similarity search — all without using an LLM.


🚀 Features

  • 📄 Upload any PDF file
  • ✂️ Automatic chunking of document content
  • 🔢 Embedding with HuggingFace (MiniLM)
  • 🧠 Vector search with Qdrant
  • ⚡ Fast and local — no OpenAI API required
  • 📆 Built with FastAPI, LangChain, and Qdrant

🖥️ Tech Stack

| Layer | Tool |
| --- | --- |
| Backend | FastAPI |
| Parsing | pymupdf4llm |
| Chunking | LangChain MarkdownTextSplitter |
| Embedding | sentence-transformers/all-MiniLM-L6-v2 |
| Vector DB | Qdrant via Docker |
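
As a rough illustration of what the Embedding and Vector DB layers do together, the sketch below ranks chunks by cosine similarity against a query vector. It is not Semantica's code: the two-dimensional vectors are hand-made stand-ins for the 384-dimensional embeddings that all-MiniLM-L6-v2 produces, `cosine_similarity` and `top_k` are hypothetical helper names, and cosine similarity is an assumption (the distance metric configured in Qdrant is not stated here).

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], indexed: list[tuple[str, list[float]]], k: int = 3):
    """Return up to k (score, text) pairs, highest similarity first."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in indexed]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy "index": (chunk text, hand-made vector). Real embeddings come from the model.
toy_index = [
    ("Fun fun fun.", [1.0, 0.0]),
    ("Quarterly revenue table.", [0.0, 1.0]),
    ("A fun afternoon.", [0.8, 0.6]),
]

results = top_k([1.0, 0.0], toy_index, k=2)
for score, text in results:
    print(f"{score:.2f}  {text}")
```

In the real service this ranking is delegated to Qdrant, which does the same kind of nearest-neighbour lookup over the stored chunk vectors.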

🔧 Setup & Run

1. Clone the repository

git clone https://github.com/oguzsh/semantica.git
cd semantica

2. Install dependencies

pip install -r requirements.txt

3. Run Qdrant locally (Docker)

docker run -p 6333:6333 qdrant/qdrant

4. Start the FastAPI server

fastapi dev main

Then open the Swagger UI at: 📍 http://localhost:8000/docs


🦪 API Endpoints

POST /upload

Uploads a PDF file, parses it, splits the content into chunks, and stores each chunk in Qdrant together with its embedding.
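
To give a feel for the chunking step, here is a minimal sketch that packs paragraphs into size-limited chunks and attaches the metadata fields that appear in the /search example response (`text`, `source_file`, `chunk_id`). It is a simplified stand-in, not the LangChain MarkdownTextSplitter the project uses; `chunk_text`, `build_payloads`, the `max_chars` limit, and the 1-based `chunk_id` are all assumptions made for illustration.

```python
def chunk_text(text: str, max_chars: int = 500) -> list[str]:
    """Greedily pack blank-line-separated paragraphs into chunks.

    Chunks stay within max_chars, except that a single paragraph longer
    than max_chars is kept whole rather than split (a sketch-level shortcut).
    """
    chunks: list[str] = []
    current = ""
    for paragraph in text.split("\n\n"):
        paragraph = paragraph.strip()
        if not paragraph:
            continue
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if len(candidate) <= max_chars:
            current = candidate
        else:
            if current:
                chunks.append(current)
            current = paragraph
    if current:
        chunks.append(current)
    return chunks


def build_payloads(text: str, source_file: str, max_chars: int = 500) -> list[dict]:
    """Pair each chunk with the metadata stored alongside its vector."""
    return [
        {"text": chunk, "source_file": source_file, "chunk_id": i}
        for i, chunk in enumerate(chunk_text(text, max_chars), start=1)
    ]


payloads = build_payloads("aaa\n\nbbb\n\nccc", "sample.pdf", max_chars=8)
for payload in payloads:
    print(payload)
```

Each payload would then be embedded and written to Qdrant as one point, so that a search hit can be traced back to its file and position.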

POST /search

Sends a semantic query and returns the most relevant chunks. Example request:

{
  "query": "Does this PDF mention 'fun' keyword?"
}

Example response:

[
  {
    "score": 0.92,
    "text": "This is a simple PDF file. Fun fun fun.",
    "source_file": "sample.pdf",
    "chunk_id": 1
  }
]
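
For calling the endpoint from a script instead of the Swagger UI, a minimal standard-library client could look like the sketch below. The request body and response fields follow the example above; the base URL assumes the server from the setup steps, and `build_search_request`, `filter_hits`, `search`, and the `min_score` cutoff are hypothetical helpers, not part of Semantica.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumes the local dev server from the setup steps


def build_search_request(query: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the POST /search request with a JSON body of the form {"query": ...}."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/search",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def filter_hits(hits: list[dict], min_score: float) -> list[dict]:
    """Keep only hits whose similarity score reaches min_score."""
    return [hit for hit in hits if hit["score"] >= min_score]


def search(query: str, base_url: str = BASE_URL, min_score: float = 0.0) -> list[dict]:
    """Send the query to a running server and return the filtered hits."""
    with urllib.request.urlopen(build_search_request(query, base_url)) as response:
        hits = json.load(response)
    return filter_hits(hits, min_score)


# Offline demonstration using the example response shown above.
example_hits = [
    {
        "score": 0.92,
        "text": "This is a simple PDF file. Fun fun fun.",
        "source_file": "sample.pdf",
        "chunk_id": 1,
    }
]
strong_hits = filter_hits(example_hits, min_score=0.5)
```

With the server and Qdrant running, `search("Does this PDF mention 'fun' keyword?", min_score=0.5)` would send the same request as the example and drop low-scoring chunks on the client side.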

📜 Future Plans

  • LLM-based answer generation
  • Multi-document support
  • Frontend interface for document search (possibly separated)

🤝 Contributing

Pull requests, feedback and ideas are always welcome. If you use this project, feel free to ⭐️ the repo and share your feedback.


📄 License

MIT
