A full-stack web application for semantic search and chunked retrieval of website content. The frontend is built with React (Vite, TypeScript), and the backend uses FastAPI, BeautifulSoup, NLTK, sentence-transformers, FAISS, and Milvus for vector search.
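Before a page can be searched, its HTML has to be turned into a list of text chunks. The snippet below is a minimal sketch of that step, not the project's implementation. It uses the standard library's `html.parser` instead of BeautifulSoup and a naive regular expression instead of NLTK's sentence tokenizer, so it runs with no extra packages. The greedy strategy (pack whole sentences into chunks of up to `max_words` words), the function names, and the default limit are all hypothetical.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical stand-in for BeautifulSoup: collects visible text, skipping <script>/<style>."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def html_to_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def split_sentences(text: str) -> list[str]:
    # Naive stand-in for nltk.sent_tokenize: split after ., ! or ? followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s]


def chunk_sentences(sentences: list[str], max_words: int = 50) -> list[str]:
    # Greedily pack whole sentences into chunks of at most max_words words;
    # a single sentence longer than max_words becomes a chunk on its own.
    chunks: list[str] = []
    current: list[str] = []
    count = 0
    for sentence in sentences:
        n = len(sentence.split())
        if current and count + n > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks
```

For example, `chunk_sentences(split_sentences(html_to_text(page_html)), max_words=50)` yields chunks that never cut a sentence in half, so each chunk stays readable when shown as a result. In a real backend, `BeautifulSoup(html, "html.parser").get_text()` and `nltk.sent_tokenize` would be the natural replacements for the two stand-ins.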
Features:
- Enter a website URL and search query to find the most relevant content chunks.
- Semantic search using transformer embeddings and vector similarity.
- Results are displayed in styled cards with match scores and chunk details.
- Modern, responsive UI with clear chunk grouping and search form.
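The ranking idea behind semantic search fits in a few lines: embed the query and every chunk, score each chunk by vector similarity, and keep the best matches. This sketch assumes cosine similarity as the score (the project's actual metric is not stated here). It also swaps in a toy bag-of-words `embed` function so it runs without downloading a model. In the app, sentence-transformers produces dense vectors and FAISS or Milvus performs the nearest-neighbour lookup instead of the linear scan below. All names are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> dict[str, float]:
    # Hypothetical stand-in for a transformer embedding: a sparse word-count vector.
    # A sentence-transformers model would return a dense float vector instead.
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    return {word: float(n) for word, n in counts.items()}


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    # Cosine similarity: dot(a, b) / (|a| * |b|); 0.0 if either vector is empty.
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: str, chunks: list[str], k: int = 3) -> list[tuple[float, str]]:
    # Score every chunk against the query (a linear scan; a vector index avoids this).
    q = embed(query)
    scored = [(cosine(q, embed(chunk)), chunk) for chunk in chunks]
    # Highest similarity first; these scores play the role of "match scores".
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]
```

With the chunks `["Milvus is a vector database.", "Bananas are yellow.", "FAISS performs vector similarity search."]`, `top_k("vector database", chunks, k=2)` returns the Milvus chunk first (score 2/sqrt(10), about 0.63) and the FAISS chunk second (about 0.32). Unlike this toy, transformer embeddings can also score paraphrases that share no words with the query.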
Prerequisites:
- Node.js (v18+ recommended)
- Python (v3.9+ recommended)
- pip (Python package manager)
- Milvus (vector database, can run locally via Docker)
Installation:

```bash
git clone https://github.com/kumarBisho/web-content-search.git
cd web-content-search
cd backend
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
pip install -r requirements.txt
```

Milvus setup:
- Option 1: Docker (Recommended)
  - Install Docker: https://docs.docker.com/get-docker/
  - Run Milvus:

    ```bash
    docker run -d --name milvus-standalone -p 19530:19530 -p 9091:9091 milvusdb/milvus:v2.3.9
    ```
- Option 2: Local Install
  - See Milvus docs: https://milvus.io/docs/install_standalone-docker.md
Configuration:
- Edit `.env` in `backend/` if needed (see `.env.example`).
- Default Milvus connection: `localhost:19530`
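The backend reads its settings from `backend/.env`. Below is a minimal sketch of how such settings are commonly loaded with python-dotenv, one of the listed backend dependencies. The variable names `MILVUS_HOST` and `MILVUS_PORT` are assumptions, so check `.env.example` for the names the backend really uses. The defaults reproduce the default `localhost:19530` connection.

```python
import os
from typing import Mapping, Tuple

try:
    # python-dotenv copies KEY=VALUE lines from the .env file (resolved against the
    # current working directory here) into os.environ; by default it does not
    # override variables that are already set.
    from dotenv import load_dotenv

    load_dotenv(".env")
except ImportError:
    pass  # Without python-dotenv, only the real process environment is used.


def milvus_address(env: Mapping[str, str] = os.environ) -> Tuple[str, int]:
    # MILVUS_HOST / MILVUS_PORT are hypothetical variable names.
    host = env.get("MILVUS_HOST", "localhost")
    port = int(env.get("MILVUS_PORT", "19530"))
    return host, port
```

Passing a plain dict makes the function easy to test without touching the real environment: `milvus_address({"MILVUS_PORT": "19531"})` returns `("localhost", 19531)`.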
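Start the backend:

```bash
uvicorn main:app --reload
```

Leave the backend running and, in a second terminal opened in `backend/`, set up the frontend:

```bash
cd ../frontend
# Install Node.js dependencies
npm install
# (Optional) Install Python requirements if using Python scripts in frontend
pip install -r requirements.txt
```

- Default backend API URL: `http://127.0.0.1:8000`

Start the frontend dev server:

```bash
npm run dev
```

Usage:
- Start Milvus (vector DB) and backend server.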
- Start frontend dev server.
- Open the frontend in your browser (usually http://localhost:5173).
- Enter a website URL and search query, then click "Search".
- View top matching content chunks with semantic relevance scores.
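Clicking "Search" in the frontend sends the URL and query to the backend at `http://127.0.0.1:8000`. The sketch below makes the same kind of request from Python using only the standard library. The `/search` path, the POST method, and the `url`/`query` JSON fields are hypothetical. FastAPI serves interactive API docs at `/docs` by default, which is the place to confirm the real route and payload.

```python
import json
import urllib.request

API_BASE = "http://127.0.0.1:8000"  # default backend URL


def build_search_request(site_url: str, query: str, base: str = API_BASE) -> urllib.request.Request:
    # Hypothetical endpoint and payload: POST /search with {"url": ..., "query": ...}.
    payload = json.dumps({"url": site_url, "query": query}).encode("utf-8")
    return urllib.request.Request(
        f"{base}/search",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def search(site_url: str, query: str, base: str = API_BASE) -> object:
    # Sends the request and decodes the JSON response (the backend must be running).
    request = build_search_request(site_url, query, base)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the backend running, `search("https://example.com", "pricing plans")` would return the decoded response, provided the hypothetical route matches the real one.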
Backend dependencies:
- fastapi
- uvicorn
- beautifulsoup4
- nltk
- sentence-transformers
- faiss-cpu
- pymilvus
- python-dotenv
Frontend dependencies:
- react
- vite
- typescript
- axios
Notes:
- Ensure Milvus is running before starting the backend.
- The required NLTK data is downloaded automatically by the backend code; if that download fails, fetch it manually with NLTK's downloader.
- For production, set proper CORS and environment variables.
- See the `.env.example` files for configuration templates.
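Because the backend needs Milvus at startup, a quick reachability check can save a confusing stack trace. This sketch only tests whether something accepts TCP connections on `localhost:19530`, the default Milvus port. It does not verify that the service is a healthy Milvus instance. The function name is hypothetical.

```python
import socket


def is_port_open(host: str = "localhost", port: int = 19530, timeout: float = 2.0) -> bool:
    # True if a TCP connection to host:port succeeds within `timeout` seconds.
    # This shows that something is listening, not that Milvus itself is healthy.
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # connection refused, timed out, or host unreachable
        return False
```

Call `is_port_open()` before `uvicorn main:app --reload`. A result of `False` means Milvus is not reachable yet. For a stronger check, Milvus also exposes an HTTP health endpoint on port 9091 (`/healthz`), which the `docker run` command publishes.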
License: MIT


