Ideology-Search

A domain-specific search engine for ideologies. A high-performance Python crawler built on asyncio/aiohttp gathered and deduplicated 100K+ English pages and extracted metadata and hyperlink graphs, which feed an inverted index ranked with TF-IDF, PageRank, and HITS. The project covers the full crawling → indexing → UI pipeline and benchmarks its semantic precision and contextual relevance against Google and Bing.

Features

Asyncio/Aiohttp Crawler: High-performance web crawler for large-scale data collection.
Deduplication: Cleans and deduplicates crawled data.
Metadata & Hyperlink Extraction: Builds metadata and hyperlink graphs for downstream indexing.
Indexing: Inverted index with TF-IDF scoring (scikit-learn based), combined with PageRank and HITS link analysis.
Full-stack Pipeline: Seamless integration from crawling to search UI.
Semantic Benchmarking: Benchmarks search results against Google and Bing for precision and relevance.
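
To make the indexing feature concrete, here is a minimal sketch of a TF-IDF inverted index. It deliberately uses only the standard library rather than scikit-learn, so it is not the project's actual indexer. The function names (`tokenize`, `build_index`, `search`) and the sample documents are assumptions made for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Build an inverted index {term: {doc_id: tf-idf weight}} from {doc_id: text}."""
    term_counts = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    doc_freq = Counter()
    for counts in term_counts.values():
        doc_freq.update(counts.keys())

    n_docs = len(docs)
    index = defaultdict(dict)
    for doc_id, counts in term_counts.items():
        for term, tf in counts.items():
            # Smoothed idf: terms found in every document keep a small positive weight.
            idf = math.log((1 + n_docs) / (1 + doc_freq[term])) + 1
            index[term][doc_id] = tf * idf
    return index


def search(index, query, top_k=5):
    """Score documents by summing the weights of the query terms they contain."""
    scores = defaultdict(float)
    for term in tokenize(query):
        for doc_id, weight in index.get(term, {}).items():
            scores[doc_id] += weight
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:top_k]


# Hypothetical sample documents, not taken from the crawled corpus.
docs = {
    "a": "Liberalism is a political ideology built on liberty and equality.",
    "b": "Socialism is an ideology that favours social ownership.",
    "c": "A recipe for baking sourdough bread at home.",
}
index = build_index(docs)
results = search(index, "liberty ideology")
```

In this sketch document "a" ranks first because it matches both query terms, and "c" is not returned at all. The sketch skips document-length normalization, which a production index would normally apply.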

Architecture

frontend/ (React)
   |
   v
backend/ (Flask, Asyncio, Sklearn, Crawler)
   |
   v
Crawling → Deduplication → Indexing → Search API → UI
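
The indexing stage relies on the hyperlink graph extracted during crawling. As a rough illustration of the link-analysis part, the following sketch runs PageRank by power iteration on a tiny graph. The `pagerank` function, its default parameters, and the sample graph are assumptions for illustration and may differ from what the project implements.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it links to.

    Returns {page: score}; the scores sum to 1.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    ranks = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Rank held by pages without outgoing links is spread evenly over all pages.
        dangling = sum(ranks[p] for p in pages if not links.get(p))
        new_ranks = {p: (1 - damping) / n + damping * dangling / n for p in pages}
        for page, targets in links.items():
            if targets:
                share = damping * ranks[page] / len(targets)
                for target in targets:
                    new_ranks[target] += share
        ranks = new_ranks
    return ranks


# Hypothetical link graph; "orphan" has no incoming links.
graph = {
    "home": ["liberalism", "socialism"],
    "liberalism": ["home"],
    "socialism": ["home", "liberalism"],
    "orphan": ["home"],
}
scores = pagerank(graph)
```

Here "home" receives the highest score because every other page links to it, while "orphan" receives the lowest. A link-based score like this could then be blended with the TF-IDF score at query time.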

Backend Setup

The backend is a Python (Flask) app with async crawling and indexing.
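
As a rough sketch of how async crawling with deduplication can work, the example below crawls breadth-first with bounded concurrency and skips pages whose content hash has already been seen. It is not the code in crawler.py: the `crawl` function, the regex-based link extraction, and the in-memory `FAKE_WEB` are assumptions chosen so the example runs without network access. In a real crawler, `fetch` would presumably wrap an aiohttp request.

```python
import asyncio
import hashlib
import re

LINK_RE = re.compile(r'href="([^"]+)"')


async def crawl(seeds, fetch, max_pages=100, concurrency=10):
    """Breadth-first crawl. `fetch` is any coroutine mapping a URL to HTML or None."""
    seen_urls = set(seeds)
    seen_hashes = set()
    pages = {}
    frontier = list(seeds)
    semaphore = asyncio.Semaphore(concurrency)

    async def fetch_one(url):
        async with semaphore:  # cap the number of in-flight requests
            return url, await fetch(url)

    while frontier and len(pages) < max_pages:
        batch, frontier = frontier, []
        for url, html in await asyncio.gather(*(fetch_one(u) for u in batch)):
            if html is None or len(pages) >= max_pages:
                continue
            digest = hashlib.sha256(html.encode("utf-8")).hexdigest()
            if digest in seen_hashes:  # identical content under a different URL
                continue
            seen_hashes.add(digest)
            pages[url] = html
            for link in LINK_RE.findall(html):
                if link not in seen_urls:
                    seen_urls.add(link)
                    frontier.append(link)
    return pages


# A tiny in-memory "web" standing in for real HTTP responses (hypothetical URLs).
FAKE_WEB = {
    "http://example.test/": '<a href="http://example.test/a">A</a> <a href="http://example.test/b">B</a>',
    "http://example.test/a": "<p>Same text</p>",
    "http://example.test/b": "<p>Same text</p>",
}


async def fake_fetch(url):
    await asyncio.sleep(0)  # yield to the event loop, as a real request would
    return FAKE_WEB.get(url)


pages = asyncio.run(crawl(["http://example.test/"], fake_fetch))
```

Because the two child pages have identical content, only the first one fetched is kept. Passing `fetch` as a parameter keeps the crawl logic independent of the HTTP client, which also makes it easy to test.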

Prerequisites

Python 3.8+
pip

Installation

Navigate to the backend directory:
```
cd backend
```

(Optional but recommended) Create a virtual environment:

python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

Install Python dependencies:
```
pip install -r requirements.txt
```
Requirements include:
- flask
- flask-cors
- scikit-learn
- requests
- beautifulsoup4
- googlesearch-python
- serpapi

Running the Backend Server

python app.py

The backend will start (by default) at http://127.0.0.1:5000.

Frontend Setup

The frontend is built using React.

Prerequisites

Node.js (v18+ recommended)
npm (comes with Node.js)

Installation

Navigate to the frontend directory:
```
cd frontend
```
Install frontend dependencies:
```
npm install
```

Running the Frontend

npm start

This will start the frontend (typically at http://localhost:3000) and proxy API requests to the backend.

Usage

Start the backend server (see above).
Start the frontend UI (see above).
Open your browser at http://localhost:3000 to use the Ideology search engine.

Project Structure

backend/
  ├── app.py               # Flask API server
  ├── crawler.py           # Asyncio/Aiohttp web crawler
  ├── conver_pkl.py        # Utility for pickle conversion
  ├── prepare_sklearn_index.py # Index preparation scripts
  ├── search_engine.py     # Main search engine logic
  ├── sklearn_indexer.py   # Sklearn-based indexing
  ├── requirements.txt     # Python dependencies
  └── terms.json           # Domain-specific terms

frontend/
  ├── README.md
  ├── package.json         # React/NPM dependencies
  ├── public/
  └── src/                 # React source code

Final Result

A screenshot of the search engine UI and results in action is included in the repository as ss.png.

License

This project is released under the MIT License. Please contact the repository owner for usage terms.

Author

ArgonArnav
