This project is a custom-built search engine web application that performs real-time web scraping across multiple search engines (Google, Bing, DuckDuckGo, Yahoo), stores search query data in a SQL Server database, and presents results in an organized and user-friendly interface.
- Real-time query scraping using Selenium and BeautifulSoup
- Data stored in a SQL Server database with structured tables for queries, URLs, and search term frequencies
- Dynamic Flask front-end to display URLs by relevance
- Duplicate filtering and frequency aggregation of search terms
- Integrated search functionality from both the home and results pages
- Python, Flask
- SQL Server (T-SQL)
- Selenium, BeautifulSoup
- HTML/CSS
- Jupyter Notebook
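The duplicate filtering and search-term frequency aggregation listed above can be sketched roughly as below. This is a simplified illustration, not the project's actual code: it uses a tiny hard-coded stopword set in place of NLTK's English stopword corpus, and the function names are made up for the example.

```python
from collections import Counter

# Stand-in stopword set; the project itself uses NLTK's English
# corpus (nltk.corpus.stopwords.words("english")).
STOPWORDS = {"the", "a", "an", "of", "in", "and", "to", "for"}

def aggregate_terms(queries):
    """Count how often each non-stopword term appears across queries."""
    counts = Counter()
    for query in queries:
        for term in query.lower().split():
            if term not in STOPWORDS:
                counts[term] += 1
    return counts

def dedupe_urls(urls):
    """Drop duplicate URLs while preserving first-seen order."""
    seen = set()
    unique = []
    for url in urls:
        if url not in seen:
            seen.add(url)
            unique.append(url)
    return unique
```

In the real app these aggregates would be written to the SQL Server frequency and URL tables rather than kept in memory.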
Set Up the Database
- Run the SQL file `Custom Bot Database Query v2.0.sql` in SQL Server Management Studio to initialize the schema.
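For orientation, a schema along these lines would match the "queries, URLs, and search term frequencies" tables described above. This is a hypothetical sketch only: the authoritative definitions live in `Custom Bot Database Query v2.0.sql`, and every table and column name here is an illustrative assumption.

```sql
-- Illustrative sketch; not the actual schema from the .sql file.
CREATE TABLE Queries (
    QueryId    INT IDENTITY(1,1) PRIMARY KEY,
    QueryText  NVARCHAR(400) NOT NULL,
    SearchedAt DATETIME2 DEFAULT SYSUTCDATETIME()
);

CREATE TABLE Urls (
    UrlId   INT IDENTITY(1,1) PRIMARY KEY,
    QueryId INT NOT NULL REFERENCES Queries(QueryId),
    Url     NVARCHAR(2048) NOT NULL,
    Engine  NVARCHAR(50) NOT NULL  -- Google, Bing, DuckDuckGo, or Yahoo
);

CREATE TABLE TermFrequencies (
    Term      NVARCHAR(200) NOT NULL PRIMARY KEY,
    Frequency INT NOT NULL DEFAULT 1
);
```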
Set Up the Environment
- Install required libraries:

      pip install flask selenium pyodbc nltk beautifulsoup4

- Download NLTK corpora:

      import nltk
      nltk.download('stopwords')
      nltk.download('punkt')
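With the environment in place, the parsing half of the scraping pipeline can be sketched as below. In the real app, Selenium's `driver.page_source` supplies the HTML; here a static snippet stands in for it, and the `div.result` / `a[href]` selectors are illustrative assumptions, since each engine's markup differs and changes over time.

```python
from bs4 import BeautifulSoup

# Static snippet standing in for a scraped results page fetched via
# Selenium; real tag and class names vary by search engine.
SAMPLE_HTML = """
<div class="result"><a href="https://example.com/a">Result A</a></div>
<div class="result"><a href="https://example.com/b">Result B</a></div>
"""

def extract_links(html):
    """Return (url, title) pairs for each result link in the page."""
    soup = BeautifulSoup(html, "html.parser")
    return [(a["href"], a.get_text(strip=True))
            for a in soup.select("div.result a[href]")]
```

The extracted pairs would then be deduplicated and inserted into the URLs table.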
Start the App
- Run the Flask application from your terminal:

      python webscraping.py
- Go to http://127.0.0.1:5000/ in your browser.
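The "search from both the home and results pages" flow boils down to two routes like the minimal stand-in below. This is not the contents of `webscraping.py`: the real app renders `templates/index.html` and `templates/search_results.html` and runs the scraping pipeline, and the `q` parameter name is an assumption for illustration.

```python
from flask import Flask, request

app = Flask(__name__)

@app.route("/")
def index():
    # Stand-in for templates/index.html, which contains the search form.
    return "<form action='/search'><input name='q'></form>"

@app.route("/search")
def search():
    query = request.args.get("q", "")
    # The real handler scrapes the engines, stores results in SQL
    # Server, and renders them ordered by relevance.
    return f"Results for: {query}"

if __name__ == "__main__":
    app.run(debug=True)
```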
Project Structure
├── webscraping.py
├── app.ipynb
├── Custom Bot Database Query v2.0.sql
├── static/
│ ├── styles.css
│ └── logo.png
├── templates/
│ ├── index.html
│ └── search_results.html
├── presentation/
│ └── Project Presentation.pptx