Skip to content

vemulasravan6/PythonWebSearch

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

3 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

PythonWebSearch

Simple and basic search engine in python for small scale websites


Welcome to the PythonWebSearch wiki!

Developed in/for Linux distributions

SYS Requirements:

Ubuntu

Python installation

sudo apt-get install python3 Pip installation # to install python packages

easy_install pip Installing lxml

apt-get install libxml2-dev libxslt1-dev python-dev apt-get install python-lxml Installing required modules

pip install requests


To run this tool

Navigate to crawlers directory

open sites_to_crawl.txt and update website list to crawl, i've added 2 sites

Crawling & indexing

/var/bin/python crawl.py

You shud see the following msg on the console, upon successful completion

Finished Crawling.. Finished Indexing..

Raw dumps will get stored in html_dumps dir

indexed data can be found in search_index.json

To search for hotel keyword across websites

python search.py hotel

About

Simple and basic search engine in python for small scale websites

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages