AyBot is a simple and modular asynchronous web crawler written in Python.
It is designed to collect web pages, extract useful content, and store the results in MySQL (metadata) and a local SQLite database (content).
This project is part of a personal search engine experiment.
- Asynchronous crawling with aiohttp
- HTML parsing with BeautifulSoup
- Basic link extraction and filtering
- Language detection with langdetect
- Spam keyword detection
- Robots.txt and sitemap support (basic)
- Dual storage: MySQL for metadata, SQLite for content
- Lightweight and easy to understand structure
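To make the link extraction, filtering, and robots.txt items above more concrete, here is a minimal sketch of how such a step could look. It is not AyBot's actual code: the names `LinkCollector` and `extract_links` are hypothetical, and the sketch swaps in the standard library's `html.parser` for BeautifulSoup so that it runs without any installed dependencies. The robots.txt rules are parsed from literal lines rather than downloaded.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.robotparser import RobotFileParser


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_links(html, base_url, robots, user_agent="AyBot"):
    """Return absolute, de-duplicated http(s) links that robots.txt allows."""
    collector = LinkCollector()
    collector.feed(html)
    seen, links = set(), []
    for href in collector.hrefs:
        # Resolve relative links and drop the #fragment part.
        url, _fragment = urldefrag(urljoin(base_url, href))
        if urlparse(url).scheme not in ("http", "https"):
            continue  # skip mailto:, javascript:, and similar schemes
        if url in seen or not robots.can_fetch(user_agent, url):
            continue  # skip duplicates and disallowed paths
        seen.add(url)
        links.append(url)
    return links


# Assumed example rules; a real crawler would fetch robots.txt from the site.
robots = RobotFileParser()
robots.parse(["User-agent: *", "Disallow: /private/"])

page = """
<a href="/about">About</a>
<a href="/about#team">Team</a>
<a href="/private/admin">Admin</a>
<a href="mailto:someone@example.com">Mail</a>
"""
links = extract_links(page, "https://example.com/", robots)
print(links)
```

Only `/about` survives here: the fragment variant is a duplicate, `/private/admin` is disallowed by the example rules, and the `mailto:` link is not an HTTP URL.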
AyBot/
├── AyBot.py
├── core/
│   ├── crawler.py
│   ├── parser.py
│   ├── renderer.py
│   └── scheduler.py
├── database/
│   ├── mysql_handler.py
│   └── sqlite_handler.py
├── utils/
│   ├── config.py
│   ├── helpers.py
│   └── logger.py
└── data/
    └── ayfilter_data.db
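The layout above keeps SQLite handling in its own module and stores a database file under data/. As a rough illustration of what the SQLite side of the content storage might involve, here is a small sketch using Python's built-in `sqlite3` module. The `pages` table, its columns, and the function names are assumptions made for this example, not the schema AyBot actually uses, and an in-memory database stands in for the real file.

```python
import sqlite3


def open_content_store(path=":memory:"):
    """Open (or create) a SQLite database with a hypothetical pages table."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS pages (
            url      TEXT PRIMARY KEY,
            language TEXT,
            content  TEXT NOT NULL
        )
        """
    )
    return conn


def save_page(conn, url, language, content):
    """Insert a page, or overwrite it if the URL was stored before."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO pages (url, language, content) VALUES (?, ?, ?)",
            (url, language, content),
        )


conn = open_content_store()
save_page(conn, "https://example.com/", "en", "first version")
save_page(conn, "https://example.com/", "en", "second version")
rows = conn.execute("SELECT url, language, content FROM pages").fetchall()
print(rows)
```

Using the URL as the primary key means a re-crawled page replaces its earlier copy instead of creating a duplicate row.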
Install dependencies:
pip install aiohttp beautifulsoup4 langdetect mysql-connector-python psutil
Run the crawler:
python AyBot.py
This project is open-source under the MIT license.
AyBot is still evolving. You can help improve it!
- Found a bug? Open an issue!
- Have a new feature idea? Suggest it!
- Want to improve performance or architecture? PRs are welcome!
- Even improving docs is appreciated.
Before contributing, check the issues tab for open tasks or discussions.