omrh2323/AyBot

AyBot 🕸️

AyBot is a simple and modular asynchronous web crawler written in Python.
It collects web pages, extracts useful content, and stores the results in two databases: MySQL for metadata and a local SQLite file for content.
This project is part of a personal search engine experiment.
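
As a rough illustration of the asynchronous model described above, the sketch below runs several fetches concurrently with a cap on how many are in flight. It is not AyBot's actual code: `fake_fetch`, `crawl`, and `max_concurrency` are hypothetical names, and the fetch function is a stand-in so the example runs without network access. In a real crawler that stand-in would be replaced by an aiohttp request.

```python
import asyncio


# Hypothetical stand-in for a network fetch; a real crawler would await
# an aiohttp request here instead of building a fake page.
async def fake_fetch(url):
    await asyncio.sleep(0)  # yield control to the event loop, as real I/O would
    return f"<html><title>{url}</title></html>"


async def crawl(urls, fetch=fake_fetch, max_concurrency=5):
    """Fetch all URLs concurrently, with at most max_concurrency in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def fetch_one(url):
        async with semaphore:
            try:
                return url, await fetch(url)
            except Exception as exc:  # one failed page should not stop the crawl
                return url, exc

    # gather() returns results in the same order as the input URLs
    return dict(await asyncio.gather(*(fetch_one(u) for u in urls)))


if __name__ == "__main__":
    pages = asyncio.run(crawl(["https://example.com/a", "https://example.com/b"]))
    for url, body in pages.items():
        print(url, "error" if isinstance(body, Exception) else len(body))
```

The semaphore is one simple way to keep a crawler from opening too many connections at once; the limit of 5 is an arbitrary value chosen for the example.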


✨ Features

  • Asynchronous crawling with aiohttp
  • HTML parsing with BeautifulSoup
  • Basic link extraction and filtering
  • Language detection with langdetect
  • Spam keyword detection
  • Robots.txt and sitemap support (basic)
  • Dual storage: MySQL for metadata, SQLite for content
  • Lightweight and easy to understand structure
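
A few of the features listed above (link extraction and filtering, spam keyword detection, robots.txt checks) can be sketched with the standard library alone. This is an illustration under stated assumptions, not AyBot's implementation: it uses `html.parser` in place of BeautifulSoup so it runs without extra dependencies, and the function names, the keyword list, the same-host filter, and the threshold are all hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.robotparser import RobotFileParser

# Hypothetical keyword list; AyBot's real list is not shown in this README.
SPAM_KEYWORDS = {"casino", "free money", "viagra"}


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_links(html, base_url):
    """Return absolute, de-duplicated http(s) links on the same host as base_url."""
    collector = LinkCollector()
    collector.feed(html)
    host = urlparse(base_url).netloc
    seen, links = set(), []
    for href in collector.hrefs:
        # Resolve relative links and drop "#fragment" parts before comparing
        url, _fragment = urldefrag(urljoin(base_url, href))
        parts = urlparse(url)
        if parts.scheme in ("http", "https") and parts.netloc == host and url not in seen:
            seen.add(url)
            links.append(url)
    return links


def looks_like_spam(text, threshold=2):
    """Flag text containing at least `threshold` distinct spam keywords."""
    lowered = text.lower()
    hits = sum(1 for keyword in SPAM_KEYWORDS if keyword in lowered)
    return hits >= threshold


def allowed_by_robots(robots_txt, url, user_agent="AyBot"):
    """Check a URL against robots.txt rules supplied as a string."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

For example, `extract_links('<a href="/a">A</a><a href="https://other.example/x">X</a>', "https://example.com/")` keeps only `https://example.com/a`, because the second link points to a different host.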

πŸ“ Project Structure

AyBot/
├── AyBot.py
├── core/
│   ├── crawler.py
│   ├── parser.py
│   ├── renderer.py
│   └── scheduler.py
├── database/
│   ├── mysql_handler.py
│   └── sqlite_handler.py
├── utils/
│   ├── config.py
│   ├── helpers.py
│   └── logger.py
└── data/
    └── ayfilter_data.db
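
The layout above includes a SQLite handler and a database file under `data/`. As a minimal sketch of what storing page content in SQLite could look like, the example below uses Python's built-in `sqlite3` module. The table name `pages`, its columns, and the function names are assumptions made for illustration; the actual schema in `sqlite_handler.py` is not documented in this README.

```python
import sqlite3


def open_content_db(path=":memory:"):
    """Open (or create) a content database; defaults to an in-memory DB for demos."""
    conn = sqlite3.connect(path)
    # Hypothetical schema: one row per crawled URL
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pages ("
        " url TEXT PRIMARY KEY,"
        " title TEXT,"
        " content TEXT,"
        " lang TEXT)"
    )
    return conn


def save_page(conn, url, title, content, lang):
    """Insert a page, replacing any stored copy with the same URL."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO pages (url, title, content, lang) "
            "VALUES (?, ?, ?, ?)",
            (url, title, content, lang),
        )


if __name__ == "__main__":
    db = open_content_db()
    save_page(db, "https://example.com/", "Example", "Hello world", "en")
    print(db.execute("SELECT url, lang FROM pages").fetchall())
```

Using the URL as the primary key means a re-crawled page overwrites its earlier copy instead of creating a duplicate row.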

🧪 Requirements

Install dependencies:

pip install aiohttp beautifulsoup4 langdetect mysql-connector-python psutil

▶️ How to Run

python AyBot.py

📄 License

This project is open-source under the MIT license.


🤝 Contributing

AyBot is still evolving. You can help improve it!

  • πŸ› Found a bug? Open an issue!
  • 🌍 Have a new feature idea? Suggest it!
  • 🧠 Want to improve performance or architecture? PRs are welcome!
  • πŸ“ Even improving docs is appreciated.

Before contributing, check the issues tab for open tasks or discussions.
