A lightweight, Python-based web crawler designed to recursively scan and analyze websites for data extraction, link discovery, and basic site structure mapping. Ideal for cybersecurity assessments, educational purposes, or as a foundational component for more advanced data mining tools.
✅ Standard Library • urllib.parse – for parsing and joining URLs
📦 Third-Party Libraries (must be installed via pip) • requests – for sending HTTP requests • beautifulsoup4 – for parsing and navigating HTML
- Recursive URL crawling with depth control
- Duplicate URL detection and prevention
- Configurable user-agent header
- Simple and readable console output
- Lightweight and dependency-free (built on Python standard libraries)
crawler/
│
├── crawler.py # Main crawler script
├── requirements.txt # (Optional) List of dependencies (currently none)
├── README.md # This file
⚙️ Installation
1. Clone the repository:
git clone https://github.com/DanielVihorev/Crawler.git
cd Crawler
2.(Optional) Create a virtual environment:
python3 -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
3.Install requirements:
This project uses only Python standard libraries. If future dependencies are added, use:
pip install -r requirements.txt
🚀 Usage
Run the crawler script from the command line:
python crawler.py
You may edit the script to modify the starting URL or crawling depth.
🛠️ Customization
To change the starting URL or behavior:
• Modify the start_url and max_depth variables in crawler.py
• Enhance with modules like BeautifulSoup or Scrapy if HTML parsing is needed
• Integrate logging or export results to a file or database
🧠 Use Cases
• Cybersecurity Reconnaissance – Discover site structure during initial scans
• SEO Audits – Map out links and page connections
• Learning Tool – Great starting point to understand how crawlers work
⚖️ License
This project is licensed under the MIT License – see the LICENSE file for details.
👨💻 Author
Daniel Vihorev
Cybersecurity enthusiast & Python developer
GitHub Profile
Ilay Zendani
Cybersecurity enthusiast & Python developer
GitHub Profile
⸻
🧠 “Built from curiosity, crafted for exploration.”
---
Let me know if you want to add badges, Docker support, or split functionality for CLI args!
@All rights saved to Daniel Vihorev and Ilay zendani (Wild Life Cyber Security)