
🕷️ Web Crawler

A lightweight, Python-based web crawler designed to recursively scan and analyze websites for data extraction, link discovery, and basic site structure mapping. Ideal for cybersecurity assessments, educational purposes, or as a foundational component for more advanced data mining tools.

📚 External and Third-Party Libraries

✅ Standard Library
  • urllib.parse – for parsing and joining URLs

📦 Third-Party Libraries (installed via pip)
  • requests – for sending HTTP requests
  • beautifulsoup4 – for parsing and navigating HTML

🔍 Features

  • Recursive URL crawling with depth control
  • Duplicate URL detection and prevention
  • Configurable user-agent header
  • Simple and readable console output
  • Lightweight, with only two third-party dependencies (requests and beautifulsoup4)
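
A minimal sketch of how these features might fit together (illustrative only, not the project's actual code – function and variable names such as crawl and max_depth are assumptions):

# Hypothetical sketch – not the actual crawler.py
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin, urlparse

HEADERS = {"User-Agent": "Crawler/1.0"}   # configurable user-agent
visited = set()                           # duplicate URL detection

def crawl(url, depth=0, max_depth=2):
    if depth > max_depth or url in visited:   # depth control + dedup
        return
    visited.add(url)
    print("  " * depth + url)                 # simple console output
    try:
        resp = requests.get(url, headers=HEADERS, timeout=10)
    except requests.RequestException as exc:
        print(f"[!] Failed to fetch {url}: {exc}")
        return
    soup = BeautifulSoup(resp.text, "html.parser")
    for a in soup.find_all("a", href=True):
        link = urljoin(url, a["href"])         # resolve relative links
        if urlparse(link).scheme in ("http", "https"):
            crawl(link, depth + 1, max_depth)

if __name__ == "__main__":
    crawl("https://example.com")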

📁 Project Structure

crawler/
│
├── crawler.py          # Main crawler script
├── requirements.txt    # List of dependencies (requests, beautifulsoup4)
└── README.md           # This file

⚙️ Installation
1. Clone the repository:

git clone https://github.com/DanielVihorev/Crawler.git
cd Crawler


2. (Optional) Create a virtual environment:

python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

3. Install the dependencies:

pip install -r requirements.txt
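
For reference, a requirements.txt covering the libraries listed above might contain (the version pins are illustrative assumptions):

requests>=2.31
beautifulsoup4>=4.12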



🚀 Usage

Run the crawler script from the command line:

python crawler.py

You may edit the script to modify the starting URL or crawling depth.

🛠️ Customization

To change the starting URL or behavior:
	•	Modify the start_url and max_depth variables in crawler.py
	•	Swap in a heavier framework such as Scrapy if you outgrow BeautifulSoup
	•	Integrate logging or export results to a file or database – see the sketch below
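
As an example of the last point, results could be exported once a crawl finishes. This is a hypothetical addition that assumes the discovered URLs are collected in a set named visited, as in the sketch under Features:

# Hypothetical export helper – assumes a `visited` set of crawled URLs
import json

def export_results(visited, path="results.json"):
    # Sort for stable, diff-friendly output
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(sorted(visited), fh, indent=2)

export_results({"https://example.com", "https://example.com/about"})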

🧠 Use Cases
	•	Cybersecurity Reconnaissance – Discover site structure during initial scans
	•	SEO Audits – Map out links and page connections
	•	Learning Tool – Great starting point to understand how crawlers work

⚖️ License

This project is licensed under the MIT License – see the LICENSE file for details.

👨‍💻 Authors

Daniel Vihorev
Cybersecurity enthusiast & Python developer
GitHub Profile

Ilay Zendani
Cybersecurity enthusiast & Python developer
GitHub Profile

⸻

🧠 “Built from curiosity, crafted for exploration.”


© All rights reserved to Daniel Vihorev and Ilay Zendani (Wild Life Cyber Security)
