🕸️ FERRVMweb

This is a recursive web crawler written in Rust that visits websites, extracts links, and stores them in a SQLite database.


Features

  • Extracts all HTML links on the site
  • Stores each discovered URL with a unique ID and its parent's ID
  • Persists data in a SQLite database (table: link)
  • Recursively crawls websites up to a configurable depth
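The crawl-and-store loop can be illustrated with a short sketch. The crawler itself is written in Rust; this Python version only demonstrates the idea, and the column names (id, url, parent_id, depth) are assumptions about the actual schema, not taken from the source:

```python
import sqlite3
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collect the href of every <a> tag on a page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

# Assumed schema: one row per discovered link, pointing back to its parent.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE link (
    id INTEGER PRIMARY KEY,
    url TEXT NOT NULL,
    parent_id INTEGER,
    depth INTEGER
)""")

def store_links(html, parent_id, depth):
    """Extract all links from `html` and store them under `parent_id`."""
    parser = LinkExtractor()
    parser.feed(html)
    ids = []
    for url in parser.links:
        cur = conn.execute(
            "INSERT INTO link (url, parent_id, depth) VALUES (?, ?, ?)",
            (url, parent_id, depth),
        )
        ids.append(cur.lastrowid)
    return ids

page = '<a href="https://example.com/a">A</a> <a href="https://example.com/b">B</a>'
store_links(page, parent_id=None, depth=1)
```

In the real crawler, each stored row's ID would then become the parent_id for the links found on that page, which is what makes the recursion bounded only by the configured depth.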

📊 Visualization

Each link is saved with its parent URL and depth level, allowing you to visualize the structured hierarchy of your crawled website:

Static visualization uses the NetworkX library; interactive visualization uses PyVis (screenshots omitted).
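Because every row stores its parent's ID, the tree the visualizer draws can be recovered with a single self-join. A minimal stdlib sketch (again assuming id/url/parent_id/depth columns; the resulting edge list is the kind of input NetworkX's DiGraph or PyVis would consume):

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # point this at the crawler's real .db file
conn.execute(
    "CREATE TABLE link (id INTEGER PRIMARY KEY, url TEXT, parent_id INTEGER, depth INTEGER)"
)
conn.executemany(
    "INSERT INTO link (id, url, parent_id, depth) VALUES (?, ?, ?, ?)",
    [
        (1, "https://example.com/", None, 0),
        (2, "https://example.com/docs", 1, 1),
        (3, "https://example.com/blog", 1, 1),
        (4, "https://example.com/docs/api", 2, 2),
    ],
)

# Joining each child row to its parent row yields one edge of the hierarchy.
edges = conn.execute("""
    SELECT p.url, c.url
    FROM link AS c
    JOIN link AS p ON c.parent_id = p.id
""").fetchall()

for parent, child in edges:
    print(parent, "->", child)
```

The edge list can be passed directly to networkx.DiGraph(edges) for the static PNG layout, or added node-by-node to a PyVis network for the interactive HTML view.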

Usage

After crawling a website, visualize the link hierarchy using visualize_hierarchy.py:

# Test run
python -u ".../link_db_test.py" # fills the database
python -u ".../visualize_hierarchy.py" --db ./data/links_test.db

# Generate all visualization layouts
python -u ".../visualize_hierarchy.py"

# Generate the static layout
python -u ".../visualize_hierarchy.py" -s
python -u ".../visualize_hierarchy.py" --static

# Generate the interactive layout
python -u ".../visualize_hierarchy.py" -i
python -u ".../visualize_hierarchy.py" --interactive

Available layouts:

  • Tree layout: static hierarchical tree structure
  • Dynamic layout: interactive layout

Output file (static graph):

  • link_hierarchy_tree.png - hierarchical tree visualization

For more information about NetworkX: https://networkx.org/

📦 Crates

💿 Installation

Install the Python dependencies:

pip install networkx matplotlib pyvis

Clone and run the crawler (if Rust is not installed, see https://www.rust-lang.org/tools/install):

git clone https://github.com/jakobx0/FerrumWeb
cd FerrumWeb
cargo run

Rust help: https://users.rust-lang.org/t/link-exe-not-found-despite-build-tools-already-installed/47080

‼️ Troubleshooting

On Windows, the error "linker 'link.exe' not found" can be solved by switching to the GNU toolchain:

rustup toolchain install stable-x86_64-pc-windows-gnu
rustup default stable-x86_64-pc-windows-gnu

On Linux, the error "failed to run custom build command for 'openssl-sys v0.9.109'" can be fixed by installing the OpenSSL development headers:

sudo apt install libssl-dev

DB Usage

When the program starts, it asks for a URL and begins crawling from that page. All discovered links are stored recursively in a database. The resulting structure is useful for analyzing site architectures or detecting broken links.

To analyse the DB file, simply open it in a DBMS of your choice, for example DB Browser for SQLite: https://sqlitebrowser.org/

Example SQL Queries:

Count of distinct URLs:

SELECT COUNT(DISTINCT URL) FROM link;

Grouped links by frequency:

SELECT URL, COUNT(*) AS count FROM link GROUP BY URL ORDER BY count DESC;
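The same queries can also be run from Python with the stdlib sqlite3 module, which is handy for scripting analyses on top of the crawl. A sketch (the in-memory table and sample rows stand in for the crawler's real database file):

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # use the path to the crawler's .db file instead
conn.execute("CREATE TABLE link (id INTEGER PRIMARY KEY, url TEXT)")
conn.executemany(
    "INSERT INTO link (url) VALUES (?)",
    [
        ("https://example.com/",),
        ("https://example.com/a",),
        ("https://example.com/",),
    ],
)

# Count of distinct URLs
(distinct_count,) = conn.execute("SELECT COUNT(DISTINCT url) FROM link").fetchone()

# Links grouped by frequency, most frequent first
by_freq = conn.execute(
    "SELECT url, COUNT(*) AS count FROM link GROUP BY url ORDER BY count DESC"
).fetchall()

print(distinct_count)
print(by_freq)
```

A URL that appears many times in by_freq is linked from many pages; a URL that appears once is a leaf or a rarely referenced page.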
