This is a recursive web crawler written in Rust that visits websites, extracts links, and stores them in a SQLite database.
- Extracts all HTML links on the site
- Stores each discovered URL with a unique ID and its parent ID
- Persists data in a SQLite database (table: link)
- Recursively crawls websites up to a configurable depth
Each link is saved with its parent URL and depth level, allowing you to visualize the hierarchy of the crawled website.
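For example, the stored rows can be read back with sqlite3 from the Python standard library. The following is a minimal sketch; the column names (id, parent_id, url, depth) are assumptions based on the description above, not taken from the crawler's source:

import sqlite3

# Open the crawler's database (path as used in the test run below).
con = sqlite3.connect("./data/links_test.db")

# Column names are assumed: id, parent_id, url, depth.
rows = con.execute(
    "SELECT id, parent_id, url, depth FROM link ORDER BY depth, id"
).fetchall()

# Indent each link by its depth to show the hierarchy.
for link_id, parent_id, url, depth in rows:
    print("  " * depth + f"[{link_id} <- parent {parent_id}] {url}")

con.close()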
After crawling a website, you can visualize the link hierarchy using visualize_hierarchy.py:
# To test, run:
python -u ".../link_db_test.py" # fills the database
python -u ".../visualize_hierarchy.py" --db ./data/links_test.db
# Generate all visualization layouts
python -u ".../visualize_hierarchy.py"
# Generate the static layout
python -u ".../visualize_hierarchy.py" -s
python -u ".../visualize_hierarchy.py" --static
# Generate the interactive layout
python -u ".../visualize_hierarchy.py" -i
python -u ".../visualize_hierarchy.py" --interactive
Available layouts:
- Tree layout: Hierarchical tree structure
- Dynamic layout: interactive visualization
Output file (static graph):
link_hierarchy_tree.png - Hierarchical tree visualization
For more information about NetworkX: https://networkx.org/
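The internals of visualize_hierarchy.py are not reproduced here, but the following sketch shows the general idea of drawing such a tree with NetworkX and matplotlib. The column names and the simple per-depth layout are assumptions for illustration:

import sqlite3
import networkx as nx
import matplotlib.pyplot as plt

con = sqlite3.connect("./data/links_test.db")
rows = con.execute("SELECT id, parent_id, depth FROM link").fetchall()  # assumed columns
con.close()

# Directed graph with edges pointing from parent to child.
G = nx.DiGraph()
for link_id, parent_id, depth in rows:
    G.add_node(link_id, depth=depth)
    if parent_id is not None:
        G.add_edge(parent_id, link_id)

# Naive hierarchical layout: y is the depth level, x spreads siblings.
levels = {}
for node, data in G.nodes(data=True):
    levels.setdefault(data.get("depth", 0), []).append(node)
pos = {}
for depth, nodes in levels.items():
    for i, node in enumerate(nodes):
        pos[node] = (i - len(nodes) / 2, -depth)

nx.draw(G, pos, with_labels=True, node_size=300, font_size=6)
plt.savefig("link_hierarchy_tree.png", dpi=200)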
Install Python dependencies:
pip install networkx
pip install matplotlib
pip install pyvis
Clone and run the crawler:
git clone https://github.com/jakobx0/FerrumWeb
cd FerrumWeb
cargo run (if Rust is not installed: https://www.rust-lang.org/tools/install)
On Windows, the error "linker 'link.exe' not found" can be solved via:
rustup toolchain install stable-x86_64-pc-windows-gnu
rustup default stable-x86_64-pc-windows-gnu
(See also: https://users.rust-lang.org/t/link-exe-not-found-despite-build-tools-already-installed/47080)
On Linux, the error "failed to run custom build command for 'openssl-sys v0.9.109'" can be solved via:
sudo apt install libssl-dev
When the program starts, it asks for a URL and begins crawling from that page. All discovered links are stored recursively in the database. The resulting structure is useful for analyzing site architectures or detecting broken links.
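For the broken-link use case, the stored URLs can be re-checked with the Python standard library. A minimal sketch, again assuming the column names above:

import sqlite3
import urllib.request
import urllib.error

con = sqlite3.connect("./data/links_test.db")
urls = [row[0] for row in con.execute("SELECT DISTINCT url FROM link")]
con.close()

for url in urls:
    # HEAD keeps the check lightweight; note that some servers reject HEAD
    # requests, so non-404 errors should be treated with caution.
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            status = resp.status
    except urllib.error.HTTPError as e:
        status = e.code
    except urllib.error.URLError as e:
        status = f"unreachable ({e.reason})"
    if status != 200:
        print(status, url)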
To analyze the DB file, simply open it in a DBMS of your choice, for example DB Browser for SQLite: https://sqlitebrowser.org/
Example SQL Queries:
Count of distinct URLs:
SELECT COUNT(DISTINCT URL) FROM link;
Links grouped by frequency:
SELECT URL, COUNT(*) AS count FROM link GROUP BY URL ORDER BY count DESC;
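The same queries can also be scripted instead of run in a GUI, for example with Python's built-in sqlite3 module:

import sqlite3

con = sqlite3.connect("./data/links_test.db")

# Count of distinct URLs.
(count,) = con.execute("SELECT COUNT(DISTINCT URL) FROM link").fetchone()
print("distinct URLs:", count)

# Links grouped by frequency.
for url, n in con.execute(
    "SELECT URL, COUNT(*) AS count FROM link GROUP BY URL ORDER BY count DESC"
):
    print(n, url)

con.close()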

