`xcrawl3r` is a command-line utility designed to recursively spider webpages for URLs. It works by actively traversing websites - following links embedded in webpages and parsing files (including sitemaps & `robots.txt`) - to uncover every URL.

Unlike `xurlfind3r`, which does not interact directly with the target, `xcrawl3r` interacts directly with the target by spidering its pages in real time. This active approach allows it to discover URLs that may be hidden or unindexed, providing a complete picture of the website's navigational flow and content distribution. This makes `xcrawl3r` a powerful tool for security researchers, IT professionals, and anyone looking to gain insights into the URLs associated with websites.
- Recursively spiders webpages for URLs
- Extracts URLs from files (including sitemaps & `robots.txt`)
- Supports `stdin` and `stdout` for easy integration into automated workflows (see the example after this list)
- Supports multiple output formats (JSONL, file, stdout)
- Cross-platform (Windows, Linux & macOS)
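For example, the `stdin`/`stdout` support makes it easy to slot `xcrawl3r` into a pipeline. The following is a minimal sketch, assuming targets are read from `stdin` and using the `--jsonl` and `--output` flags documented in the help output further below (`https://example.com` is a placeholder target):

```bash
# Feed a target URL on stdin, crawl it, and write the results as JSON Lines.
# Flags are taken from `xcrawl3r -h`; the target URL is only a placeholder.
echo "https://example.com" | xcrawl3r --jsonl --output urls.jsonl
```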
Visit the releases page and find the appropriate archive for your operating system and architecture. Download the archive from your browser or copy its URL and retrieve it with `wget` or `curl`:
- ...with `wget`:

  wget https://github.com/hueristiq/xcrawl3r/releases/download/v<version>/xcrawl3r-<version>-linux-amd64.tar.gz

- ...or, with `curl`:

  curl -OL https://github.com/hueristiq/xcrawl3r/releases/download/v<version>/xcrawl3r-<version>-linux-amd64.tar.gz
...then, extract the binary:
tar xf xcrawl3r-<version>-linux-amd64.tar.gz
Tip
The above steps, download and extract, can be combined into a single step with this one-liner:
curl -sL https://github.com/hueristiq/xcrawl3r/releases/download/v<version>/xcrawl3r-<version>-linux-amd64.tar.gz | tar -xzv
Note
On Windows systems, you should be able to double-click the zip archive to extract the `xcrawl3r` executable.
...move the `xcrawl3r` binary to somewhere in your `PATH`. For example, on GNU/Linux and OS X systems:
sudo mv xcrawl3r /usr/local/bin/
Note
Windows users can follow How to: Add Tool Locations to the PATH Environment Variable in order to add `xcrawl3r` to their `PATH`.
Before you install from source, you need to make sure that Go is installed on your system. You can install Go by following the official instructions for your operating system. From this point onward, we will assume that Go is already installed. You can then install the latest version of `xcrawl3r` with `go install`:
go install -v github.com/hueristiq/xcrawl3r/cmd/xcrawl3r@latest
Alternatively, build the development version from source:

- Clone the repository

  git clone https://github.com/hueristiq/xcrawl3r.git

- Build the utility

  cd xcrawl3r/cmd/xcrawl3r && \
  go build .

- Move the `xcrawl3r` binary to somewhere in your `PATH`. For example, on GNU/Linux and OS X systems:

  sudo mv xcrawl3r /usr/local/bin/
Windows users can follow How to: Add Tool Locations to the PATH Environment Variable in order to add `xcrawl3r` to their `PATH`.
Caution
While the development version is a good way to take a peek at `xcrawl3r`'s latest features before they get released, be aware that it may have bugs. Officially released versions will generally be more stable.
To install `xcrawl3r` using Docker:
- Pull the Docker image:

  docker pull hueristiq/xcrawl3r:latest

- Run `xcrawl3r` using the image (see the example after this list):

  docker run --rm hueristiq/xcrawl3r:latest -h
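For instance, assuming the image's entrypoint is the `xcrawl3r` binary (as the `-h` invocation above suggests), a crawl can be run straight from the container. This is a sketch only: `https://example.com` is a placeholder target, and the `-u`/`--depth` flags come from the help output shown further down:

```bash
# Crawl a placeholder target to depth 2, printing discovered URLs to stdout.
# `-u` (target URL) and `--depth` (maximum crawl depth) are documented in `xcrawl3r -h`.
docker run --rm hueristiq/xcrawl3r:latest -u https://example.com --depth 2
```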
`xcrawl3r` will work right after installation. However, some configuration options can be added to a configuration file at `$HOME/.config/xcrawl3r/config.yaml` (created upon first run) or set as environment variables.
Example of environment variables:
XCRAWL3R_REQUEST_TIMEOUT=10
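For example, assuming `XCRAWL3R_REQUEST_TIMEOUT` maps to the request timeout shown in the help output below, it can be exported before a run (the target URL is a placeholder):

```bash
# Set the request timeout via an environment variable, then crawl a placeholder target.
# The mapping of XCRAWL3R_REQUEST_TIMEOUT to the --timeout option is assumed from its name.
export XCRAWL3R_REQUEST_TIMEOUT=10
xcrawl3r --url https://example.com
```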
To start using `xcrawl3r`, open your terminal and run the following command for a list of options:
xcrawl3r -h
Here's what the help message looks like:
                             _ _____
__  _____ _ __ __ ___      _| |___ / _ __
\ \/ / __| '__/ _` \ \ /\ / / | |_ \| '__|
 >  < (__| | | (_| |\ V  V /| |___) | |
/_/\_\___|_|  \__,_| \_/\_/ |_|____/|_|
                                   v1.1.0
USAGE:
xcrawl3r [OPTIONS]
CONFIGURATION:
-c, --configuration string (default: $HOME/.config/xcrawl3r/config.yaml)
INPUT:
-u, --url string[] target URL
-l, --list string target URLs file path
For multiple URLs, use comma(,) separated value with `--url`,
specify multiple `--url`, load from file with `--list` or load from stdin.
SCOPE:
-d, --domain string[] match domain(s) URLs
For multiple domains, use comma(,) separated value with `--domain`
or specify multiple `--domain`.
--include-subdomains bool with domain(s), match subdomains' URLs
REQUEST:
--delay int delay between each request in seconds
-H, --header string[] header to include in 'header:value' format
For multiple headers, use comma(,) separated value with `--header`
or specify multiple `--header`.
--timeout int time to wait for request in seconds (default: 10)
PROXY:
-p, --proxy string[] Proxy (e.g: http://127.0.0.1:8080)
For multiple proxies use comma(,) separated value with `--proxy`
or specify multiple `--proxy`.
OPTIMIZATION:
--depth int maximum depth to crawl, `0` for infinite (default: 1)
-C, --concurrency int number of concurrent inputs to process (default: 5)
-P, --parallelism int number of concurrent fetchers to use (default: 5)
DEBUG:
--debug bool enable debug mode
OUTPUT:
--jsonl bool output in JSONL(ines)
-o, --output string output write file path
-m, --monochrome bool stdout in monochrome
-s, --silent bool stdout in silent mode
-v, --verbose bool stdout in verbose mode
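To illustrate the options above, here are a few example invocations. They are a sketch, not official documentation: `example.com` is a placeholder target and `targets.txt` a hypothetical input file, while the flags themselves are taken from the help output:

```bash
# Crawl a single target to depth 2.
xcrawl3r --url https://example.com --depth 2

# Keep only URLs under example.com and its subdomains, and save them as JSON Lines.
xcrawl3r --url https://example.com --domain example.com --include-subdomains --jsonl --output example.jsonl

# Read targets from a file, add a 2-second delay between requests,
# and route traffic through a local proxy.
xcrawl3r --list targets.txt --delay 2 --proxy http://127.0.0.1:8080
```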
Contributions are welcome and encouraged! Feel free to submit Pull Requests or report Issues. For more details, check out the contribution guidelines.
A big thank you to all the contributors for your ongoing support!
This package is licensed under the MIT license. You are free to use, modify, and distribute it, as long as you follow the terms of the license. You can find the full license text in the repository - Full MIT license text.