Replacing gotor with httpx and other major changes #307

Merged 30 commits on Oct 12, 2023
174d769
Add httpx and tabulate to poetry
KingAkeem Oct 8, 2023
1f2a293
Update main.py to handle new changes
KingAkeem Oct 8, 2023
e06fa50
Remove the use of gotor for building trees and retrieving IP address
KingAkeem Oct 8, 2023
71f8cc8
Utilize new api in IO module
KingAkeem Oct 8, 2023
d727061
Remove LinkTree class and use treelib structure for hosting nodes
KingAkeem Oct 8, 2023
760b961
Remove gotor submodule
KingAkeem Oct 8, 2023
356c96c
Remove gotor from .gitmodules
KingAkeem Oct 8, 2023
a9b8525
Merge branch 'dev' into python-3.11-dev
KingAkeem Oct 8, 2023
8dd2b81
Update README, scripts and dependency managers to reflect gotor changes
KingAkeem Oct 8, 2023
0ac5c32
Test support for socks5 proxy using default values
KingAkeem Oct 8, 2023
461d67e
flake8 fixes
KingAkeem Oct 9, 2023
1fe7a7b
many changes
KingAkeem Oct 9, 2023
e81243c
Merge branch 'dev' into python-3.11-dev
KingAkeem Oct 9, 2023
87c6c96
Merge branch 'dev' into python-3.11-dev
KingAkeem Oct 9, 2023
182f51a
more major changes
KingAkeem Oct 9, 2023
f702d5b
fix tree printing
KingAkeem Oct 9, 2023
5cfcde3
more major changes
KingAkeem Oct 9, 2023
d9de8d3
flake8 fixes
KingAkeem Oct 9, 2023
6f3f4db
Add option to disable socks5
KingAkeem Oct 9, 2023
e97fca5
Update README
KingAkeem Oct 9, 2023
c380052
Update README
KingAkeem Oct 9, 2023
3a967b9
flake8
KingAkeem Oct 9, 2023
36d3480
better formatted JSON for tree
KingAkeem Oct 9, 2023
ddabe8a
syntax fix and removing threadsafe
KingAkeem Oct 9, 2023
0706fa0
Fix README formatting
KingAkeem Oct 9, 2023
9663610
Add details to argument usage
KingAkeem Oct 9, 2023
ef6e06b
remove unused validators file
KingAkeem Oct 9, 2023
9bda98e
Merge branch 'dev' into python-3.11-dev
KingAkeem Oct 9, 2023
ab33699
Updating README
KingAkeem Oct 9, 2023
c80f844
Merge branch 'dev' into python-3.11-dev
KingAkeem Oct 12, 2023
5 changes: 2 additions & 3 deletions .env
@@ -1,4 +1,3 @@
export TORBOT_DATA_DIR=${PWD}/data
export HOST='localhost'
export PORT=8081
export LOG_LEVEL="info" # OPTIONS - info, debug, fatal
export SOCKS5_HOST='127.0.0.1'
export SOCKS5_PORT=9050
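
The `SOCKS5_HOST` and `SOCKS5_PORT` variables in the `.env` hunk above suggest the proxy endpoint is read from the environment. The following is a minimal sketch of how such values could be turned into a proxy URL, assuming the defaults shown in `.env`. The helper name `socks5_proxy_url` is hypothetical and is not taken from the TorBot code in this PR.

```python
import os


def socks5_proxy_url(env=os.environ) -> str:
    """Build a SOCKS5 proxy URL from environment-style settings.

    Hypothetical helper for illustration only. The fallback values
    mirror the ones exported in the .env file above.
    """
    host = env.get("SOCKS5_HOST", "127.0.0.1")
    # Convert to int so a non-numeric port fails early with ValueError.
    port = int(env.get("SOCKS5_PORT", "9050"))
    return f"socks5://{host}:{port}"


if __name__ == "__main__":
    # A plain dict stands in for the process environment here.
    print(socks5_proxy_url({"SOCKS5_HOST": "127.0.0.1", "SOCKS5_PORT": "9050"}))
```

A URL in this form is what SOCKS5-capable HTTP clients usually accept as a proxy setting. How TorBot passes it to `httpx` is not visible in the rendered part of this diff.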
3 changes: 0 additions & 3 deletions .gitmodules

This file was deleted.

78 changes: 14 additions & 64 deletions README.md
@@ -34,95 +34,45 @@
6. Crawl custom domains
7. Check if the link is live
8. Built-in Updater
9. Build visual tree of link relationship that can be quickly viewed or saved to an image file
9. Build visual tree of link relationship that can be quickly viewed or saved to an file

...(will be updated)

### Dependencies
- Tor
- Tor (Optional)
- Python ^3.9
- Golang 1.19
- Poetry

### Python Dependencies

(see requirements.txt for more details)

### Golang Dependencies
- https://github.com/KingAkeem/gotor (This service needs to be ran in tandem with TorBot)
(see pyproject.toml or requirements.txt for more details)

## Installation

### Gotor
gotor is needed to run this module.
Note: If the `gotor` directory is empty, you may need to run `git submodule update --init --recursive` to initialize the submodule.

#### Using local Tor service
* Run the tor service:
```sh
sudo service tor start
```
* Make sure that your torrc is configured to SOCKS_PORT localhost:9050

* Open a new terminal and start `gotor`, this can be done using `docker` or `go`
- using go:
```sh
cd gotor && go run cmd/main/main.go -server
```

#### Using tor and gotor docker containers
- using docker (multi-stage image, builds tor and gotor container):
```sh
cd gotor && ./build.sh
```

### TorBot
* TorBot dependencies are managed using `poetry`, you can find the installation commands below:
```sh
poetry install # to install dependencies
poetry run python run.py -u https://www.example.com --depth 2 -v # example of running command with poetry
poetry run python run.py -h # for help
```

### Full Installation
There is a shell script that will attempt to install both `torbot` and `gotor` as global modules.
The script `install.sh` will first install the latest version of `torbot` found in `PyPI`,
then it will attempt to install `gotor` to the `GOBIN` path after making the path globally accessible.
```sh
source install.sh # execute script
```

You can now run
```sh
gotor -server
```
and crawl using
```sh
python -m torbot -u https://www.example.com
poetry run python torbot/main.py -u https://www.example.com --depth 2 --visualize tree --save json # example of running command with poetry
poetry run python torbot/main.py -h # for help
```

### Options
<pre>
usage: Gather and analyze data from Tor sites.

optional arguments:
-h, --help show this help message and exit
--version Show current version of TorBot.
--update Update TorBot to the latest stable version
-q, --quiet
-u URL, --url URL Specifiy a website link to crawl
-s, --save Save results in a file
-m, --mail Get e-mail addresses from the crawled sites
-p, --phone Get phone numbers from the crawled sites
--depth DEPTH Specifiy max depth of crawler (default 1)
--gather Gather data for analysis
-v, --visualize Visualizes tree of data gathered.
-d, --download Downloads tree of data gathered.
-e EXTENSION, --extension EXTENSION
Specifiy additional website extensions to the list(.com , .org, .etc)
-c, --classify Classify the webpage using NLP module
-cAll, --classifyAll Classify all the obtained webpages using NLP module
-i, --info Info displays basic info of the scanned site </pre>
-h, --help Show this help message and exit
-v Displays DEBUG level logging, default is INFO
--version Show current version of TorBot.
--update Update TorBot to the latest stable version
-q, --quiet Prevents display of header and IP address
--save FORMAT Save results in a file. (tree, json)
--visualize FORMAT Visualizes tree of data gathered. (tree, json, table)
-i, --info Info displays basic info of the scanned site
--disable-socks5 Executes HTTP requests without using SOCKS5 proxy</pre>

* NOTE: -u is a mandatory for crawling
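
For readers who want to see how an option list like the one above maps onto a command-line parser, here is a rough `argparse` sketch. It is an illustration under assumptions, not the parser from `torbot/main.py`: the function name `build_parser`, the `prog` value, and the choice to model `--save` and `--visualize` with `choices` are all hypothetical. The `-u` and `--depth` options are taken from the example command and the NOTE in the README.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the README option list (not TorBot's code)."""
    parser = argparse.ArgumentParser(
        prog="torbot",
        description="Gather and analyze data from Tor sites.",
    )
    # The README note says -u is mandatory for crawling.
    parser.add_argument("-u", "--url", required=True, help="Website link to crawl")
    parser.add_argument("--depth", type=int, default=1, help="Max depth of crawler")
    parser.add_argument("-v", action="store_true", help="DEBUG level logging, default is INFO")
    parser.add_argument("-q", "--quiet", action="store_true",
                        help="Prevents display of header and IP address")
    parser.add_argument("--save", choices=["tree", "json"], help="Save results in a file")
    parser.add_argument("--visualize", choices=["tree", "json", "table"],
                        help="Visualizes tree of data gathered")
    parser.add_argument("--disable-socks5", action="store_true",
                        help="Executes HTTP requests without using SOCKS5 proxy")
    return parser


if __name__ == "__main__":
    # Same arguments as the README's example command.
    args = build_parser().parse_args(
        ["-u", "https://www.example.com", "--depth", "2",
         "--visualize", "tree", "--save", "json"]
    )
    print(args)
```

Restricting the format options with `choices` makes the parser reject an unsupported value such as `--save table` before any crawling starts.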

1 change: 0 additions & 1 deletion gotor
Submodule gotor deleted from 544df7
203 changes: 145 additions & 58 deletions poetry.lock

Some generated files are not rendered by default.