- ✅ Updated all dependency modules to their latest versions
- ✅ Updated to Go 1.18
- ✅ Added config.LocalNodeId
- ✅ Bundled 45,396 DHT tracker server IPs, so the node now joins the DHT network at high speed
- ✅ Rich annotations
- ✅ Friendly UML diagram rendering
- ✅ In China, please use a VPN to get through the GFW
- ✅ Fixed the stuttering problem at startup
- ✅ Fixed a run-once bug; the task now runs on a 30-second tick
- ✅ When the public IP changes, the whole IP blacklist is cleared before rejoining
DHT implements the BitTorrent DHT protocol in Go. It includes two modes: the standard mode and the crawling mode. The standard mode follows the BEPs, so you can use it as a standard DHT server. The crawling mode aims to crawl as much metadata as possible and does not strictly follow the BEPs; with it, you can build another BTDigg.
bthub.io is a BT search engine based on the crawling mode.
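As a quick orientation, here is a minimal sketch of how the two modes are selected. `dht.NewCrawlConfig` is the constructor used by the spider example later in this README; the standard-mode constructor shown in the comment is an assumption, so check the package documentation for the exact API.

```go
package main

import "github.com/hktalent/dht"

func main() {
	// Crawling mode: tuned for harvesting as much metadata as possible.
	crawler := dht.New(dht.NewCrawlConfig())

	// Standard mode: a regular, BEP-compliant DHT node. The constructor name
	// below is an assumption; check the package docs before uncommenting.
	// standard := dht.New(dht.NewStandardConfig())

	crawler.Run() // blocks until the node is stopped
}
```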
go get -u github.com/hktalent/dht@latest
Below is a simple spider. You can find more samples in the sample directory of this repository.
$ cat $PWD/config/elasticsearch.yml
cluster.name: my-application
node.name: node-1
path.data: /usr/share/elasticsearch/data
path.logs: /usr/share/elasticsearch/logs
network.host: 0.0.0.0
transport.host: 0.0.0.0
network.publish_host: 192.168.0.107
http.port: 9200
discovery.seed_hosts: [ "192.168.0.112:9300","192.168.0.107:9301","192.168.0.107:9302", "192.168.0.107:9300"]
cluster.initial_master_nodes: [ "192.168.0.112:9300","192.168.0.107:9301","192.168.0.107:9302", "192.168.0.107:9300"]
cluster.routing.allocation.same_shard.host: true
discovery.zen.fd.ping_timeout: 1m
discovery.zen.fd.ping_retries: 5
http.cors.enabled: true
http.cors.allow-origin: "*"
http.cors.allow-methods: OPTIONS, HEAD, GET, POST, PUT, DELETE
http.cors.allow-headers: Authorization, X-Requested-With, X-Auth-Token, Content-Type, Content-Length
transport.tcp.port: 9300
http.max_content_length: 400mb
indices.query.bool.max_clause_count: 20000
cluster.routing.allocation.disk.threshold_enabled: false
cd sample/spider
go build spider.go
docker run --restart=always --ulimit nofile=65536:65536 \
  -e "discovery.type=single-node" \
  --net esnet -p 9200:9200 -p 9300:9300 -d --name es \
  -v $PWD/logs:/usr/share/elasticsearch/logs \
  -v $PWD/config/elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml \
  -v $PWD/config/jvm.options:/usr/share/elasticsearch/config/jvm.options \
  -v $PWD/data:/usr/share/elasticsearch/data \
  hktalent/elasticsearch:7.16.2
# your Elasticsearch index will be at http://127.0.0.1:9200/dht_index
./spider -resUrl="http://127.0.0.1:9200/dht_index/_doc/" -address=":0"
open http://127.0.0.1:9200/dht_index/_search?q=GB%20and%20mp4&pretty=true
open http://127.0.0.1:9200/dht_index/_search?q=1080P%20GB%20and%20mp4&pretty=true
open http://127.0.0.1:9200/dht_index/_search?q=pentest%20pdf&pretty=true
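If you prefer to query the index from code rather than the browser, here is a minimal sketch against the standard Elasticsearch `_search` endpoint, mirroring the query strings above. The host and index name match the setup in this README; adjust them if yours differ.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/url"
)

// search runs a simple query-string search against the dht_index index,
// the same kind of query as the browser examples above.
func search(query string) (string, error) {
	u := "http://127.0.0.1:9200/dht_index/_search?pretty=true&q=" + url.QueryEscape(query)
	resp, err := http.Get(u)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	return string(body), err
}

func main() {
	for _, q := range []string{"GB and mp4", "1080P GB and mp4", "pentest pdf"} {
		out, err := search(q)
		if err != nil {
			fmt.Println("search failed:", err)
			continue
		}
		fmt.Println(out)
	}
}
```

The source of the simple spider mentioned earlier follows.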
package main

import (
	"fmt"

	"github.com/hktalent/dht"
)

func main() {
	// The wire downloader fetches metadata info from peers.
	downloader := dht.NewWire(65535)

	go func() {
		// Print every metadata response we receive.
		for resp := range downloader.Response() {
			fmt.Println(resp.InfoHash, resp.MetadataInfo)
		}
	}()
	go downloader.Run()

	config := dht.NewCrawlConfig()
	config.OnAnnouncePeer = func(infoHash, ip string, port int) {
		// Ask the announcing peer for the metadata of this infohash.
		downloader.Request([]byte(infoHash), ip, port)
	}

	d := dht.New(config)
	d.Run()
}
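The `-resUrl` flag used earlier suggests that the bundled `sample/spider` ships each result to the Elasticsearch `_doc` endpoint instead of printing it. The sketch below only illustrates that idea; it is not the sample's actual code, and the document field names (`infohash`, `metadata`) are made up for the example.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"

	"github.com/hktalent/dht"
)

// resURL matches the -resUrl value used earlier in this README.
const resURL = "http://127.0.0.1:9200/dht_index/_doc/"

// indexMetadata POSTs one crawled result as a JSON document to Elasticsearch.
// The field names are illustrative only.
func indexMetadata(infoHash string, metadata interface{}) error {
	doc, err := json.Marshal(map[string]interface{}{
		"infohash": infoHash,
		"metadata": metadata,
	})
	if err != nil {
		return err
	}
	resp, err := http.Post(resURL, "application/json", bytes.NewReader(doc))
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode >= 300 {
		return fmt.Errorf("elasticsearch returned %s", resp.Status)
	}
	return nil
}

func main() {
	downloader := dht.NewWire(65535)
	go func() {
		for r := range downloader.Response() {
			// r.InfoHash is assumed to be the raw infohash; hex-encode it for indexing.
			if err := indexMetadata(fmt.Sprintf("%x", r.InfoHash), r.MetadataInfo); err != nil {
				fmt.Println("index error:", err)
			}
		}
	}()
	go downloader.Run()

	config := dht.NewCrawlConfig()
	config.OnAnnouncePeer = func(infoHash, ip string, port int) {
		downloader.Request([]byte(infoHash), ip, port)
	}
	dht.New(config).Run()
}
```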
You can download the compiled demo binary here.
- The default crawl-mode configuration uses about 300 MB of RAM. Adjust MaxNodes and BlackListMaxSize to fit your machine; a tuning sketch follows these notes.
- It currently can't run inside a LAN because of NAT.
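A minimal sketch of adjusting those two settings, assuming both are fields on the config returned by `dht.NewCrawlConfig`. The numbers are arbitrary examples, not recommended values, and defaults may differ between versions.

```go
package main

import "github.com/hktalent/dht"

func main() {
	config := dht.NewCrawlConfig()

	// Lower these to cut memory usage, raise them on bigger machines.
	// The numbers below are arbitrary examples, not recommendations.
	config.MaxNodes = 5000
	config.BlackListMaxSize = 65536

	dht.New(config).Run()
}
```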
- ✅ NAT Traversal.
- ✅ Implements the full BEP-3.
- ✅ Optimization.
There may be several reasons why the crawler seems slow:
- DHT aims to implement the standard BitTorrent DHT protocol; it was not built specifically for crawling the DHT network.
- NAT traversal issues: you may be running the crawler inside a local network.
- It blacklists IPs that look bad, and a good IP may occasionally be misjudged.
MIT. Read more here.