A solution for collecting abuse reports of Bitcoin addresses. BTCAbuseCrawler (Python) crawls and parses freely available websites and stores the processed data in a PostgreSQL database. BTCAbuseSearch (JavaScript, Node.js) serves the collected data to the user through a website or an API, with access determined by the user's role in the system. Both tools can be run in parallel.
🖨️ Documentation [documentation]
- 📝 Thesis
- 📝 Presentation
🛠️ BTCAbuseCrawler [btc_abuse_crawler]
- ✔️ The PostgreSQL database initializer
- ✔️ Multi-threaded downloading and processing [3]
- ✔️ Automated run [4]
- ✔️ Compliance with the robots.txt rules [5]
- ✔️ Complete database schema
  - ✔️ source - contains the names of the sources of addresses and reports
  - ✔️ currency - contains all of the blockchains available from Blockchair
  - ✔️ source_label - contains the labels of the sources (subcategories of the sources)
  - ✔️ address - contains BTC and other cryptocurrency addresses
  - ✔️ url - contains the unique URLs gathered during crawling
  - ✔️ source_label_url - contains the starting URLs for the labels of the sources (each label can have multiple starting URLs)
  - ✔️ data - contains relative links to the crawled data
  - ✔️ role - contains user roles with various levels of access to the crawled data
  - ✔️ account - contains information about the user accounts
  - ✔️ token - contains API tokens with various levels of access to the crawled data
  - ✔️ address_data - contains the connections between cryptocurrency addresses and their respective crawled data
  - ✔️ session - contains account sessions
- ✔️ Crawling all addresses / reports from the following sources [5]:
  - ✔️ LoyceV
    - ✔️ Weekly updates with all BTC addresses (GZIP)
    - ✔️ Daily updates (TXT)
  - ✔️ BitcoinAbuse
    - ✔️ Reported addresses (HTML) [6]
  - ✔️ CheckBitcoinAddress
    - ✔️ Reported addresses (HTML) [6]
  - ✔️ CryptoBlacklist
    - ✔️ Searched reported BTC addresses (HTML) [7]
    - ✔️ Last reported ETH addresses (HTML)
  - ✔️ Bitcoin Generator Scam
    - ✔️ Scam BTC addresses (TXT)
    - ✔️ Scam non-BTC addresses (TXT) [6]
  - ✔️ BitcoinAIS
    - ✔️ Reported addresses (HTML) [6]
  - ✔️ CryptoScamDB
    - ✔️ Reported addresses (JSON) [6]
  - ✔️ Cryptscam
  - ✔️ SeeKoin
    - ✔️ Reported BTC addresses (HTML) [7]
  - ✔️ BitcoinWhosWho
    - ✔️ Searched reported BTC addresses (HTML) [7]
  - ✔️ LoyceV
- ✔️ Connecting the crawled addresses and data [8]
- ✔️ Exception handling
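The robots.txt compliance and multi-threaded downloading listed above could be combined roughly as follows. This is a minimal sketch using only the Python standard library, not the crawler's actual code: the function names, the `USER_AGENT` value, the injected `fetch` callable, and the example URLs are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.robotparser import RobotFileParser

USER_AGENT = "BTCAbuseCrawler"  # hypothetical user-agent string


def allowed_urls(robots_txt, urls, user_agent=USER_AGENT):
    """Keep only the URLs that the given robots.txt text permits."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if parser.can_fetch(user_agent, url)]


def crawl(robots_txt, urls, fetch, max_workers=4):
    """Download every permitted URL in a thread pool.

    `fetch` is any callable that takes a URL and returns its content;
    it is injected so the sketch does not depend on a real network call.
    """
    permitted = allowed_urls(robots_txt, urls)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns results in the same order as the permitted URLs
        contents = list(pool.map(fetch, permitted))
    return dict(zip(permitted, contents))


if __name__ == "__main__":
    robots = "User-agent: *\nDisallow: /private/\n"
    urls = [
        "https://example.com/reports/1",
        "https://example.com/private/2",
    ]
    print(crawl(robots, urls, fetch=lambda url: f"content of {url}"))
```

Filtering before submitting work to the pool keeps disallowed URLs from ever reaching a worker thread, which is one simple way to make sure the rules are applied regardless of how many threads are running.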
- Download and install PostgreSQL (during the installation set the password: postgres)
- Download and install Python (during the installation check the option: Add python.exe to PATH)
- Go to Settings / Apps / Advanced app settings / App execution aliases and turn off python.exe
- Restart the computer
- Go to the program directory `btc_abuse_crawler`
- Rename the file `example_db.json` to `db.json`
- Change the connection password in `db.json`
- Rename the file `example_setup.json` to `setup.json`
- Change the user passwords in `setup.json`
- Open a command prompt
- Change the current working directory to `btc_abuse_crawler`
- Install packages using the command `pip install -U -r requirements.txt`
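The two rename steps above can also be scripted. The following is a minimal sketch, assuming it is given the path to the `btc_abuse_crawler` directory; the helper name `prepare_configs` is hypothetical and not part of the project. The passwords in the resulting files still have to be changed by hand.

```python
from pathlib import Path

# File names taken from the setup steps; the mapping is only a convenience.
CONFIG_FILES = {
    "example_db.json": "db.json",
    "example_setup.json": "setup.json",
}


def prepare_configs(program_dir):
    """Rename the example config files in `program_dir`; return the new paths."""
    program_dir = Path(program_dir)
    renamed = []
    for example_name, target_name in CONFIG_FILES.items():
        source = program_dir / example_name
        target = program_dir / target_name
        # Skip files that are missing or already renamed, so reruns are harmless
        if source.exists() and not target.exists():
            source.rename(target)
            renamed.append(target)
    return renamed


if __name__ == "__main__":
    for path in prepare_configs("btc_abuse_crawler"):
        print(f"Created {path}; remember to change the passwords in it")
```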
🌎 BTCAbuseSearch [btc_abuse_search]
- ✔️ API
  - ✔️ Get token
  - ✔️ Get currencies
  - ✔️ Get sources
  - ✔️ Get addresses (filterable by currency & source)
  - ✔️ Get address
  - ✔️ Get data
- ✔️ Limit access by user roles
- ✔️ Generate token (linked with the account, generated during first sign in)
- ✔️ Caching data
- ✔️ Web pages
  - ✔️ Index
  - ✔️ Sign up
  - ✔️ Sign in
  - ✔️ Sign out
  - ✔️ Account
  - ✔️ Accounts (filterable by email & role) - admin only page
  - ✔️ Addresses (filterable by currency & source)
  - ✔️ Address - all information related to the searched address
  - ✔️ Statistics
  - ✔️ API - listed API features
  - ✔️ FAQ - answered questions related to the website
  - ✔️ Error - 404 Not Found
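To illustrate the logic behind role-limited access and the currency/source filters listed above, here is a minimal sketch. BTCAbuseSearch itself is written in JavaScript; Python is used here only to show the idea. The role names other than admin, the numeric levels, the `AddressRecord` type, and the placeholder address strings are all assumptions made for the example.

```python
from dataclasses import dataclass

# Hypothetical role names and levels; this README only names an admin role.
ROLE_LEVELS = {"guest": 0, "user": 1, "admin": 2}


@dataclass(frozen=True)
class AddressRecord:
    address: str   # placeholder strings below, not real addresses
    currency: str
    source: str


def has_access(role, required_role):
    """Return True if `role` is at least as privileged as `required_role`."""
    # Unknown roles get level -1 and are therefore always rejected
    return ROLE_LEVELS.get(role, -1) >= ROLE_LEVELS[required_role]


def filter_addresses(records, currency=None, source=None):
    """Filter records by currency and/or source; None matches anything."""
    return [
        record for record in records
        if (currency is None or record.currency == currency)
        and (source is None or record.source == source)
    ]


if __name__ == "__main__":
    records = [
        AddressRecord("placeholder-1", "BTC", "LoyceV"),
        AddressRecord("placeholder-2", "ETH", "CryptoBlacklist"),
        AddressRecord("placeholder-3", "BTC", "CryptoBlacklist"),
    ]
    print(filter_addresses(records, currency="BTC", source="CryptoBlacklist"))
    print(has_access("user", "admin"))
```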
- Download and install Node.js
- Restart the computer
- Go to the program directory `btc_abuse_search`
- Rename the file `example_db.json` to `db.json`
- Change the connection password in `db.json`
- Open a command prompt
- Change the current working directory to `btc_abuse_search`
- Install packages using the command `npm i -g npm-check-updates && ncu -u && npm i`
- Open a command prompt
- Change the current working directory to `btc_abuse_search`
- Run the program using the command `node main.js`
Footnotes
- [1] Creates the PostgreSQL users, the database, and its tables. Fills the tables with the initial data. Sets some performance parameters of the PostgreSQL server. Restarts the PostgreSQL service.
- [2] Deletes the PostgreSQL users, the database, and its tables. Restores the default parameters of the PostgreSQL server. Restarts the PostgreSQL service.
- [3] Uses multiple threads for crawling sources which do not contain new addresses (mainly reports).
- [4] The program automatically checks for new data. Once new data is available, it downloads it and stores it in the database and on disk. The program never stops unless it is terminated by the user or the operating system.
- [6] The Crawler can determine the cryptocurrency of a given address across all of the blockchains available on Blockchair.
- [7] The Crawler saves only the data that contains useful information about a given BTC address.
- [8] The Crawler connects the crawled addresses and data.
- [9] If you do not open the command line as administrator, you will be prompted by User Account Control (UAC).
- [10] Running the program as administrator is required because the program runs other commands (installing packages, restarting PostgreSQL, etc.) that need administrator access.
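The automated run described in the footnotes (check for new data, download and store it, repeat until terminated) can be pictured as a simple polling loop. This is a sketch under stated assumptions rather than the crawler's real scheduler: `run_forever`, its parameters, and the one-hour interval are hypothetical, and `max_cycles` exists only so the loop can be exercised in a test.

```python
import time


def run_forever(has_new_data, download_and_store, interval_seconds=3600,
                max_cycles=None, sleep=time.sleep):
    """Poll for new data and process it whenever it is available.

    With the default max_cycles=None the loop only ends when the process
    is terminated; a finite value makes the sketch testable.
    """
    cycles = 0
    processed = 0
    while max_cycles is None or cycles < max_cycles:
        if has_new_data():
            download_and_store()
            processed += 1
        cycles += 1
        # Wait before checking the source again
        sleep(interval_seconds)
    return processed


if __name__ == "__main__":
    # Demo with stand-in callables and no real waiting
    count = run_forever(
        has_new_data=lambda: True,
        download_and_store=lambda: print("downloading and storing new data"),
        max_cycles=2,
        sleep=lambda seconds: None,
    )
    print(f"processed {count} updates")
```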