adamsarek/btc-address-lookup


🔍 BTC Address Lookup (Master Thesis)

A solution for collecting abuse reports of Bitcoin addresses. BTCAbuseCrawler (Python) crawls and parses freely available websites and processes the data into a database (PostgreSQL). BTCAbuseSearch (JavaScript, Node.js) provides the collected data via website or API to the user based on their role in the system. Both tools can be run in parallel.

🖨️ Documentation [documentation]

Documents

Diagrams

Other

🛠️ BTCAbuseCrawler [btc_abuse_crawler]

Features

  • ✔️ The PostgreSQL database initializer
    • ✔️ Setup [1]
    • ✔️ Reset [2]
  • ✔️ Multi-threaded downloading and processing [3]
  • ✔️ Automated run [4]
  • ✔️ Compliance with robots.txt rules [5]
  • ✔️ Complete database schema
    • ✔️ source - contains names of the sources of addresses and reports
    • ✔️ currency - contains all of the available blockchains from Blockchair
    • ✔️ source_label - contains labels of the sources (subcategory of the sources)
    • ✔️ address - contains BTC and other cryptocurrency addresses
    • ✔️ url - contains unique URLs gathered during crawling
    • ✔️ source_label_url - contains starting URLs for the labels of the sources (each label can have multiple starting URLs)
    • ✔️ data - contains relative links to the crawled data
    • ✔️ role - contains user roles with various levels of access to the crawled data
    • ✔️ account - contains information about the user account
    • ✔️ token - contains API tokens with various levels of access to the crawled data
    • ✔️ address_data - links cryptocurrency addresses to their respective crawled data
    • ✔️ session - contains account sessions
  • ✔️ Crawling all addresses / reports from the following sources [5]:
    • ✔️ LoyceV
      • ✔️ Weekly updates with all BTC addresses (GZIP)
      • ✔️ Daily updates (TXT)
    • ✔️ BitcoinAbuse
      • ✔️ Reported addresses (HTML) [6]
    • ✔️ CheckBitcoinAddress
      • ✔️ Reported addresses (HTML) [6]
    • ✔️ CryptoBlacklist
      • ✔️ Searched reported BTC addresses (HTML) [7]
      • ✔️ Last reported ETH addresses (HTML)
    • ✔️ Bitcoin Generator Scam
      • ✔️ Scam BTC addresses (TXT)
      • ✔️ Scam non-BTC addresses (TXT) [6]
    • ✔️ BitcoinAIS
      • ✔️ Reported addresses (HTML) [6]
    • ✔️ CryptoScamDB
      • ✔️ Reported addresses (JSON) [6]
    • ✔️ Cryptscam
      • ✔️ Searched reported BTC addresses (HTML) [7]
      • ✔️ Last reported addresses (HTML) [6]
    • ✔️ SeeKoin
      • ✔️ Reported BTC addresses (HTML) [7]
    • ✔️ BitcoinWhosWho
      • ✔️ Searched reported BTC addresses (HTML) [7]
  • ✔️ Connecting the crawled addresses and data [8]
  • ✔️ Exception handling
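
The robots.txt compliance listed among the features can be illustrated with Python's standard `urllib.robotparser` module. This is only a minimal sketch and not the crawler's actual code: the user-agent string `BTCAbuseCrawler`, the sample rules, and the `example.com` URLs are assumptions made for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real crawler would download it
# from the source's /robots.txt before requesting any other page.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the rules in robots_txt permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# The user-agent name and URLs below are assumptions for illustration.
allowed = is_allowed(SAMPLE_ROBOTS_TXT, "BTCAbuseCrawler", "https://example.com/reports/1")
blocked = is_allowed(SAMPLE_ROBOTS_TXT, "BTCAbuseCrawler", "https://example.com/private/1")
```

With these sample rules, a URL under `/private/` is rejected and any other path is accepted, so a crawler could skip disallowed URLs before queuing them for download.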

Requirements

Installation

  1. Download and install PostgreSQL (during the installation set the password: postgres)
  2. Download and install Python (during the installation check the option: Add python.exe to PATH)
  3. Go to Settings / Apps / Advanced app settings / App execution aliases and turn off python.exe
  4. Restart the computer
  5. Go to the program directory btc_abuse_crawler
  6. Rename the file example_db.json to db.json
  7. Change the connection password in db.json
  8. Rename the file example_setup.json to setup.json
  9. Change the user passwords in setup.json
  10. Open a command prompt
  11. Change the current working directory to btc_abuse_crawler
  12. Install packages using the command pip install -U -r requirements.txt
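
Before the first run, it can help to confirm that the two configuration files from the steps above were renamed correctly. The following is a hedged sketch of such a check. The function name `check_configs` is hypothetical and not part of the project, and because the contents of the files are not documented here, it only verifies that each file exists and parses as JSON.

```python
import json
from pathlib import Path

# File names taken from the installation steps; their keys are not
# documented here, so only presence and JSON validity are checked.
REQUIRED_CONFIGS = ("db.json", "setup.json")


def check_configs(directory: str) -> list[str]:
    """Return a list of problems found; an empty list means both files look usable."""
    problems = []
    for name in REQUIRED_CONFIGS:
        path = Path(directory) / name
        if not path.is_file():
            problems.append(f"{name}: missing (rename example_{name} to {name})")
            continue
        try:
            json.loads(path.read_text(encoding="utf-8"))
        except json.JSONDecodeError as exc:
            problems.append(f"{name}: invalid JSON ({exc.msg})")
    return problems
```

Calling `check_configs("btc_abuse_crawler")` from the parent directory would return one message per missing or malformed file.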

Running

  1. Open a command prompt (as administrator) [9]
  2. Change the current working directory to btc_abuse_crawler
  3. Run the program using the command python main.py
  4. If User Account Control appears, press Yes [10]
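
Whether the command prompt is already elevated can be checked from Python before starting the program. This is a rough sketch under the assumption that such a check is useful. The helper `is_elevated` is hypothetical and not part of the project. It relies on the Windows `shell32.IsUserAnAdmin` function via `ctypes` and falls back to the effective user ID on other systems.

```python
import ctypes
import os


def is_elevated() -> bool:
    """Return True if the current process has administrator/root rights."""
    if os.name == "nt":
        # IsUserAnAdmin is a Windows shell32 function; non-zero means elevated.
        return bool(ctypes.windll.shell32.IsUserAnAdmin())
    # On POSIX systems, an effective user ID of 0 means root.
    return os.geteuid() == 0


if not is_elevated():
    print("Not elevated: expect a User Account Control prompt on Windows.")
```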

🌎 BTCAbuseSearch [btc_abuse_search]

Features

  • ✔️ API
    • ✔️ Get token
    • ✔️ Get currencies
    • ✔️ Get sources
    • ✔️ Get addresses (filterable by currency & source)
    • ✔️ Get address
    • ✔️ Get data
    • ✔️ Limit access by user roles
    • ✔️ Generate token (linked with the account, generated during first sign in)
    • ✔️ Caching data
  • ✔️ Web pages
    • ✔️ Index
    • ✔️ Sign up
    • ✔️ Sign in
    • ✔️ Sign out
    • ✔️ Account
    • ✔️ Accounts (filterable by email & role) - admin only page
    • ✔️ Addresses (filterable by currency & source)
    • ✔️ Address - all information related to the searched address
    • ✔️ Statistics
    • ✔️ API - listed API features
    • ✔️ FAQ - answered questions related to the website
    • ✔️ Error - 404 Not Found

Requirements

Installation

  1. Download and install Node.js
  2. Restart the computer
  3. Go to the program directory btc_abuse_search
  4. Rename the file example_db.json to db.json
  5. Change the connection password in db.json
  6. Open a command prompt
  7. Change the current working directory to btc_abuse_search
  8. Install packages using the command npm i -g npm-check-updates && ncu -u && npm i

Running

  1. Open a command prompt
  2. Change the current working directory to btc_abuse_search
  3. Run the program using the command node main.js

Footnotes

  1. Creates PostgreSQL users, database and its tables.
    Fills the tables with the initial data.
    Sets some performance parameters of the PostgreSQL server.
    Restarts the PostgreSQL service.

  2. Deletes PostgreSQL users, database and its tables.
    Sets the default parameters of the PostgreSQL server.
    Restarts the PostgreSQL service.

  3. Uses multiple threads for crawling sources which do not contain new addresses (mainly reports).

  4. The program automatically checks the availability of new data.
    Once the new data are available, it downloads and stores them in the database and on the disk.
    The program never stops unless it is terminated by the user or the operating system.

  5. The Crawler respects the robots.txt rules of each source.

  6. The Crawler can determine the cryptocurrency of a given address among all of the blockchains available on Blockchair.

  7. The Crawler saves only the data that contains useful information about a given BTC address.

  8. The Crawler connects the crawled addresses and data.

  9. If you do not open the command prompt as administrator, you will be prompted by the User Account Control (UAC).

  10. Running the program as administrator is required because the program runs other commands (installing packages, restarting PostgreSQL, etc.) which need administrator access.
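
The multi-threaded crawling described in footnote 3 can be sketched with Python's standard `concurrent.futures` module. This is an illustrative outline rather than the crawler's real implementation: `fetch_report` is a stand-in that performs no network access, the `example.com` URLs are hypothetical, and the worker count is arbitrary.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_report(url: str) -> tuple[str, int]:
    """Stand-in for a download-and-parse step; returns the URL and a fake size."""
    # A real worker would perform an HTTP request and parse the response here.
    return url, len(url)


def crawl(urls: list[str], max_workers: int = 4) -> dict[str, int]:
    """Process URLs concurrently and collect the results keyed by URL."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in input order even though work runs in parallel.
        return dict(pool.map(fetch_report, urls))


# Hypothetical report URLs used only to exercise the sketch.
results = crawl([f"https://example.com/report/{i}" for i in range(5)])
```

Threads suit this kind of workload because the time is dominated by waiting on network responses rather than by computation.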
