This repository stores the code for fetching the logos and webpage screenshots of legitimate domains, together with the models, dataset, and documentation used to classify phishing websites with deep learning techniques.
- PhishPedia
.
├── Logo_Fetching
│ ├── logoFetcher.py
├── README.md
├── Logos
│ ├── Logos of Legitimate Domains
├── Outputs
│ ├── Log files from the logo-fetching code, including any exceptions that occurred
├── tranco
│ ├── CSV file containing the top 1 million domains
├── MainAlgorithm
│ ├── main.py (Entry-point script; run this to get the results)
│ ├── getLogo.py (Module to fetch the logo of the domain; called from main.py)
│ ├── screenshotCapture.py (Module to capture a full-page screenshot of the domain; called from main.py)
│ ├── UI_Detection.py (Module to detect the presence of input boxes in the webpage screenshot; called from main.py)
├── URL_For_Testing
│ ├── scrappingCode.py (Code to fetch URLs from PhishTank and store them in a CSV file for testing our algorithm)
│ ├── phishTankDatabase.csv (CSV file containing the URLs fetched from PhishTank)
├── GetAverage_Icon_height_width.py (Code to get the average height and width of the logos of the legitimate domains)
├── topLogo_fetching.py (Code to copy the logos of the top 300 legitimate domains from the Logos folder to a new folder, used for testing the model on a small dataset)
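
The logo-copying step described for `topLogo_fetching.py` could look roughly like the sketch below. It is not the script's actual code. It assumes the Tranco CSV has one `rank,domain` row per line with no header, and that each logo file is named `<domain>.<ext>`. The function names `read_top_domains` and `copy_top_logos` are hypothetical.

```python
import csv
import shutil
from pathlib import Path


def read_top_domains(tranco_csv, limit=300):
    """Return the first `limit` domains from a CSV assumed to hold rank,domain rows."""
    domains = []
    with open(tranco_csv, newline="", encoding="utf-8") as handle:
        for row in csv.reader(handle):
            if len(row) < 2:
                continue  # skip blank or malformed rows
            domains.append(row[1].strip().lower())
            if len(domains) >= limit:
                break
    return domains


def copy_top_logos(tranco_csv, logos_dir, dest_dir, limit=300):
    """Copy the logos of the top `limit` domains into dest_dir.

    Returns the domains whose logo was found and copied, in rank order.
    """
    logos_dir, dest_dir = Path(logos_dir), Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    # Index logo files by stem; assumes files are named "<domain>.<ext>".
    by_domain = {p.stem.lower(): p for p in logos_dir.iterdir() if p.is_file()}
    copied = []
    for domain in read_top_domains(tranco_csv, limit):
        source = by_domain.get(domain)
        if source is None:
            continue  # no logo was fetched for this domain
        shutil.copy2(source, dest_dir / source.name)
        copied.append(domain)
    return copied
```

With these assumptions, a call such as `copy_top_logos("tranco/top-1m.csv", "Logos", "TopLogos")` would populate the smaller test folder. The CSV filename and the `TopLogos` folder name are placeholders, not paths taken from the repository.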