Classify URLs as pointing to phishing campaigns or not.
Version: 1.0
This model is a binary classifier that labels URLs obtained from host process data as phishing or non-phishing.
The model is an LSTM neural network combined with a fully connected layer that differentiates between legitimate and phishing URLs. Features are derived from both the structure of the URL and its characters.
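The exact features computed by the training script are not listed here. As a rough illustration of the two kinds of input described above, the sketch below encodes a URL as a fixed-length sequence of character IDs (the kind of input an embedding plus LSTM layer consumes) and derives a few structural features (the kind a fully connected layer could take). The character mapping, `MAX_LEN`, and every feature name are assumptions for illustration, not the ones used in `phishurl-appshield-combined-lstm-dnn.py`.

```python
import re
from urllib.parse import urlparse

# Hypothetical fixed length for the character sequence (an assumption, not the
# value used by the training script).
MAX_LEN = 100
PAD_ID = 0        # padding value for short URLs
UNKNOWN_ID = 96   # any character outside printable ASCII


def char_sequence(url: str, max_len: int = MAX_LEN) -> list[int]:
    """Encode a URL as integer character IDs, truncated or right-padded to max_len.

    Printable ASCII (space through '~') maps to 1..95; everything else maps to
    UNKNOWN_ID.
    """
    ids = [ord(c) - 31 if 32 <= ord(c) <= 126 else UNKNOWN_ID for c in url[:max_len]]
    return ids + [PAD_ID] * (max_len - len(ids))


def structural_features(url: str) -> dict[str, float]:
    """Derive a few illustrative structural features (hypothetical names)."""
    # urlparse treats a scheme-less string as a path, so prepend a scheme.
    parsed = urlparse(url if "://" in url else "http://" + url)
    host = parsed.hostname or ""
    return {
        "url_length": float(len(url)),
        "host_length": float(len(host)),
        "num_dots_in_host": float(host.count(".")),
        "num_digits": float(sum(c.isdigit() for c in url)),
        "num_hyphens": float(url.count("-")),
        "path_depth": float(len([p for p in parsed.path.split("/") if p])),
        "has_ip_host": float(re.fullmatch(r"\d{1,3}(\.\d{1,3}){3}", host) is not None),
        "has_at_symbol": float("@" in url),
        "is_https": float(parsed.scheme == "https"),
    }


if __name__ == "__main__":
    example = "http://192.168.0.1/login-verify/account.php"
    print(char_sequence(example)[:8])
    print(structural_features(example))
```

Keeping the two encodings separate mirrors the split between a sequence branch and a dense branch; how the real model merges them is not specified here.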
To run this example, install the additional requirements into your environment using the supplementary requirements file provided in this example directory:
pip install -r requirements.txt
Training data consists of 97K URLs labelled as phishing URLs and 100K URLs labelled as legitimate URLs.
Training epochs: 150
Training batch size: 2000
GPU model: V100
Model performance: precision = 0.995, recall = 0.55
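Assuming phishing is the positive class, the standard definitions mean that roughly 1 in 200 URLs flagged as phishing is actually legitimate, while about 45% of phishing URLs go undetected. The small sketch below computes both metrics from confusion-matrix counts, along with the F1 score the reported figures imply; the function names are illustrative and not part of the training script.

```python
def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Compute metrics from confusion-matrix counts, with phishing as the positive class."""
    precision = tp / (tp + fp) if tp + fp else 0.0  # share of flagged URLs that are phishing
    recall = tp / (tp + fn) if tp + fn else 0.0     # share of phishing URLs that get flagged
    return precision, recall, f1_from(precision, recall)


if __name__ == "__main__":
    # F1 implied by the reported precision = 0.995 and recall = 0.55.
    print(round(f1_from(0.995, 0.55), 3))  # 0.708
```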
To train the model, run the training script from the training-tuning directory:
cd ${MORPHEUS_EXPERIMENTAL_ROOT}/phishing-url-detection/training-tuning
# Run training script and save models
python phishurl-appshield-combined-lstm-dnn.py
This saves the trained model files under the ../models directory, from which the inference script can load them for later inference.
Combined with host data from DOCA AppShield, this model can be used to detect phishing URLs. A training notebook is also included so that users can update the model as more labeled data is collected. The model relies only on the URL itself, processing its structure and the words it contains. Many malicious URLs look legitimate and cannot be detected with these features alone, which limits recall. We could improve the model by adding WHOIS (https://who.is/) and VirusTotal (https://www.virustotal.com/) information about each URL.
Input: snapshots of URL plugins collected from DOCA AppShield
Output: processes with URLs classified as phishing or non-phishing
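Going from AppShield URL snapshots to per-process classifications can be sketched as follows. This assumes each snapshot has already been flattened into (PID, process name, URL) records and that the trained model is wrapped in a function returning a phishing probability. The `UrlRecord` schema, the `score_url` callable, and the 0.5 threshold are hypothetical and do not reflect the actual AppShield plugin format or pipeline stages.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class UrlRecord:
    """One URL observed in a process snapshot (hypothetical schema)."""
    pid: int
    process_name: str
    url: str


def classify_processes(
    records: Iterable[UrlRecord],
    score_url: Callable[[str], float],
    threshold: float = 0.5,
) -> dict[tuple[int, str], list[str]]:
    """Group URLs by process and keep those whose phishing score meets the threshold.

    Every process seen in the input appears in the result; an empty list means
    none of its URLs were classified as phishing.
    """
    flagged: dict[tuple[int, str], list[str]] = defaultdict(list)
    for rec in records:
        urls = flagged[(rec.pid, rec.process_name)]  # registers the process even if nothing is flagged
        if score_url(rec.url) >= threshold:
            urls.append(rec.url)
    return dict(flagged)


if __name__ == "__main__":
    def toy_score(url: str) -> float:
        # Stand-in for the trained model; NOT a real phishing detector.
        return 0.9 if "login-verify" in url else 0.1

    snapshot = [
        UrlRecord(101, "chrome.exe", "https://www.nvidia.com/"),
        UrlRecord(202, "outlook.exe", "http://192.168.0.1/login-verify/account.php"),
    ]
    for (pid, name), urls in classify_processes(snapshot, toy_score).items():
        print(pid, name, "phishing" if urls else "non-phishing", urls)
```

Lowering `threshold` would flag more URLs, which typically trades some of the model's high precision for recall.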
Out-of-scope use cases: N/A
Ethical considerations: N/A