DGA Detection

Model Overview

Description:

  • This model is a recurrent neural network trained to classify domain names as either benign or generated by a domain generation algorithm (DGA). DGAs are algorithms seen in various families of malware that periodically generate large numbers of domain names to serve as rendezvous points with their command-and-control servers. The large number of potential rendezvous points makes it difficult for law enforcement to shut down botnets effectively, since infected computers attempt to contact some of these domains every day to receive updates or commands.
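
The sketch below shows one way such a character-level GRU classifier can be defined in PyTorch. It is a minimal illustration only; the vocabulary size, layer dimensions, and class layout are assumptions and do not necessarily match the model shipped with this example.

import torch.nn as nn

class DGAClassifier(nn.Module):
    """Character-level GRU that scores a domain as benign or DGA-generated.

    All hyperparameters (vocabulary size, embedding and hidden dimensions)
    are illustrative assumptions, not the released model's values.
    """

    def __init__(self, vocab_size=128, embed_dim=32, hidden_dim=64, num_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.gru = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, num_classes)

    def forward(self, char_ids):
        # char_ids: (batch, seq_len) integer-encoded domain characters
        emb = self.embed(char_ids)
        _, hidden = self.gru(emb)           # final hidden state: (1, batch, hidden_dim)
        return self.fc(hidden.squeeze(0))   # logits over {benign, DGA}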

Requirements:

  • To run this example, additional requirements must be installed into your environment. A supplementary requirements file has been provided in this example directory.

pip install -r requirements.txt

Reference(s):

Model Architecture:

Architecture Type:

  • Recurrent Neural Network

Network Architecture:

  • GRU

Input:

Input Format:

  • CSV

Input Parameters:

  • Domain names

Other Properties Related to Input:

  • N/A
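
As an illustration of the expected input, the example below reads a CSV of domain names with pandas; the file name and column name are hypothetical and may differ from the example dataset.

import pandas as pd

# Hypothetical file and column names.
domains_df = pd.read_csv("domains.csv")
print(domains_df["domain"].head())   # e.g. "google.com" vs. a DGA-like "xjpakmdcfhwqv.net"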

Output:

Output Format:

  • Binary Results, DGA or Benign

Output Parameters:

  • N/A

Other Properties Related to Output:

  • N/A

Software Integration:

Runtime(s):

  • Morpheus

Supported Hardware Platform(s):

  • Ampere/Turing

Supported Operating System(s):

  • Linux

Model Version(s):

  • v1

Training & Evaluation:

Training Dataset:

Link:

Properties (Quantity, Dataset Descriptions, Sensor(s)):

  • Domain names

Dataset License:

Evaluation Dataset:

Link:

Properties (Quantity, Dataset Descriptions, Sensor(s)):

  • Domain names

Dataset License:

Inference:

Engine:

  • Triton

Test Hardware:

  • Other
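
Because inference is served through Triton, a request can be issued with the Triton Python client along the lines below. This is a sketch only; the server URL, model name, tensor names, and input length are assumptions and must match the deployed model's configuration.

import numpy as np
import tritonclient.http as httpclient

# Hypothetical server address, model name, and tensor names.
client = httpclient.InferenceServerClient(url="localhost:8000")

# Integer-encoded, zero-padded domain characters (batch of 1, assumed length 100).
char_ids = np.zeros((1, 100), dtype=np.int64)

infer_input = httpclient.InferInput("input__0", list(char_ids.shape), "INT64")
infer_input.set_data_from_numpy(char_ids)

response = client.infer(model_name="dga-detection", inputs=[infer_input])
logits = response.as_numpy("output__0")  # per-class scores: benign vs. DGA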

Subcards

Model Card ++ Bias Subcard

What is the gender balance of the model validation data?

  • Not Applicable

What is the racial/ethnicity balance of the model validation data?

  • Not Applicable

What is the age balance of the model validation data?

  • Not Applicable

What is the language balance of the model validation data?

  • Domain names could be in any language

What is the geographic origin language balance of the model validation data?

  • Not Applicable

What is the educational background balance of the model validation data?

  • Not Applicable

What is the accent balance of the model validation data?

  • Not Applicable

What is the face/key point balance of the model validation data?

  • Not Applicable

What is the skin/tone balance of the model validation data?

  • Not Applicable

What is the religion balance of the model validation data?

  • Not Applicable

Individuals from the following adversely impacted (protected classes) groups participate in model design and testing.

  • Not Applicable

Describe measures taken to mitigate against unwanted bias.

  • Not Applicable

Model Card ++ Explainability Subcard

Name example applications and use cases for this model.

  • This model is provided for testing purposes. Domain names in DNS queries can be used as input to this model.

Fill in the blank for the model technique.

  • This model is designed for developers seeking to test DGA detection functionality with a model trained on a small dataset.

Name who is intended to benefit from this model.

  • This model is intended for developers who want to test the functionality of a GRU-based DGA detector.

Describe the model output.

  • The model output is a binary classification: DGA or benign.

List the steps explaining how this model works.

  • A GRU model is trained on the dataset; at inference time, the model predicts one of two classes for each domain: DGA or benign. A minimal sketch of this step is shown below.
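
The sketch below shows the kind of character encoding and prediction step implied above, reusing the illustrative DGAClassifier from the earlier sketch; the character mapping, maximum length, and class index are assumptions.

import torch

MAX_LEN = 100  # assumed maximum domain length

def encode_domain(domain: str) -> torch.Tensor:
    # Map each character to a small integer id; 0 is reserved for padding.
    ids = [min(ord(c), 127) for c in domain.lower()[:MAX_LEN]]
    ids += [0] * (MAX_LEN - len(ids))
    return torch.tensor([ids], dtype=torch.long)  # shape: (1, MAX_LEN)

model = DGAClassifier()  # illustrative model from the sketch above
model.eval()
with torch.no_grad():
    logits = model(encode_domain("xjpakmdcfhwqv.net"))
    label = "DGA" if logits.argmax(dim=1).item() == 1 else "Benign"  # class 1 assumed to be DGA
print(label)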

Name the adversely impacted groups (protected classes) this has been tested to deliver comparable outcomes regardless of:

  • Not Applicable

List the technical limitations of the model.

  • Further training is needed for different DGA types.

What performance metrics were used to affirm the model's performance?

  • Accuracy, Precision
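
Given these metrics, an evaluation could be computed along the following lines with scikit-learn; the label arrays here are placeholders, not results from the actual evaluation.

from sklearn.metrics import accuracy_score, precision_score

# Placeholder labels: 1 = DGA, 0 = benign.
y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 1, 0, 0]

print("Accuracy:", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))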

What are the potential known risks to users and stakeholders?

  • N/A

What training is recommended for developers working with this model?

  • None

Link the relevant end user license agreement

Model Card ++ Safety & Security Subcard

Link the location of the training dataset's repository.

Is the model used in an application with physical safety impact?

  • No

Describe physical safety impact (if present).

  • N/A

Was the model and dataset assessed for vulnerability to potential forms of attack?

  • No

Name applications for the model.

  • This model is provided as an example of DGA detection. It has been trained on a very small dataset and is intended mainly for testing purposes.

Name use case restrictions for the model.

  • It is intended for testing purposes only.

Has this been verified to have met prescribed quality standards?

  • No

Name target quality Key Performance Indicators (KPIs) for which this has been tested.

  • N/A

Technical robustness and model security validated?

  • No

Is the model and dataset compliant with National Classification Management Society (NCMS)?

  • No

Are there explicit model and dataset restrictions?

  • No

Are there access restrictions to systems, model, and data?

  • No

Is there a digital signature?

  • No

Model Card ++ Privacy Subcard

Generatable or reverse engineerable personally-identifiable information (PII)?

  • Neither

Was consent obtained for any PII used?

  • N/A

Protected classes used to create this model? (The following were used in the model's training:)

  • N/A

How often is dataset reviewed?

  • The dataset is initially reviewed upon addition, and subsequent reviews are conducted as needed or upon request for any changes.

Is a mechanism in place to honor data subject right of access or deletion of personal data?

  • N/A

If PII collected for the development of this AI model, was it minimized to only what was required?

  • N/A

Is data in dataset traceable?

  • N/A

Scanned for malware?

  • No

Are we able to identify and trace source of dataset?

  • Yes

Does data labeling (annotation, metadata) comply with privacy laws?

  • N/A

Is data compliant with data subject requests for data correction or removal, if such a request was made?

  • N/A