Cyberdefence-Lab-Murcia/UMUDGA

UMUDGA - University of Murcia Domain Generation Algorithm Dataset

This is the official repository for the UMUDGA dataset. Please refer to the WIKI page for usage and detailed information.

Authors

  • Mattia Zago (mattia.zago [at] um [dot] es)
  • Manuel Gil Pérez (mgilperez [at] um [dot] es)
  • Gregorio Martínez Pérez (gregorio [at] um [dot] es)

Affiliations

The authors are with the Department of Information and Communications Engineering, University of Murcia, Spain.

Abstract

In computer security, botnets remain a major cyber threat. Concealment techniques such as dynamic addressing and Domain Generation Algorithms (DGAs) call for an improved and more effective detection process. To this end, this data descriptor presents a collection of over 30 million manually labelled, algorithmically generated domain names, each annotated with a feature set ready to use for machine learning (ML) analysis. The dataset lets researchers skip the data collection, organization and pre-processing phases and focus on the analysis and the production of ML-powered solutions for network intrusion detection. To be as exhaustive as possible, 50 of the most important malware variants have been selected. Each family is available both as a list of domains and as a collection of features. More precisely, the former is generated by executing the malware DGAs in a controlled environment with fixed parameters, while the latter is generated by extracting a combination of statistical and Natural Language Processing (NLP) metrics.
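
To give a rough idea of what a statistical feature computed over a domain name can look like, here is a minimal Python sketch. It is not the dataset's actual extraction code or feature set: the function names `shannon_entropy` and `domain_features`, the choice of features, and the decision to look only at the first label of the name are all assumptions made for illustration.

```python
import math
from collections import Counter

VOWELS = set("aeiou")


def shannon_entropy(text: str) -> float:
    """Shannon entropy (in bits) of the character distribution of `text`."""
    if not text:
        return 0.0
    total = len(text)
    counts = Counter(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def domain_features(fqdn: str) -> dict:
    """Compute a few illustrative statistics for one domain name.

    Assumption: only the first (leftmost) label is analysed, so
    "abc123.example.com" is reduced to "abc123". Real pipelines often
    handle the public suffix and the remaining labels separately.
    """
    label = fqdn.lower().rstrip(".").split(".")[0]
    letters = [c for c in label if c.isalpha()]
    return {
        "length": len(label),
        "entropy": shannon_entropy(label),
        # Share of digits among all characters of the label.
        "digit_ratio": sum(c.isdigit() for c in label) / len(label) if label else 0.0,
        # Share of vowels among the alphabetic characters only.
        "vowel_ratio": sum(c in VOWELS for c in letters) / len(letters) if letters else 0.0,
    }


if __name__ == "__main__":
    for name in ("google.com", "xjw3k9qpz7.net"):
        print(name, domain_features(name))
```

Rows of values like these, one per domain, are the kind of input an ML classifier could consume; the feature files shipped with the dataset should be preferred over this sketch for any real experiment.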

Usage

T.B.A.

Bibliography

  1. M. Zago, M. Gil Pérez, and G. Martínez Pérez, "Scalable Detection of Botnets based on DGA: Efficient Feature Discovery Process in Machine Learning Techniques," Soft Computing, vol. 24, pp. 5517–5537, Jan. 2019. DOI: 10.1007/s00500-018-03703-8
  2. M. Zago, M. Gil Pérez, and G. Martínez Pérez, "UMUDGA: a dataset for profiling DGA-based botnet," Computers & Security, vol. 92, p. 101719, May 2020. DOI: 10.1016/j.cose.2020.101719
  3. M. Zago, M. Gil Pérez, and G. Martínez Pérez, "UMUDGA: University of Murcia domain generation algorithm dataset," Mendeley Data, Jan. 2020. DOI: 10.17632/y8ph45msv8.1
  4. M. Zago, M. Gil Pérez, and G. Martínez Pérez, "UMUDGA: A dataset for profiling algorithmically generated domain names in botnet detection," Data in Brief, vol. 30, p. 105400, Jun. 2020. DOI: 10.1016/j.dib.2020.105400