This is the official repository for the UMUDGA dataset. Please refer to the wiki page for usage instructions and detailed information.
- Mattia Zago (mattia.zago [at] um [dot] es)
- Manuel Gil Pérez (mgilperez [at] um [dot] es)
- Gregorio Martínez Pérez (gregorio [at] um [dot] es)
The authors are with the Department of Information and Communications Engineering, University of Murcia, Spain.
In computer security, botnets still represent a major cyber threat. Concealment techniques such as dynamic addressing and Domain Generation Algorithms (DGAs) call for an improved and more effective detection process. To this end, this data descriptor presents a collection of over 30 million manually labelled, algorithmically generated domain names, each annotated with a ready-to-use feature set for machine learning (ML) analysis. The dataset lets researchers move past the data collection, organization, and pre-processing phases and focus on the analysis and production of ML-powered solutions for network intrusion detection. To be as exhaustive as possible, 50 of the most important malware variants have been selected. Each family is available both as a list of domains and as a collection of features. The former is generated by executing the malware DGAs in a controlled environment with fixed parameters, while the latter is generated by extracting a combination of statistical and Natural Language Processing (NLP) metrics.
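To give a rough idea of what "statistical metrics" over a domain name can look like, the following is a minimal sketch in Python. It is not the extraction pipeline used to build UMUDGA: the function names (`shannon_entropy`, `domain_features`) and the chosen metrics are illustrative assumptions, and the dataset's actual feature definitions are documented in the wiki and in the referenced papers.

```python
import math
from collections import Counter

VOWELS = set("aeiou")


def shannon_entropy(text: str) -> float:
    """Shannon entropy, in bits per character, of the character distribution."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def domain_features(domain: str) -> dict:
    """Compute a few illustrative (hypothetical) features for one domain name."""
    # Assumption: only the left-most label is analysed, e.g. "abc" in "abc.com".
    label = domain.lower().split(".")[0]
    letters = [c for c in label if c.isalpha()]
    return {
        "length": len(label),
        "entropy": shannon_entropy(label),
        # Share of vowels among alphabetic characters.
        "vowel_ratio": sum(c in VOWELS for c in letters) / len(letters) if letters else 0.0,
        # Share of digits among all characters of the label.
        "digit_ratio": sum(c.isdigit() for c in label) / len(label) if label else 0.0,
        # Number of distinct character bigrams, a crude n-gram statistic.
        "distinct_bigrams": len({label[i:i + 2] for i in range(len(label) - 1)}),
    }


if __name__ == "__main__":
    for name in ("google.com", "xjw3kq9zt1.net"):
        print(name, domain_features(name))
```

Algorithmically generated names often show higher character entropy and fewer vowels than human-chosen names, which is why metrics of this kind are commonly used as inputs to ML classifiers.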
T.B.A.
- M. Zago, M. Gil Pérez, and G. Martínez Pérez, "Scalable Detection of Botnets based on DGA: Efficient Feature Discovery Process in Machine Learning Techniques," Soft Computing, vol. 24, pp. 5517–5537, Jan. 2019. DOI: 10.1007/s00500-018-03703-8
- M. Zago, M. Gil Pérez, and G. Martínez Pérez, "UMUDGA: a dataset for profiling DGA-based botnet," Computers & Security, vol. 92, p. 101719, May 2020. DOI: 10.1016/j.cose.2020.101719
- M. Zago, M. Gil Pérez, and G. Martínez Pérez, "UMUDGA: University of Murcia domain generation algorithm dataset," Mendeley Data, Jan. 2020. DOI: 10.17632/y8ph45msv8.1
- M. Zago, M. Gil Pérez, and G. Martínez Pérez, "UMUDGA: A dataset for profiling algorithmically generated domain names in botnet detection," Data in Brief, vol. 30, p. 105400, Jun. 2020. DOI: 10.1016/j.dib.2020.105400