MedBIoT - Generation of an IoT Botnet Dataset in a Medium-sized IoT Network

About MedBIoT data set

Creators
Alejandro Guerra Manzanares, Jorge Alberto Medina Galindo, Hayretdin Bahsi and Sven Nõmm
Department of Software Science, Center for Digital Forensics and Cyber Security; Tallinn University of Technology; Estonia.
Public release date: 27.02.2020 (Paper publication)

Data set information
The experimental setup preparation was performed by Jorge Alberto Medina Galindo as the core part of his master's thesis.
The main aim of this research was to fill the gap caused by the lack of data sets for IoT botnet detection. Its main features are:
  • Combination of real and emulated IoT devices in a medium-sized network (i.e., 83 devices). No previous data set combined such devices.
  • Actual malware was deployed, providing real malware network data. Three prominent botnet malware families were deployed: Mirai, BashLite, and Torii.
  • Labelled data set. The data set is split according to the traffic source (i.e., normal or malware traffic), which makes it easy to label the data and extract features from the raw pcap files.
  • The data set is focused on the early stages of botnet deployment: spreading and C&C communication.
This data set is suitable for IoT botnet research in general and intrusion detection systems in particular. For instance, it can be used for supervised learning (i.e., binary and multi-class classification) and unsupervised learning (i.e., anomaly and outlier detection). Demonstrations are provided in the published papers.
The emulated and real devices are combined in a medium-sized network. The real devices used for this research were:
  • Sonoff Tasmota smart switch
  • TPLink smart switch
  • TPLink light bulb
The emulated devices were of the following types:
  • Lock
  • Switch
  • Fan
  • Light

Data set structure and files
At the moment, the data set is provided as raw pcap files in two main formats:

  • Bulk: pcap files are provided for each data source type (i.e., legitimate, Mirai, BashLite, and Torii). They can be accessed here.
  • Fine-grained: pcap files are provided for each data source, botnet phase, and device type. For example, mirai_mal_CC_lock.pcap refers to Mirai botnet malware data corresponding to C&C communication for lock devices. They can be accessed here.
Labelling: Each pcap file is labelled as malicious or legitimate/benign traffic. All the network traffic was collected during malware deployment. For that reason, every pcap file is labelled with the malware deployed and the type of traffic, either legitimate or malware. For example, the bulk pcap file named mirai_leg_lock.pcap contains the benign traffic generated by lock devices during Mirai malware deployment. The corresponding malware traffic can be found in the file named mirai_mal_lock.pcap.
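The file names mentioned above can be turned into labels programmatically. The following Python sketch is only an illustration: it assumes that every file follows the pattern <malware>_<leg|mal>[_<phase>]_<device>.pcap, which is inferred from the three example names on this page and may not hold for every file in the data set. The names PcapLabel and parse_pcap_name are hypothetical helpers, not part of the data set.

```python
from pathlib import PurePath
from typing import NamedTuple, Optional


class PcapLabel(NamedTuple):
    malware: str          # malware deployed during capture, e.g. "mirai"
    traffic: str          # "legitimate" or "malware"
    phase: Optional[str]  # botnet phase, e.g. "CC"; None if absent from the name
    device: str           # device type, e.g. "lock"


# Assumed mapping of the short traffic tags seen in the example file names.
TRAFFIC_TAGS = {"leg": "legitimate", "mal": "malware"}


def parse_pcap_name(filename: str) -> PcapLabel:
    """Derive a label from a file name, assuming the inferred naming pattern."""
    parts = PurePath(filename).stem.split("_")
    if len(parts) == 3:
        malware, tag, device = parts
        phase = None
    elif len(parts) == 4:
        malware, tag, phase, device = parts
    else:
        raise ValueError(f"unexpected file name layout: {filename}")
    if tag not in TRAFFIC_TAGS:
        raise ValueError(f"unknown traffic tag {tag!r} in {filename}")
    return PcapLabel(malware, TRAFFIC_TAGS[tag], phase, device)


if __name__ == "__main__":
    for name in ("mirai_mal_CC_lock.pcap", "mirai_leg_lock.pcap"):
        print(name, parse_pcap_name(name))
```

Rejecting names that do not match the assumed layout, rather than guessing, keeps mislabelled samples out of a training set if some files turn out to be named differently.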

For a more detailed explanation of the data set, please refer to the conference paper: here.

Citation Request
Guerra-Manzanares, A.; Medina-Galindo, J.; Bahsi, H. and Nõmm, S. (2020). MedBIoT: Generation of an IoT Botnet Dataset in a Medium-sized IoT Network. In Proceedings of the 6th International Conference on Information Systems Security and Privacy - Volume 1: ICISSP, ISBN 978-989-758-399-5, pages 207-218. DOI: 10.5220/0009187802070218

List of publications
If you publish a paper using this data set, please let us know so that we can add it to the list of publications using this data set, which is maintained and kept up to date here.

Contact
If you have any questions or comments, or need further information, please contact us at the following email address: alejandro.guerra@taltech.ee