Citation:
A. Calleja, J. Tapiador and J. Caballero, "The MalSource Dataset: Quantifying Complexity and Code Reuse in Malware Development," in IEEE Transactions on Information Forensics and Security, vol. 14, no. 12, pp. 3175-3190, Dec. 2019, doi: 10.1109/TIFS.2018.2885512
Funder:
Comunidad de Madrid; Ministerio de Economía y Competitividad (España)
Sponsor:
This work was supported in part by the Spanish Government through MINECO grants SMOG-DEV (TIN2016-79095-C2-2-R) and DEDETIS (TIN2015-7013-R), and in part by the Regional Government of Madrid through grants CIBERDINE (S2013/ICE-3095) and N-GREENS (S2013/ICE-2731).
Project:
Comunidad de Madrid. S2013/ICE-3095; Gobierno de España. TIN2016-79095-C2-2-R; Comunidad de Madrid. S2013/ICE-2731; Gobierno de España. TIN2015-7013-R
Keywords:
computer crime, computer languages, open source software
During the last decades, the problem of malicious and unwanted software (malware) has surged in numbers and sophistication. Malware plays a key role in most of today's cyberattacks and has consolidated as a commodity in the underground economy. In this paper, we analyze the evolution of malware from 1975 to date from a software engineering perspective. We analyze the source code of 456 samples from 428 unique families and obtain measures of their size, code quality, and estimates of the development costs (effort, time, and number of people). Our results suggest an exponential increment of nearly one order of magnitude per decade in aspects such as size and estimated effort, with code quality metrics similar to those of benign software. We also study the extent to which code reuse is present in our dataset. We detect a significant number of code clones across malware families and report which features and functionalities are more commonly shared. Overall, our results support claims about the increasing complexity of malware and its production progressively becoming an industry.