Intelligent Android malware family classification using Genetic Algorithms and SVM

Yuste Fernández-Alonso, Sara

dc.contributor.advisor	Isasi, Pedro
dc.contributor.advisor	Sáez Achaerandio, Yago
dc.contributor.author	Yuste Fernández-Alonso, Sara
dc.contributor.departamento	UC3M. Departamento de Informática	es
dc.date.accessioned	2020-02-24T16:11:18Z
dc.date.available	2020-02-24T16:11:18Z
dc.date.issued	2019-07
dc.date.submitted	2019-10-14
dc.description.abstract	As of April 2019, Android was the most popular mobile operating system among smartphone users[1]. This popularity, combined with the widespread use of smartphones for everyday tasks and for storing or accessing sensitive personal data, has made Android applications the target of numerous malware attacks in recent years. These attacks have been refined to target specific vulnerabilities in the operating system or the user, giving rise to distinct malware types and to families within each type. Malware is usually distributed in infected applications (APKs), whose malicious behaviour can be found either by inspecting their code (static analysis) or by observing the application while it runs (dynamic analysis). This document describes the implementation of an intelligent system that classifies malicious APK samples, obtained from the free repository ContagioDump, into the type and family they belong to. The classifier is a Support Vector Machine (SVM) implemented with Python’s library Scikit Learn. Attributes are extracted from the malicious APK samples through static analysis of their code, using Python’s library Androguard, whose parser gives access to all the relevant parts of the APK file. Because the number of extracted attributes is very high, a Genetic Algorithm is used to select the attributes that the SVM uses in the learning process. The algorithm encodes a subset of the attributes extracted in the static analysis, and each subset is evaluated using the accuracy score obtained when training the SVM with it. The result is a subset of attributes and a trained classification model. 
This model is then tested with a new set of malware samples belonging to all the families seen during learning. The document explains the process of designing, creating and testing the system. It was developed as a bachelor’s thesis for the Computer Science and Engineering degree at Universidad Carlos III de Madrid.	es
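The wrapper approach summarised in the abstract (a genetic algorithm proposing attribute subsets, each scored by the accuracy of an SVM trained on that subset) can be illustrated with a minimal sketch. This is not the thesis code: the function names `fitness` and `select_features`, the population size, mutation rate, tournament selection, uniform crossover, linear kernel, and 3-fold cross-validation are all assumptions chosen for brevity, and synthetic data from `make_classification` stands in for attributes that would really come from Androguard.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC


def fitness(mask, X, y):
    """Mean cross-validated SVM accuracy using only the columns where mask is True."""
    if not mask.any():
        return 0.0  # an empty attribute subset cannot classify anything
    scores = cross_val_score(SVC(kernel="linear"), X[:, mask], y, cv=3)
    return float(scores.mean())


def select_features(X, y, pop_size=12, generations=8, mutation_rate=0.05, seed=0):
    """Hypothetical GA: each individual is a boolean mask over the attributes."""
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    population = rng.random((pop_size, n_features)) < 0.5
    best_mask, best_score = population[0].copy(), -1.0

    for _ in range(generations):
        scores = np.array([fitness(ind, X, y) for ind in population])
        top = int(scores.argmax())
        if scores[top] > best_score:
            best_mask, best_score = population[top].copy(), float(scores[top])

        # Binary tournament selection: the fitter of two random individuals wins.
        a = rng.integers(pop_size, size=pop_size)
        b = rng.integers(pop_size, size=pop_size)
        parents = population[np.where(scores[a] >= scores[b], a, b)]

        # Uniform crossover: each gene comes from a parent or a shuffled partner.
        partners = parents[rng.permutation(pop_size)]
        take_first = rng.random((pop_size, n_features)) < 0.5
        children = np.where(take_first, parents, partners)

        # Bit-flip mutation, then elitism: keep the best mask seen so far.
        flips = rng.random((pop_size, n_features)) < mutation_rate
        population = children ^ flips
        population[0] = best_mask

    return best_mask, best_score


if __name__ == "__main__":
    # Synthetic stand-in for the attribute matrix (samples x attributes) and family labels.
    X, y = make_classification(n_samples=120, n_features=20, n_informative=5,
                               n_classes=3, random_state=0)
    mask, score = select_features(X, y)
    print("selected attributes:", np.flatnonzero(mask))
    print("cross-validated accuracy:", round(score, 3))
```

In a real pipeline the selected mask would be applied to the training attributes before fitting the final SVM, and the model would then be evaluated on held-out samples, as the abstract describes.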
dc.description.degree	Ingeniería en Tecnologías de Telecomunicación (Plan 2010)	es
dc.identifier.uri	https://hdl.handle.net/10016/29770
dc.language.iso	eng	es
dc.rights	Atribución-NoComercial-SinDerivadas 3.0 España	*
dc.rights.accessRights	open access	es
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/3.0/es/	*
dc.subject.eciencia	Informática	es
dc.subject.other	Genetic algorithms	es
dc.subject.other	Neural networks	es
dc.subject.other	Support Vector Machine (SVM)	es
dc.subject.other	Android (Operating system)	es
dc.subject.other	Malware	es
dc.subject.other	Artificial Intelligence	es
dc.title	Intelligent Android malware family classification using Genetic Algorithms and SVM	es
dc.type	bachelor thesis	*
dspace.entity.type	Publication