Intelligent Android malware family classification using Genetic Algorithms and SVM

Yuste Fernández-Alonso, Sara

Publication:
Intelligent Android malware family classification using Genetic Algorithms and SVM

Identifiers

URI: https://hdl.handle.net/10016/29770

Files

TFG_Sara-Yuste_Fernandez_Alonso.pdf (3.05 MB)

Publication date

2019-07

Defense date

2019-10-14

Authors

Yuste Fernández-Alonso, Sara

Advisors

Isasi, Pedro

Sáez Achaerandio, Yago

Abstract

As of April 2019, Android was the most popular mobile operating system among smartphone users[1]. Its popularity, combined with the widespread use of smartphones for everyday tasks and for storing or accessing sensitive personal data, has made Android applications the target of numerous malware attacks in recent years, a trend that continues today. These attacks have been refined to exploit specific vulnerabilities in the operating system or in the user, which has led malware to specialize into types and into families within each type. Malware is usually distributed inside infected applications (APKs) whose malicious behaviour can be detected either by inspecting their code (static analysis) or by observing the application while it runs (dynamic analysis).

This document describes the implementation of an intelligent system that classifies a set of malicious APK samples, obtained from the free repository ContagioDump, into the type and family they belong to. The classifier is a Support Vector Machine (SVM) implemented with Python's scikit-learn library. Its input attributes are extracted from the malicious APK samples through static analysis of their code, using Python's Androguard library, which provides a parser for interacting with all the relevant parts of an APK file. Because the number of extracted attributes is very large, a Genetic Algorithm is used to select the subset of attributes the SVM uses during learning. Each individual of the algorithm encodes a subset of all the attributes extracted in the static analysis, and its fitness is the accuracy score obtained when training the SVM with that subset. The result is a selected subset of attributes and a trained classification model. This model is then tested on a new set of malware samples covering all the families seen during training.
The present document explains the process of designing, building and testing the system. It was developed as a bachelor's thesis for the Computer Science and Engineering degree at Universidad Carlos III de Madrid.
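The attribute-selection loop described in the abstract (a Genetic Algorithm whose individuals encode attribute subsets and whose fitness is SVM accuracy) can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis' implementation: the function names `genetic_feature_selection` and `svm_accuracy_fitness` are hypothetical, and the binary-mask encoding, tournament selection, one-point crossover, bit-flip mutation, elitism, parameter values, linear kernel and 3-fold cross-validation are all illustrative choices. Feature extraction with Androguard is omitted; `X` is assumed to be a numeric matrix of static attributes (for example, permission-presence flags) and `y` the family labels.

```python
import random
from typing import Callable, List

Mask = List[int]  # one gene per attribute: 1 = selected, 0 = dropped


def genetic_feature_selection(
    n_features: int,
    fitness: Callable[[Mask], float],
    pop_size: int = 20,
    generations: int = 30,
    crossover_rate: float = 0.8,
    mutation_rate: float = 0.02,
    tournament_size: int = 3,
    seed: int = 0,
) -> tuple[Mask, float]:
    """Hypothetical GA wrapper: searches attribute masks that maximise `fitness`."""
    rng = random.Random(seed)
    cache: dict = {}  # each fitness call may train a model, so remember results

    def score(mask: Mask) -> float:
        key = tuple(mask)
        if key not in cache:
            # An empty subset cannot train a classifier, so it gets the worst fitness.
            cache[key] = float(fitness(mask)) if any(mask) else 0.0
        return cache[key]

    def tournament(population: List[Mask]) -> Mask:
        # Pick a few random individuals and keep the fittest one.
        return max(rng.sample(population, tournament_size), key=score)

    population = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    best = max(population, key=score)

    for _ in range(generations):
        offspring = [best[:]]  # elitism: the best mask always survives
        while len(offspring) < pop_size:
            parent_a, parent_b = tournament(population), tournament(population)
            if n_features > 1 and rng.random() < crossover_rate:
                cut = rng.randrange(1, n_features)  # one-point crossover
                child = parent_a[:cut] + parent_b[cut:]
            else:
                child = parent_a[:]
            # Bit-flip mutation: each gene toggles with probability mutation_rate.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            offspring.append(child)
        population = offspring
        best = max(population, key=score)
    return best, score(best)


def svm_accuracy_fitness(X, y, cv: int = 3) -> Callable[[Mask], float]:
    """Hypothetical fitness factory: mean cross-validated SVM accuracy on the selected columns."""
    # Imported lazily so the GA above can be used without scikit-learn installed.
    import numpy as np
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import SVC

    X = np.asarray(X)

    def fitness(mask: Mask) -> float:
        cols = [i for i, bit in enumerate(mask) if bit]
        model = SVC(kernel="linear")  # kernel choice is an assumption
        return cross_val_score(model, X[:, cols], y, cv=cv, scoring="accuracy").mean()

    return fitness
```

With `X` as a NumPy matrix and `y` as the family labels, a call such as `mask, acc = genetic_feature_selection(X.shape[1], svm_accuracy_fitness(X, y))` would return the selected attribute columns and their cross-validated accuracy. Keeping the classifier behind a fitness callable separates the search from the model, and the cache avoids retraining the SVM for subsets the algorithm has already evaluated.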

Keywords

Genetic algorithms, Neural networks, Support Vector Machine (SVM), Android (Operating system), Malware, Artificial Intelligence

Collections

Trabajos Fin de Grado Escuela Politécnica Superior
