Publication:
Applications of data analytics and machine learning tools to the enhanced design of modern communication networks and security applications

dc.contributor.advisor: Hernández Gutiérrez, José Alberto
dc.contributor.author: Martín Martínez, Ignacio
dc.contributor.departamento: UC3M. Department of Telematic Engineering
dc.date.accessioned: 2020-09-17T11:23:01Z
dc.date.available: 2020-09-17T11:23:01Z
dc.date.issued: 2019-06
dc.date.submitted: 2019-06-28
dc.description: International Mention in the doctoral degree
dc.description.abstract: Lately, Artificial Intelligence (AI) and Machine Learning (ML) have become game-changing technologies due to their ability to generalize from data and infer algorithmic behaviors that cover a wider range of cases than humans are able to handle. In short, these technologies aim to bring human-like intelligence to computer tasks so that computers can take over different functions. However, their adoption and development in many fields are still at an early stage, not to mention the requirements and needs they entail. Therefore, the aim of this thesis is to advance the application of these technologies in a specific field: the Internet infrastructure. To this end, the contributions focus on two specific areas, namely cybersecurity and optical WDM networks. On the security side, we propose a new approach for malware detection and application quality assessment that relies on application meta-information, that is, the data describing the application (such as its description, category and permissions) instead of the application code. This approach is detailed and validated in two specific applications: ML-based malware detection and scalable repackaging detection through meta-data semantic clustering. The first application uses meta-data as ML features, together with a labeled collection of applications, to detect whether each application is malware or not. The resulting algorithms detect malware reasonably well under certain conditions, reaching F-scores of nearly 0.9. Building on the observations from the ML analysis, the Antivirus (AV) engines aggregated by multi-scanner tools are inspected with data analytics and AI techniques in order to understand their lack of consensus at the detection and categorization levels.
This study has a twofold aim: to advance the understanding of AV detection patterns and policies, and to improve multi-engine detection by proposing different aggregation and cleaning tools. First, AV engine detections are inspected, showing that most engines disagree when detecting malware, to the extent that they do not fully agree on the detection of even a single application. Moreover, different detection patterns are observed, which characterize leader, follower and eccentric engines. Lastly, a per-application estimate of malware risk based on Structural Equation Models is proposed. On the family side, we propose SignatureMiner, a lightweight categorization scheme that achieves scores comparable to other alternatives in the literature at a lower training cost. Using this system, we normalize AV signatures and categorize them into 41 distinct families and three broader categories, namely adware, harmful and unknown. Then, a high-performance ML classifier is proposed to assign a specific category to unknown malware. The second meta-data application explored is repackaging detection. Using similarity clustering, a large collection of unlabeled applications from Google Play is inspected and compared to detect potentially repackaged applications and their victims. This approach unveils nearly 420K potentially cloned applications within the Google Play market. On the network side, we contribute to introducing ML into the field by proposing an end-to-end pipeline framework that improves the development of ML-powered network protocols, conceived as enhanced heuristics that emulate optimal solutions in many areas. The framework comprises three steps: data generation, modeling and validation, and network implementation. This thesis focuses on the first two steps, developing proof-of-concept solutions for both.
Dataset generation and data labeling are addressed with Netgen, a versatile network data generator based on Net2Plan. Netgen's functionality is presented, and its performance and capabilities are demonstrated. Finally, this thesis addresses the modeling of the Integer Linear Programming (ILP) formulation of Routing and Wavelength Assignment (RWA) as an ML problem. The assumption is that ML can be used to develop an ML-powered RWA heuristic that performs better than regular heuristics and runs much faster than the ILP and heuristics. The results support the viability of this approach, opening the way to applying the scheme to other complex network protocols. In sum, this thesis builds different AI-based components to enhance the functionalities and capabilities of different elements in the proposed fields, defining systematic approaches and methodologies to this end. In this way, all the works in this document contribute to the design and development of the concept of AI as a Service (AIaaS), a paradigm for integrating AI technologies into specific knowledge areas with limited expertise in both AI and the specific area.
dc.description.degree: Doctoral Programme in Telematic Engineering, Universidad Carlos III de Madrid
dc.description.responsability: Chair: Andrés Marín López; Secretary: Ignacio de Miguel Jiménez; Member: Marco Ruffini
dc.identifier.uri: https://hdl.handle.net/10016/30758
dc.language.iso: eng
dc.relation.haspart: https://doi.org/10.1109/CNS.2015.7346893
dc.relation.haspart: https://doi.org/10.1155/2018/5749481
dc.relation.haspart: https://doi.org/10.1145/2976749.2989038
dc.relation.haspart: https://doi.org/10.1109/EISIC.2018.00010
dc.relation.haspart: https://doi.org/10.1109/CNS.2018.8433141
dc.relation.haspart: https://doi.org/10.1016/j.future.2019.03.006
dc.relation.haspart: https://doi.org/10.1016/j.future.2018.12.050
dc.relation.haspart: https://doi.org/10.1109/ECOC.2018.8535562
dc.relation.haspart: https://doi.org/10.1109/ICTON.2019.8840020
dc.relation.haspart: https://doi.org/10.1109/TNSM.2019.2927867
dc.rights: Attribution-NonCommercial-NoDerivs 3.0 Spain
dc.rights.accessRights: open access
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/3.0/es/
dc.subject.eciencia: Telecommunications
dc.subject.other: Data analytics
dc.subject.other: Machine learning tools
dc.subject.other: Communication networks
dc.subject.other: Security
dc.title: Applications of data analytics and machine learning tools to the enhanced design of modern communication networks and security applications
dc.type: doctoral thesis
dspace.entity.type: Publication
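The abstract refers to improving multi-engine AV detection through aggregation and to evaluating detectors by F-score. As a minimal sketch, assuming a plain threshold vote over per-engine verdicts (a common baseline, not the Structural Equation Model risk estimate proposed in the thesis), the code below shows how the vote threshold trades precision against recall. The function names `aggregate_verdicts` and `f1_score` and the toy verdict data are hypothetical and for illustration only.

```python
from typing import Dict, List

# Hypothetical toy data: per-app verdicts from four AV engines (True = detected)
# and a ground-truth label per app. These values are illustrative only.
verdicts: Dict[str, List[bool]] = {
    "app_a": [True, True, True, False],
    "app_b": [True, False, False, False],
    "app_c": [False, False, False, False],
    "app_d": [True, True, False, False],
}
truth: Dict[str, bool] = {"app_a": True, "app_b": False, "app_c": False, "app_d": True}


def aggregate_verdicts(verdicts: Dict[str, List[bool]], threshold: int) -> Dict[str, bool]:
    """Flag an app as malware when at least `threshold` engines detect it."""
    return {app: sum(flags) >= threshold for app, flags in verdicts.items()}


def f1_score(predicted: Dict[str, bool], truth: Dict[str, bool]) -> float:
    """F1 (harmonic mean of precision and recall) of `predicted` against `truth`."""
    tp = sum(1 for app, label in truth.items() if label and predicted[app])
    fp = sum(1 for app, label in truth.items() if not label and predicted[app])
    fn = sum(1 for app, label in truth.items() if label and not predicted[app])
    if tp == 0:
        return 0.0  # precision and recall are both zero (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # A low threshold admits more false positives (app_b at t=1);
    # a high one requires broad consensus and misses detections (app_d at t=3).
    for t in (1, 2, 3):
        print(t, round(f1_score(aggregate_verdicts(verdicts, t), truth), 3))
    # prints:
    # 1 0.8
    # 2 1.0
    # 3 0.667
```

With this toy data a threshold of 2 happens to separate the classes perfectly; on real multi-scanner data, where engines disagree widely, choosing how to combine verdicts is precisely the policy question that the proposed aggregation and cleaning tools address.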
Files
Original bundle
Name: tesis_ignacio_martin_martinez_2019.pdf
Size: 5.45 MB
Format: Adobe Portable Document Format