RT Dissertation/Thesis
T1 Biopotential signals and their applicability to cibersecurity problems
A1 Fuster Barceló, Caterina
AB Biometric systems are an increasingly popular identification technique in today’s world. Many different biometric systems have come into everyday use in recent years, such as fingerprint, face scan, ECG, and others. More than 20 years of research show that the Elektrokardiogramm (EKG) or Electrocardiogram (ECG) is a feasible method for user identification, as each person has their own unique and inherent EKG. A biometric system is based on something that every human being is, and so cannot lose, rather than something they possess: an eye, DNA, palm print, vein patterns, iris, retina, etc. For this reason, during the last decade, biometric identification or authentication has gained ground over classic authentication systems such as a PIN or a physical key. To be accepted, every biometric system must fulfill a set of requirements including universality, uniqueness, permanence, and collectability. The EKG is a biometric trait that not only fulfills those requirements but also has some advantages over other biometric traits. Using an EKG as the biometric trait for identification is motivated by four key points: 1) collecting an EKG is non-invasive, which may contribute to its acceptability among the population; 2) a human being can only be identified if they are alive, as their heart must be beating; 3) all living beings have their own EKG, so it is inclusive; 4) an EKG not only provides identification but also a medical and even emotional diagnosis.
There exist many works on user identification with EKGs in the current state of the art. Biometric identification with EKGs has been deployed using many different techniques. Some works use the fiducial points of the EKG signal (T-peak, R-peak, P-onset, QRS-offset, ...) 
to perform the user identification, and others use feature extraction performed by a Neural Network as the classification or identification method. As the EKG is a signal expressed in time and frequency, many different Neural Network models (such as Recurrent Neural Networks, Convolutional Neural Networks, and Long Short-Term Memory networks), together with techniques such as Principal Component Analysis, can exploit the dissimilarity between the EKG signals of each user to perform user identification, offering very competitive results.
Focusing on user identification, and depending on the user’s condition in each case, the EKG, as noted above, serves not only as an identification method but also offers a diagnosis of a person’s condition from a medical point of view or of their emotional state. Some research has studied the effect of conditions such as anxiety on EKG identification, showing that individuals with higher heart rates might be more difficult to identify.
Nevertheless, there are some drawbacks in the current state of the art regarding identification with EKG. Many systems use very complex Deep Learning architectures or, as noted, extract the features through fiducial analysis, making the biometric system too complex and computationally costly. One important flaw, not only in biometric systems but in science generally, is the lack of publicly available datasets and the use of private ones to perform studies. Using a private database makes the experiments and results irreproducible, which is a drawback in any scientific field. Furthermore, many of these works use the EKG signal in such a way that it can be recovered from the identification system, so there is no privacy protection for the user, as anyone could retrieve their EKG signal.
Owing to these drawbacks of current biometric systems based on ECG signals, ELEKTRA is presented in this thesis as a new identification system whose aim is to overcome the inconveniences of the current proposals. 
ELEKTRA is a biometric system that performs user identification by using EKGs converted into a heatmap of a set of aligned R-peaks (heartbeats), forming a matrix called an Elektrokardiomatrix (EKM).
ELEKTRA is based on past work in which the EKM was created for medical purposes. As far as the literature covers to date, all the existing research on the EKM focuses on the diagnosis of different Cardiovascular Diseases (CVD), such as Congestive Heart Failure and Atrial Fibrillation, and on the analysis of Heart Rate Variability, among others. Therefore, the work presented in this thesis is, presumably, the first to use the EKM as a valid identification method.
With the aim of offering reproducible results, four different public databases are used to show the model’s feasibility and adaptability: i) the Normal Sinus Rhythm Database (NSRDB), ii) the MIT-BIH Arrhythmia Database (MIT-BIHDB), iii) the Physikalisch-Technische Bundesanstalt (PTBDB), and iv) the Glasgow University Database (GUDB). The first three (i, ii and iii) are taken from PhysioNet, a freely available repository of medical research data managed by the MIT Laboratory for Computational Physiology. The fourth database (iv) is also freely available, on request to Glasgow University.
Furthermore, to test the adaptability and feasibility of ELEKTRA, four different datasets are built from the databases, in which the EKG signals are segmented into windows to create several EKMs. The number of EKMs built for each dataset depends on the length of the records. For example, for the NSRDB, as the EKG records are very long, 3000 EKMs or images per user are obtained. For the other three databases, however, as many EKM images as possible are obtained, until the signal runs out. 
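The segmentation described above can be illustrated with a minimal sketch. It is not the thesis’s exact procedure: the function name `build_ekms`, the fixed window of `pre` samples before and `post` samples after each R-peak, the use of non-overlapping groups of beats, and the reading of the abstract’s "bpf" as beats per frame (`beats_per_frame`) are all assumptions made for illustration. R-peak detection is taken as given, and rendering each matrix as a heatmap image is left out.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def build_ekms(signal: Sequence[float], r_peaks: Sequence[int],
               beats_per_frame: int, pre: int, post: int) -> List[Matrix]:
    """Group R-peak-aligned beats into fixed-size matrices (one per EKM).

    Assumed layout (hypothetical, for illustration only): each row is the
    window [peak - pre, peak + post) around one R-peak, and each matrix
    stacks `beats_per_frame` consecutive rows.
    """
    rows: Matrix = []
    for peak in r_peaks:
        start, stop = peak - pre, peak + post
        if start < 0 or stop > len(signal):
            continue  # drop beats whose window runs past the recording
        # Every row has length pre + post, with the R-peak at column `pre`.
        rows.append([float(v) for v in signal[start:stop]])
    n_frames = len(rows) // beats_per_frame  # leftover beats are discarded
    return [rows[i * beats_per_frame:(i + 1) * beats_per_frame]
            for i in range(n_frames)]
```

With five usable beats, `beats_per_frame=2` yields two matrices and `beats_per_frame=3` yields one, so for a recording of fixed length more beats per matrix means fewer matrices.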
It is important to take into account that, for the three databases in which EKG recordings are shorter, the number of EKMs obtained depends on the number of heartbeats represented in each EKM. The higher the number of heartbeats or R-peaks per EKM (e.g., 7 bpf), the fewer images are obtained.
Once the datasets of EKMs are constructed, a simple yet effective Convolutional Neural Network (CNN) is built from one 2D convolution with ReLU activation, a max-pooling operation followed by dropout for regularisation and, finally, flatten and dense layers with a softmax or sigmoid function, depending on whether the classification task is categorical or binary, to achieve the final classification. With this simple CNN, the feasibility and adaptability of ELEKTRA are demonstrated throughout the experiments.
The four databases are tested in Chapters 3, 4, and 5, where the experimentation takes place. In Chapter 3, the NSRDB is studied as the baseline for identification with control users. Different experiments are conducted with the aim of studying ELEKTRA’s behavior. The first experiments study how many heartbeats are needed to identify a user, and the cost of convergence of the model in terms of computing time and the number of heartbeats represented in the EKM. Similar results are achieved in all of these experiments, with accuracies close to 100%. In the classification of an unseen user, a user from a different database, who has not been seen in any other experiment, is processed and tested against the network. The result is that an unseen user or impersonator would bypass the system only one time in ten, which can be considered a low ratio given that many systems block after three to five attempts. The classification of a user is also tested in a setting closer to one in which a low-cost sensor is used. 
For this experiment, an EKG signal is modified by adding Gaussian noise and then processed like any other signal. Demonstrating the robustness of the system, an accuracy of 99% is obtained, indicating that a noisy signal can be processed too. The last experiment on the NSRDB tests the feasibility of ELEKTRA by checking how many images or EKMs are enough to identify a user. Even though accuracy decreases as the number of images used to train the network decreases, an accuracy of 97% is obtained when training the network with only 300 EKMs per user. This chapter concludes that, as shown in all the experiments, ELEKTRA is a valid and feasible identification method for control users.
The MIT-BIH Arrhythmia Database (MIT-BIHDB) comprises patients with arrhythmia and random users, and the Physikalisch-Technische Bundesanstalt (PTBDB) comprises patients with different CVD together with healthy users. Hence, the main goal of Chapter 4 is to study the proposed identification system on users with CVD, showing ELEKTRA’s adaptability. First, the MIT-BIHDB is tested, achieving high-performing results and showing how ELEKTRokardiomatrix Application to biometric identification with Convolutional Neural Networks (ELEKTRA) is capable of identifying a pool of users with and without arrhythmia with just a slight decrease in the network’s accuracy, as an accuracy of 97% is obtained. Secondly, the whole PTBDB is used to test the biometric system. The result obtained in this experiment is lower than in the others (an accuracy of 93%), as the number of images used to train the network is much smaller than in the other experiments and 232 users are being studied. Lastly, ELEKTRA is tested on 162 users from the PTBDB with specific CVD, namely Bundle branch block, Cardiomyopathy, Dysrhythmia, Myocardial infarction, Myocarditis, and Valvular heart disease. 
Through this experiment, the aim is to observe ELEKTRA’s behaviour when only users with CVD are included. Better results are obtained than in the previous experiment. This may be because the number of users has decreased and because a CVD makes each EKG more distinctive, which is consistent with the fact that many researchers use the EKM for diagnostic purposes. The conclusion drawn from the experiments in this chapter is that ELEKTRA is capable of identifying users with and without CVD, approaching a real-life scenario.
In Chapter 5 the Glasgow University Database (GUDB) is tested to evaluate the performance of user identification when the users are performing different activities. The GUDB comprises 25 users performing five different activities with different levels of cardiovascular effort: sitting, walking on a treadmill, doing a maths exam, using a handbike, and running on a treadmill. The proposed biometric system is tested with each of these activities for 3 and 5 bpf, achieving different results in each case. For activities requiring lower cardiovascular effort, such as sitting or walking, the accuracy obtained is close to 100%: 99.19% for sitting and 98.59% for walking. For the scenarios where higher heart rates are expected, the experiments yield lower accuracies: 82.63% for jogging and 95.51% for biking. The outcome for the maths scenario is different, as the heart rate of each user may vary depending on how nervous they are; an accuracy of 94.0% is obtained for this activity. The conclusion drawn from these first experiments is that it is more difficult to identify users when they are performing an activity that requires higher cardiovascular effort and, consequently, have a higher heart rate. For the following experiment, all scenarios are merged to study the behaviour of a system trained with users performing different activities. 
In this case, the results appear to be close to the mean of the previous results, as the overall accuracy for all the scenarios with 3 bpf is 91.32%. For the subsequent experiments, some of the scenarios are merged into two categories. On the one hand, the calmer activities (sitting and walking) are merged into the so-called Low Cardiovascular Activity (LCA) scenario. Training and testing with these two activities together gives an accuracy of 97.74% and an Equal Error Rate (EER) of 1.01%. On the other hand, the High Cardiovascular Activity (HCA) scenario is composed of the activities that require higher cardiovascular effort (jogging and biking). In this case, the results decrease compared to the previous ones, as the accuracy is 85.71%. What increases considerably is the False Rejection Rate (FRR), which is 14.17%, without a corresponding increase in the False Acceptance Rate (FAR), which remains very low at 0.6%. The last experiments are called the fight of scenarios, as scenarios are confronted by training with some activities or scenarios and predicting with different ones. The first experiments in this section train with the LCA group and test with the HCA group, and vice versa. The results show a large drop in performance, with accuracies of 37.24% and 46.42%, respectively. This implies that it is more difficult to identify users who were registered with a different heart rate. Last but not least, there is a set of experiments in which individual activities are confronted, such as training the network with the sitting scenario and testing with the jogging scenario. 
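The abstract reports FAR and FRR without saying how they are computed. As a minimal sketch, assuming the common one-vs-rest definition averaged over users (an assumption, not necessarily the thesis’s procedure), the hypothetical function `far_frr` below derives both rates from a confusion matrix. Under that assumption, if the misidentifications among N enrolled users are spread evenly over the other users, FAR = FRR / (N - 1), which is one way an FRR can rise sharply while the FAR stays small.

```python
from typing import List, Tuple


def far_frr(confusion: List[List[int]]) -> Tuple[float, float]:
    """Macro-averaged one-vs-rest FAR and FRR (assumed definition).

    confusion[i][j] is the number of samples of true user i that were
    predicted as user j.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    fars, frrs = [], []
    for u in range(n):
        tp = confusion[u][u]                                  # u accepted as u
        fn = sum(confusion[u]) - tp                           # u rejected
        fp = sum(confusion[i][u] for i in range(n)) - tp      # others accepted as u
        tn = total - tp - fn - fp                             # others rejected as u
        fars.append(fp / (fp + tn) if fp + tn else 0.0)
        frrs.append(fn / (fn + tp) if fn + tp else 0.0)
    return sum(fars) / n, sum(frrs) / n
```

For example, with 25 users, 240 test samples each, and 24 errors per user spread one per other user, this gives an FRR of 10% and a FAR of 10%/24, roughly 0.42%.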
These experiments confirm the hypothesis that users with higher heart rates are more difficult to identify, even more so when the network has been trained on calm users. Even so, one of the main advantages of the presented model is that, even at low accuracies, the False Acceptance Rate does not increase compared to the other experiments, meaning that an impostor would not be able to bypass the system.
Lastly, in Chapter 6, conclusions and discussion are offered. ELEKTRA is compared with other EKG-based biometric systems from the current state of the art. These works from the literature are examined to show how ELEKTRA outperforms them in aspects such as efficiency, complexity, accuracy, error rates, and reproducibility, among others. It is important to remark that, compared to the other works, the experiments performed in this doctoral thesis achieve high performance, with high accuracies and low error rates. What is remarkable is that this performance is obtained using a very simple CNN made up of just one convolutional layer. By achieving outstanding results with a simple neural network, the solidity of ELEKTRA is demonstrated.
In this way, ELEKTRA contributes to the state of the art by providing a new method for user identification with EKGs with many benefits. Outstanding results in terms of high accuracy and low error rates in the experiments support the efficiency of ELEKTRA. The fact that the databases used for the experimentation in this doctoral thesis are publicly available makes this work reproducible, in contrast to many other works in the literature. In fact, as the databases differ in the nature of the users that make up each one, it is established that the proposed identification method is inclusive, as all living beings have their own EKG and high accuracies are also obtained when testing the model on users with different CVD. 
Moreover, as it has been shown that users with CVD can also be identified without major drawbacks, ELEKTRA offers an identification system that can also offer a diagnosis of the user being identified in terms of their medical health. In addition, thanks to the GUDB, ELEKTRA can determine for the first time, as far as the literature reaches, that user identification with EKGs is more difficult for users performing activities that require higher cardiovascular effort and consequently have higher heart rates.
In conclusion, from the studies and experiments performed in this doctoral thesis, it can be assumed that ELEKTRA is a feasible and efficient identification method for biometrics with EKG and outperforms the current state-of-the-art proposals in user identification with EKG.
AB Biometric systems are a rapidly growing identification technique today. In recent years many different systems have been used in everyday life, such as fingerprint, face scan, or ECG, among others. In fact, more than 20 years of work support the Elektrokardiogramm (EKG) or Electrocardiogram (ECG) as a reliable method for user identification. This thesis proposes a new biometric identification method called ELEKTRA. On the other hand, there are some drawbacks in the current state of the art regarding identification with EKG. Many systems use very complex Deep Learning architectures or extract the important features through fiducial analysis, making the biometric system too complex or costly. An important flaw, not only in biometric systems, is the lack of public databases and the use of private databases for research. 
The use of private databases in any study makes the experiments and results irreproducible, which is a drawback in any field of science.
In this doctoral thesis, ELEKTRA, a biometric identification system, has been developed using images called Elektrokardiomatrix (EKM). These images are built by creating a heatmap of a set of aligned R-peaks (heartbeats), forming a matrix. In order to offer reproducible results, four different public databases are used to demonstrate the feasibility and adaptability of the model: the Normal Sinus Rhythm Database (NSRDB), the MIT-BIH Arrhythmia Database (MIT-BIHDB), the Physikalisch-Technische Bundesanstalt (PTBDB), and the Glasgow University Database (GUDB). New EKM sub-datasets have been created from each of the databases mentioned. In addition, to test the adaptability and feasibility of ELEKTRA as a biometric system, a simple yet effective CNN with a single convolutional layer is built.
The four databases mentioned above have been tested in Chapters 3, 4 and 5. In Chapter 3 the NSRDB is studied as a proof of concept of identification on control users. Different experiments are carried out with the aim of studying the behaviour of ELEKTRA. The characteristics studied with this database are: how many heartbeats are needed to identify a user; the convergence costs of the presented model; the classification of a never-seen user from a different database; the classification of a user whose EKG signal has been modified by adding Gaussian noise; and the feasibility of ELEKTRA, testing how many images or EKMs are enough to identify a user.
As for the databases containing users with CVD, the MIT-BIHDB contains patients with arrhythmia and healthy users, and the PTBDB contains patients with different CVD together with healthy users. 
These two databases are studied in Chapter 4, which examines the adaptability of ELEKTRA to different CVDs. First, the MIT-BIHDB is tested, achieving promising results and showing how ELEKTRA is able to identify users with and without arrhythmia in the same group. Second, the complete PTBDB is used, obtaining high accuracy and low error rates. Finally, ELEKTRA is tested on some PTBDB users with specific CVD to observe its behaviour when only users with CVD are included. The result of these experiments shows how ELEKTRA is able to identify users with and without CVD, approaching a real scenario.
Lastly, in Chapter 5 ELEKTRA is tested on the GUDB to evaluate the performance of user identification when users perform different cardiovascular activities. The GUDB consists of 25 users performing five different activities with different levels of cardiovascular effort (sitting, walking, doing a maths exam, using a handbike, and running on a treadmill). The proposed biometric system is tested with each of these activities to show that it is more difficult to identify users when they perform an activity that requires greater cardiovascular effort and, consequently, have a higher heart rate. The experiments carried out consist of merging different activities to study the differences between heart rates and how user identification is related to them. The most representative experiment is performed by training the model with the scenario in which the user is sitting and performing blind classification of users from the scenario in which they are running. In this experiment, a really low accuracy is obtained, demonstrating that users are more difficult to identify at higher heart rates. 
In fact, one of the main advantages of the presented model is that, even with low accuracy, the False Acceptance Rate has not increased compared with the other experiments, which means that an impostor would not be able to bypass the system. However, if the model is run over all the merged activities, accurate results are shown, offering an inclusive model for training and testing on users performing different activities.
In this way, ELEKTRA contributes to the state of the art by providing a new method of user identification with EKGs with many advantages. The excellent results in terms of high accuracy and low error rates in the experiments support the efficiency of ELEKTRA. The fact that the databases used for the experimentation in this doctoral thesis are publicly available makes this work reproducible. In fact, as the databases used differ in the users that make up each one, it is established that the proposed identification method is inclusive, since all living beings have their own EKG. High accuracies are also obtained when testing the model on users with different CVD. Furthermore, thanks to the GUDB, ELEKTRA determines that identifying users from their EKGs while they perform cardiovascular activities, which require greater effort, is more difficult.
In conclusion, from the studies carried out in this doctoral thesis, it can be assumed that ELEKTRA is a feasible and efficient identification method for biometrics with EKG.
YR 2022
FD 2022-12
LK https://hdl.handle.net/10016/36463
UL https://hdl.handle.net/10016/36463
LA eng
DS e-Archivo
RD 16 jun. 2024