Predicting of anaphylaxis in big data EMR by exploring machine learning approaches

Segura-Bedmar, Isabel; Colón Ruiz, Cristóbal; Tejedor Alonso, Miguel Ángel; Moro Moro, Mar


dc.affiliation.dpto	UC3M. Departamento de Informática	es
dc.affiliation.grupoinv	UC3M. Grupo de Investigación: Human Language and Accessibility Technologies (HULAT)	en
dc.contributor.author	Segura-Bedmar, Isabel
dc.contributor.author	Colón Ruiz, Cristóbal
dc.contributor.author	Tejedor Alonso, Miguel Ángel
dc.contributor.author	Moro Moro, Mar
dc.contributor.funder	Ministerio de Economía y Competitividad (España)	es
dc.date.accessioned	2024-01-15T08:51:18Z
dc.date.available	2024-01-15T08:51:18Z
dc.date.issued	2018-11-01
dc.description.abstract	Anaphylaxis is a life-threatening allergic reaction that occurs suddenly after contact with an allergen. Epidemiological studies of anaphylaxis are important both for planning and evaluating new strategies to prevent this reaction and for guiding the treatment of patients who have just suffered an anaphylactic reaction. Electronic Medical Records (EMR) are one of the richest and most effective sources for the epidemiology of anaphylaxis, because they provide a low-cost way of accessing longitudinal data on large populations. However, researchers must manually review a huge amount of information, which is a very costly and time-consuming task. Therefore, our goal is to explore different machine learning techniques to process Big Data EMR, reducing the effort needed to perform epidemiological studies of anaphylaxis. In particular, we aim to study the incidence of anaphylaxis through the automatic classification of EMR. To do this, we employ the most widely used and efficient classifiers in text classification and compare different document representations, ranging from well-known methods such as Bag of Words (BoW) to more recent ones based on word embedding models, such as a simple average of word embeddings or a bag of centroids of word embeddings. Because the identification of anaphylaxis cases in EMR is a class-imbalanced problem (fewer than 1% of the records describe anaphylaxis cases), we employ a novel undersampling technique based on clustering to balance our dataset. In addition to classical machine learning algorithms, we also use a Convolutional Neural Network (CNN) to classify our dataset.	en
dc.description.sponsorship	This work was supported by the Research Program of the Ministry of Economy and Competitiveness - Government of Spain (DeepEMR project TIN2017-87548-C2-1-R).	en
dc.identifier.bibliographicCitation	Segura-Bedmar, I., Colón Ruiz, C., Tejedor Alonso, M. A., Moro Moro, M. (2018). Predicting of anaphylaxis in big data EMR by exploring machine learning approaches. Journal of Biomedical Informatics, 87, pp. 50-59.	en
dc.identifier.doi	https://doi.org/10.1016/j.jbi.2018.09.012
dc.identifier.issn	1532-0464
dc.identifier.publicationfirstpage	50
dc.identifier.publicationlastpage	59
dc.identifier.publicationtitle	Journal of Biomedical Informatics	en
dc.identifier.publicationvolume	87
dc.identifier.uri	https://hdl.handle.net/10016/39217
dc.identifier.uxxi	AR/0000022190
dc.language.iso	eng
dc.publisher	Elsevier
dc.relation.projectID	Gobierno de España. TIN2017-87548-C2-1-R	es
dc.rights	© 2018 Elsevier Inc.	es
dc.rights	Attribution-NonCommercial-NoDerivs 3.0 Spain
dc.rights.accessRights	open access	en
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/3.0/es/	*
dc.subject.eciencia	Computer Science	en
dc.subject.other	deep learning	en
dc.subject.other	text classification	en
dc.subject.other	epidemiological studies	en
dc.subject.other	anaphylaxis	en
dc.title	Predicting of anaphylaxis in big data EMR by exploring machine learning approaches	en
dc.type	research article	en
dc.type.hasVersion	AM	en
dspace.entity.type	Publication
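
The abstract mentions a clustering-based undersampling step but does not describe how it works. The following is a minimal sketch of one generic way such a step can be done. It is an assumption and not the authors' method: the majority (non-anaphylaxis) class is clustered with k-means into as many clusters as there are minority examples, and the real majority record nearest to each centroid is kept. The function names `kmeans` and `cluster_undersample` are hypothetical. Documents are assumed to be numeric vectors already, for example BoW counts or averaged word embeddings.

```python
import random

# Hypothetical sketch: generic cluster-based undersampling,
# not the exact technique used in the paper.


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's k-means; returns a list of k centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        for i, members in enumerate(clusters):
            if members:  # an emptied cluster keeps its previous centroid
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def cluster_undersample(X, y, minority_label=1, seed=0):
    """Balance a binary dataset by shrinking the majority class.

    The majority class is clustered into as many clusters as there are
    minority samples; for each centroid, the closest not-yet-chosen
    majority sample is kept, so every retained example is a real record.
    """
    minority = [x for x, label in zip(X, y) if label == minority_label]
    majority = [(x, label) for x, label in zip(X, y) if label != minority_label]
    k = len(minority)
    if k == 0 or len(majority) <= k:
        return list(X), list(y)  # nothing to undersample
    vectors = [x for x, _ in majority]
    centroids = kmeans(vectors, k, seed=seed)
    chosen = set()
    for c in centroids:
        candidates = [i for i in range(len(vectors)) if i not in chosen]
        best = min(candidates, key=lambda i: squared_distance(vectors[i], c))
        chosen.add(best)
    kept = [majority[i] for i in sorted(chosen)]
    X_bal = minority + [x for x, _ in kept]
    y_bal = [minority_label] * k + [label for _, label in kept]
    return X_bal, y_bal
```

Keeping the real record nearest each centroid, rather than the centroid vector itself, means every retained training example still corresponds to an actual EMR document; this record does not say whether the paper makes the same choice.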