Publication:
Testing contextualized word embeddings to improve NER in Spanish clinical case narratives

dc.affiliation.dpto: UC3M. Departamento de Informática (es)
dc.affiliation.grupoinv: UC3M. Grupo de Investigación: Human Language and Accessibility Technologies (HULAT) (en)
dc.contributor.author: Akhtyamova, Liliya
dc.contributor.author: Martínez Fernández, Paloma
dc.contributor.author: Verspoor, Karin
dc.contributor.author: Cardiff, John
dc.contributor.funder: Ministerio de Economía y Competitividad (España) (es)
dc.date.accessioned: 2023-08-24T11:02:45Z
dc.date.available: 2023-08-24T11:02:45Z
dc.date.issued: 2020-08-24
dc.description.abstract: In the Big Data era, there is an increasing need to fully exploit and analyze the huge quantity of information available about health. Natural Language Processing (NLP) technologies can contribute by extracting relevant information from unstructured data contained in Electronic Health Records (EHR) such as clinical notes, patients' discharge summaries and radiology reports. The extracted information can help in health-related decision-making processes. The Named Entity Recognition (NER) task, which detects important concepts in texts (e.g., diseases, symptoms, drugs), is crucial in the information extraction process yet has received little attention in languages other than English. In this work, we develop a deep-learning-based NLP pipeline for biomedical entity extraction in Spanish clinical narratives. We explore the use of contextualized word embeddings, which incorporate context variation into word representations, to enhance named entity recognition in Spanish-language clinical text, particularly of pharmacological substances, compounds, and proteins. Various combinations of word and sense embeddings were tested on the evaluation corpus of the PharmacoNER 2019 task, the Spanish Clinical Case Corpus (SPACCC). This data set consists of clinical case sections extracted from open access Spanish-language medical publications. Our study shows that our deep-learning-based system with domain-specific contextualized embeddings coupled with stacking of complementary embeddings yields superior performance over a system with integrated standard and general-domain word embeddings. With this system, we achieve performance competitive with the state-of-the-art. (en)
dc.format.extent: 10
dc.identifier.bibliographicCitation: Akhtyamova, L., Martínez, P., Verspoor, K., & Cardiff, J. (2020). Testing contextualized word embeddings to improve NER in Spanish clinical case narratives. IEEE Access, 8, 164717-164726. (en)
dc.identifier.doi: https://doi.org/10.1109/ACCESS.2020.3018688
dc.identifier.issn: 2169-3536
dc.identifier.publicationfirstpage: 164717
dc.identifier.publicationlastpage: 164726
dc.identifier.publicationtitle: IEEE Access (en)
dc.identifier.publicationvolume: 8
dc.identifier.uri: https://hdl.handle.net/10016/38089
dc.identifier.uxxi: AR/0000026476
dc.language.iso: eng (en)
dc.publisher: IEEE (en)
dc.relation.projectID: Gobierno de España. TIN2017-87548-C2-1-R (es)
dc.rights: © 2020 The Authors (en)
dc.rights: Atribución 3.0 España (*)
dc.rights.accessRights: open access (en)
dc.rights.uri: http://creativecommons.org/licenses/by/3.0/es/ (*)
dc.subject.eciencia: Biblioteconomía y Documentación (es)
dc.subject.eciencia: Informática (es)
dc.subject.other: Clinical case narratives (en)
dc.subject.other: Contextualized word embeddings (en)
dc.subject.other: Deep learning (en)
dc.subject.other: Language representations (en)
dc.subject.other: Named entity recognition (en)
dc.subject.other: Natural language processing (en)
dc.subject.other: Spanish language (en)
dc.title: Testing contextualized word embeddings to improve NER in Spanish clinical case narratives (en)
dc.type: research article (*)
dc.type.hasVersion: VoR (*)
dspace.entity.type: Publication
Files
Original bundle
Name: Testing_IEEEA_2020.pdf
Size: 1.03 MB
Format: Adobe Portable Document Format