Testing contextualized word embeddings to improve NER in Spanish clinical case narratives

Akhtyamova, Liliya; Martínez Fernández, Paloma; Verspoor, Karin; Cardiff, John

dc.affiliation.dpto	UC3M. Departamento de Informática	es
dc.affiliation.grupoinv	UC3M. Grupo de Investigación: Human Language and Accessibility Technologies (HULAT)	en
dc.contributor.author	Akhtyamova, Liliya
dc.contributor.author	Martínez Fernández, Paloma
dc.contributor.author	Verspoor, Karin
dc.contributor.author	Cardiff, John
dc.contributor.funder	Ministerio de Economía y Competitividad (España)	es
dc.date.accessioned	2023-08-24T11:02:45Z
dc.date.available	2023-08-24T11:02:45Z
dc.date.issued	2020-08-24
dc.description.abstract	In the Big Data era, there is an increasing need to fully exploit and analyze the huge quantity of information available about health. Natural Language Processing (NLP) technologies can contribute by extracting relevant information from unstructured data contained in Electronic Health Records (EHR) such as clinical notes, patients' discharge summaries and radiology reports. The extracted information can help in health-related decision-making processes. The Named Entity Recognition (NER) task, which detects important concepts in texts (e.g., diseases, symptoms, and drugs), is crucial in the information extraction process yet has received little attention in languages other than English. In this work, we develop a deep learning-based NLP pipeline for biomedical entity extraction in Spanish clinical narratives. We explore the use of contextualized word embeddings, which incorporate context variation into word representations, to enhance named entity recognition in Spanish-language clinical text, particularly of pharmacological substances, compounds, and proteins. Various combinations of word and sense embeddings were tested on the evaluation corpus of the PharmacoNER 2019 task, the Spanish Clinical Case Corpus (SPACCC). This data set consists of clinical case sections extracted from open access Spanish-language medical publications. Our study shows that our deep learning-based system with domain-specific contextualized embeddings coupled with stacking of complementary embeddings yields superior performance over a system with integrated standard and general-domain word embeddings. With this system, we achieve performance competitive with the state-of-the-art.	en
dc.format.extent	10
dc.identifier.bibliographicCitation	Akhtyamova, L., Martínez, P., Verspoor, K., & Cardiff, J. (2020). Testing contextualized word embeddings to improve NER in Spanish clinical case narratives. IEEE Access, 8, 164717-164726.	en
dc.identifier.doi	https://doi.org/10.1109/ACCESS.2020.3018688
dc.identifier.issn	2169-3536
dc.identifier.publicationfirstpage	164717
dc.identifier.publicationlastpage	164726
dc.identifier.publicationtitle	IEEE Access	en
dc.identifier.publicationvolume	8
dc.identifier.uri	https://hdl.handle.net/10016/38089
dc.identifier.uxxi	AR/0000026476
dc.language.iso	eng	en
dc.publisher	IEEE	en
dc.relation.projectID	Gobierno de España. TIN2017-87548-C2-1-R	es
dc.rights	© 2020 The Authors	en
dc.rights	Atribución 3.0 España	*
dc.rights.accessRights	open access	en
dc.rights.uri	http://creativecommons.org/licenses/by/3.0/es/	*
dc.subject.eciencia	Biblioteconomía y Documentación	es
dc.subject.eciencia	Informática	es
dc.subject.other	Clinical case narratives	en
dc.subject.other	Contextualized word embeddings	en
dc.subject.other	Deep learning	en
dc.subject.other	Language representations	en
dc.subject.other	Named entity recognition	en
dc.subject.other	Natural language processing	en
dc.subject.other	Spanish language	en
dc.title	Testing contextualized word embeddings to improve NER in Spanish clinical case narratives	en
dc.type	research article	*
dc.type.hasVersion	VoR	*
dspace.entity.type	Publication