RT Journal Article
T1 Automatic learning framework for pharmaceutical record matching
A1 López Cuadrado, José Luis
A1 González Carrasco, Israel
A1 López Hernández, Jesús Leonardo
A1 Martínez Fernández, Paloma
A1 Martínez Fernández, José Luis
AB Pharmaceutical manufacturers need to analyse a vast number of products in their daily activities. Often, the same product is registered several times by different systems using different attributes, and because these products are drugs, these companies require accurate, high-quality information about them. The central hypothesis of this research work is that machine learning can be applied to this domain to efficiently merge different data sources and match the records related to the same product. Matching them manually is not feasible in a reasonable time because the number of records to be matched is extremely high. This article presents a framework for pharmaceutical record matching based on machine learning techniques in a big data environment. The proposed framework aims to exploit the well-known rules for matching records from different databases in order to train machine learning models. The trained models are then evaluated by predicting matches among records that do not follow these known rules. Finally, the production environment is simulated by generating a huge number of record combinations and predicting the matches. The results show that, despite the good results obtained with the training datasets, the average accuracy of the best model in the production environment is around 85%. This shows that matches which do not follow the known rules can be predicted and, considering that no human could process this amount of data, the results are promising.
PB IEEE
SN 2169-3536
YR 2020
FD 2020-09-18
LK https://hdl.handle.net/10016/33564
UL https://hdl.handle.net/10016/33564
LA eng
NO This work was supported by the Research Program of the Ministry of Economy and Competitiveness, Government of Spain, through the DeepEMR Project, under Grant TIN2017-87548-C2-1-R
DS e-Archivo
RD 31 May 2024