Authors: López Cuadrado, José Luis; González Carrasco, Israel; López Hernández, Jesús Leonardo; Martínez Fernández, Paloma; Martínez Fernández, José Luis
Date accessioned: 2021-11-10
Date available: 2021-11-10
Date issued: 2020-09-18
Citation: J. L. López-Cuadrado, I. González-Carrasco, J. Leonardo López-Hernández, P. Martínez-Fernández and J. L. Martínez-Fernández, "Automatic Learning Framework for Pharmaceutical Record Matching," in IEEE Access, vol. 8, pp. 171754-171770, 2020, doi: 10.1109/ACCESS.2020.3024558
ISSN: 2169-3536
URI: https://hdl.handle.net/10016/33564
Abstract: Pharmaceutical manufacturers need to analyse a vast number of products in their daily activities. Often, the same product is registered several times by different systems using different attributes, and because these products are drugs, the companies need accurate, high-quality information about them. The central hypothesis of this research work is that machine learning can be applied to this domain to efficiently merge different data sources and match the records that refer to the same product. No human can do this in a reasonable time because the number of records to be matched is extremely high. This article presents a framework for pharmaceutical record matching based on machine learning techniques in a big data environment. The proposed framework aims to exploit the well-known rules for matching records from different databases to train machine learning models. The trained models are then evaluated by predicting matches between records that do not follow these known rules. Finally, the production environment is simulated by generating a huge number of record combinations and predicting the matches. The results show that, although the models perform well on the training datasets, the average accuracy of the best model in the production environment is around 85%.
This shows that matches that do not follow the known rules can be predicted and, given that no human could process this amount of data, the results are promising.
Language: eng
Rights: © 2020 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
License: Attribution-NonCommercial-NoDerivs 3.0 Spain (Atribución-NoComercial-SinDerivadas 3.0 España)
Keywords: big data; data integration; machine learning; pattern detection; medicine
Title: Automatic learning framework for pharmaceutical record matching
Type: research article
Subject: Computer Science (Informática)
DOI: https://doi.org/10.1109/ACCESS.2020.3024558
Access rights: open access
Start page: 171754
End page: 171770
Journal: IEEE Access
Volume: 8
Identifier: AR/0000026086
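The abstract describes using well-known matching rules to label training data and then asking the trained model about records those rules cannot match. The following is a minimal sketch of that idea under explicit assumptions. The records, the shared-product-code rule (`known_rule`), the similarity score and the learned cut-off (`fit_threshold`) are all hypothetical stand-ins, not the article's data, rules or models. The article trains machine learning models in a big data environment; a single learned threshold on one similarity score is used here only to keep the example small and checkable.

```python
from difflib import SequenceMatcher
from itertools import product

# Hypothetical product records from two source systems (illustrative only).
SOURCE_A = [
    {"code": "0001", "name": "paracetamol 500 mg tablets"},
    {"code": "0002", "name": "ibuprofen 400 mg film-coated tablets"},
    {"code": "0003", "name": "amoxicillin 250 mg capsules"},
]
SOURCE_B = [
    {"code": "0001", "name": "paracetamol 500mg tablet"},
    {"code": "0002", "name": "ibuprofen 400 mg tablets film coated"},
    {"code": "0003", "name": "amoxicillin 250mg caps"},
]


def known_rule(a: dict, b: dict) -> bool:
    """Hypothetical 'well-known' rule: a shared, non-empty product code means a match."""
    return bool(a["code"]) and a["code"] == b["code"]


def similarity(a: dict, b: dict) -> float:
    """Average of token Jaccard and character-level ratio; ignores the code on purpose."""
    ta, tb = set(a["name"].split()), set(b["name"].split())
    jaccard = len(ta & tb) / len(ta | tb)
    ratio = SequenceMatcher(None, a["name"], b["name"]).ratio()
    return (jaccard + ratio) / 2


def fit_threshold(pairs) -> float:
    """Learn a cut-off from rule-labelled pairs: the midpoint between the lowest-scoring
    match and the highest-scoring non-match (assumes the two classes are separable)."""
    pos = [similarity(a, b) for a, b in pairs if known_rule(a, b)]
    neg = [similarity(a, b) for a, b in pairs if not known_rule(a, b)]
    return (min(pos) + max(neg)) / 2


def predict(threshold: float, a: dict, b: dict) -> bool:
    """Score a pair with the learned cut-off instead of the rule."""
    return similarity(a, b) >= threshold


# Training: every cross-source pair, labelled by the known rule.
training_pairs = list(product(SOURCE_A, SOURCE_B))
threshold = fit_threshold(training_pairs)

# Evaluation: pairs the rule cannot label because the product code is missing.
same = ({"code": None, "name": "paracetamol 500 mg tablets"},
        {"code": "", "name": "paracetamol 500mg tablets"})
different = ({"code": None, "name": "amoxicillin 250 mg capsules"},
             {"code": "", "name": "ibuprofen 400 mg tablets"})
print(predict(threshold, *same), predict(threshold, *different))
```

Running it prints `True False`. The score deliberately ignores the product code, so whatever is learned from the rule-labelled pairs can carry over to records where the rule does not apply, which loosely mirrors the evaluation the abstract describes.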