RT Generic T1 Evaluation of machine learning methods in Weka A1 Pastor Valles, Antonio Ángel AB This document presents a software plug-in that endows the Weka machine learning suite with the complete set of information-theoretic tools described by Valverde-Albacete and Peláez-Moreno [Pattern Recognition Letters 31.12 (2010) and PLoS ONE 9.1 (2014)]. The utility of these tools is more evident in multi-class classification, but they can also be used for binary tasks. The Entropy Triangle is an exploratory analysis method that we implemented as an interactive visualization plug-in for Weka. The Entropy Triangle represents, in a De Finetti diagram (ternary plot), a balance equation of entropies for the estimated distributions of the input and the output of classifiers. This diagram provides, at a glance, complete information about the confusion matrix in terms of information theory. Besides the Entropy Triangle, the package implements some useful perplexity-based metrics for the assessment of classifiers. In the context of classification, the perplexity represents the effective number of classes for the classification task, which makes it a useful measure of the propagation of information. Among these metrics, we highlight the Entropy Modified Accuracy, recommended for ranking classifiers, and the Normalized Information Transfer factor, which measures the classifier's level of understanding of the underlying patterns of the task. The Waikato Environment for Knowledge Analysis (WEKA) is a workbench for machine learning and data mining developed at the University of Waikato, New Zealand. Weka offers several graphical user interfaces, ranging from a user-friendly interactive explorer to an automated approach in which multiple experiments can be statistically compared at the same time. 
An important feature of Weka is the possibility of using it as a framework for the implementation of algorithms, evaluation metrics and visualization tools by means of added components. In this document we describe the design and development of the software package. Before that, we set the theoretical backdrop by reviewing the implemented tools and their mathematical background. To illustrate the software features and the utility of the tools, we present an example with a multi-class dataset in which we unbalance the class distribution in different ways. Additionally, we show how to use the plug-in programmatically with a guided example. Finally, we review the project in hindsight and propose future work. YR 2015 FD 2015-09 LK https://hdl.handle.net/10016/22339 UL https://hdl.handle.net/10016/22339 LA eng DS e-Archivo RD 5 May 2024