Publication:
Evaluation of machine learning methods in Weka

Publication date
2015-09
Defense date
2015-10-15
Abstract
This document presents a software plug-in that endows the Weka machine learning suite with the complete set of information-theoretic tools described by Valverde-Albacete and Peláez-Moreno [Pattern Recognition Letters 31.12 (2010) and PLoS ONE 9.1 (2014)]. The utility of these tools is more evident in multi-class classification, but they can also be used for binary tasks.
The Entropy Triangle is an exploratory analysis method that we implemented as an interactive visualization plug-in for Weka. It represents, in a De Finetti diagram (ternary plot), a balance equation of entropies for the estimated distributions of the input and the output of classifiers. This diagram provides, at a glance, complete information about the confusion matrix in terms of information theory.
Besides the Entropy Triangle, the package implements some useful perplexity-based metrics for the assessment of classifiers. In the context of classification, the perplexity represents the effective number of classes for the classification task, which makes it a useful measure of the propagation of information. Among these metrics, we highlight the Entropy Modified Accuracy, recommended for ranking classifiers, and the Normalized Information Transfer factor, which measures the classifier's level of understanding of the underlying patterns of the task.
The Waikato Environment for Knowledge Analysis (WEKA) is a workbench for machine learning and data mining developed at the University of Waikato, New Zealand. Weka offers several graphical user interfaces, ranging from a user-friendly interactive explorer to an automated approach in which multiple experiments can be statistically compared at the same time. An important feature of Weka is that it can be used as a framework for implementing algorithms, evaluation metrics and visualization tools by means of added components.
In this document we describe the design and development of the software package. Before that, we set the theoretical backdrop by reviewing the implemented tools and their mathematical background. To illustrate the software features and the utility of the tools, we present an example with a multi-class dataset in which we unbalance the class distribution in different ways. Additionally, we show how to use the plug-in programmatically through a guided example. Finally, we review the project in hindsight and propose future work.
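The quantities named in the abstract can be illustrated with a minimal sketch. It assumes the definitions as they are usually stated in the two cited papers: the balance equation H_UX + H_UY = ΔH + 2·MI + VI, the Entropy Modified Accuracy as the inverse of the remaining perplexity 2^H(X|Y), and the Normalized Information Transfer factor as 2^MI divided by the number of classes. The function and key names are hypothetical and are not the plug-in's API. Treating rows as true classes and columns as predicted classes is also an assumption, and the sketch expects at least two classes.

```python
import math


def entropy(p):
    """Shannon entropy in bits of a probability vector (zero entries are skipped)."""
    return -sum(q * math.log2(q) for q in p if q > 0)


def entropy_triangle(confusion):
    """Entropy Triangle coordinates, EMA and NIT from a confusion matrix of counts.

    Assumption: rows are true classes (X), columns are predicted classes (Y).
    """
    total = sum(sum(row) for row in confusion)
    joint = [[c / total for c in row] for row in confusion]
    px = [sum(row) for row in joint]            # estimated input distribution
    py = [sum(col) for col in zip(*joint)]      # estimated output distribution

    h_x, h_y = entropy(px), entropy(py)
    h_xy = entropy([p for row in joint for p in row])
    h_ux, h_uy = math.log2(len(px)), math.log2(len(py))  # uniform entropies

    mi = h_x + h_y - h_xy                       # mutual information
    vi = h_xy - mi                              # H(X|Y) + H(Y|X)
    delta_h = (h_ux - h_x) + (h_uy - h_y)       # divergence from uniformity
    norm = h_ux + h_uy                          # normalizer; coordinates sum to 1

    return {
        "delta_h": delta_h / norm,
        "two_mi": 2 * mi / norm,
        "vi": vi / norm,
        "ema": 2 ** -(h_xy - h_y),              # 1 / remaining perplexity
        "nit": 2 ** mi / len(px),               # information transfer / class count
    }


perfect = entropy_triangle([[5, 0], [0, 5]])      # every instance classified correctly
majority = entropy_triangle([[5, 0], [5, 0]])     # always predicts the first class
print(perfect)
print(majority)
```

In this sketch the perfect classifier lands on the vertex where all the normalized entropy is mutual information, while the classifier that always predicts one class transfers no information even though its plain accuracy would be 0.5.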
Keywords
Artificial intelligence, Learning, Weka, Classification