Publication: Evaluation of machine learning methods in Weka
Publication date
2015-09
Defense date
2015-10-15
Abstract
This document presents a software plug-in to endow the Weka machine learning suite with the complete set
of information-theoretic tools described by Valverde-Albacete and Peláez-Moreno [Pattern Recognition
Letters 31.12 (2010) and PLoS ONE 9.1 (2014)]. The utility of these tools is more evident in multi-class
classification, but they can also be used for binary tasks.
The Entropy Triangle is an exploratory analysis method that we implemented as an interactive visualization
plug-in for Weka. It represents a balance equation of entropies for the estimated distributions of the input
and the output of classifiers in a De Finetti diagram, or ternary plot. This diagram provides, at a glance,
complete information about the confusion matrix in information-theoretic terms.
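The balance equation is not written out in this abstract. As a minimal sketch, assuming the form given in the cited 2010 paper, H_U = ΔH + 2·MI + VI (where H_U is the entropy of uniform input and output distributions, ΔH the divergence of the marginals from uniformity, MI the mutual information and VI the variation of information), the three ternary-plot coordinates can be computed from a confusion matrix as follows. The function name `entropy_triangle` and the convention that rows are true classes and columns are predictions are assumptions made for illustration; they are not the plug-in's API.

```python
import math


def entropy(probs):
    """Shannon entropy in bits, skipping zero-probability cells."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def entropy_triangle(confusion):
    """Return the normalized coordinates (delta_h, 2*mi, vi) of a confusion matrix.

    Assumes a matrix of counts with at least two classes, rows = true classes,
    columns = predicted classes (illustrative convention).
    """
    total = sum(sum(row) for row in confusion)
    joint = [[c / total for c in row] for row in confusion]
    p_x = [sum(row) for row in joint]        # input (true class) marginal
    p_y = [sum(col) for col in zip(*joint)]  # output (prediction) marginal

    h_x, h_y = entropy(p_x), entropy(p_y)
    h_xy = entropy([p for row in joint for p in row])
    h_u = math.log2(len(p_x)) + math.log2(len(p_y))  # uniform-marginal entropy

    delta_h = h_u - (h_x + h_y)   # how far the marginals are from uniform
    mi = h_x + h_y - h_xy         # mutual information
    vi = 2 * h_xy - h_x - h_y     # variation of information, H(X|Y) + H(Y|X)

    # Dividing by h_u makes the three coordinates sum to 1.
    return delta_h / h_u, 2 * mi / h_u, vi / h_u
```

Under these assumptions, a perfect classifier on a balanced two-class task, `entropy_triangle([[5, 0], [0, 5]])`, places all the mass on the mutual-information coordinate.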
Besides the Entropy Triangle, the package implements several metrics for the assessment of classifiers
based on perplexity. In the context of classification, the perplexity represents the effective
number of classes for the classification task, which makes it a useful measure of the propagation of
information. Among these metrics, we highlight the Entropy Modified Accuracy, recommended for ranking
classifiers, and the Normalized Information Transfer factor, which measures the classifier's level of
understanding of the underlying patterns of the task.
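The perplexity-based metrics can be illustrated with a short sketch. It assumes the definitions as we understand them from the cited 2014 paper: the perplexity of a distribution is 2 raised to its entropy in bits, the Entropy Modified Accuracy is the inverse of the remaining perplexity 2^H(X|Y), and the Normalized Information Transfer factor is 2^MI divided by the number of classes k. The function name `perplexity_metrics`, the dictionary keys and the rows-as-true-classes convention are hypothetical and do not reflect the plug-in's API.

```python
import math


def entropy(probs):
    """Shannon entropy in bits, skipping zero-probability cells."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def perplexity_metrics(confusion):
    """Perplexity-based metrics from a confusion matrix of counts.

    Assumed convention: rows = true classes, columns = predicted classes.
    The EMA and NIT formulas are assumptions taken from our reading of the
    cited 2014 paper, not from the plug-in's source code.
    """
    total = sum(sum(row) for row in confusion)
    joint = [[c / total for c in row] for row in confusion]
    p_x = [sum(row) for row in joint]
    p_y = [sum(col) for col in zip(*joint)]

    h_x, h_y = entropy(p_x), entropy(p_y)
    h_xy = entropy([p for row in joint for p in row])
    h_x_given_y = h_xy - h_y   # uncertainty left about the true class
    mi = h_x + h_y - h_xy      # information transferred by the classifier
    k = len(p_x)               # nominal number of classes

    return {
        "k_x": 2 ** h_x,               # effective number of classes
        "ema": 1 / 2 ** h_x_given_y,   # Entropy Modified Accuracy (assumed form)
        "nit": 2 ** mi / k,            # Normalized Information Transfer (assumed form)
    }
```

With these definitions, a classifier that always predicts the same class on a balanced two-class task, `perplexity_metrics([[5, 0], [5, 0]])`, transfers no information and obtains a NIT factor of 1/k = 0.5, even though its plain accuracy is also 0.5.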
The Waikato Environment for Knowledge Analysis (WEKA) is a workbench for machine learning and data
mining developed at the University of Waikato, New Zealand. Weka offers several graphical user interfaces,
ranging from a user-friendly interactive explorer to an automated approach in which multiple experiments
can be statistically compared at the same time. An important feature of Weka
is the possibility to use it as a framework for the implementation of algorithms, evaluation metrics and
visualization tools by means of added components.
In this document we describe the design and development of the software package. Before that, we set the
theoretical backdrop by reviewing the implemented tools and their mathematical background. To illustrate the
software features and the utility of the tools, we present an example with a multi-class dataset in which
we unbalance the class distribution in different ways. Additionally, we show how to use the plug-in
programmatically through a guided example. Finally, we review the project in hindsight and propose future
work.
Keywords
Artificial intelligence, Learning, Weka, Classification