Rights:
Attribution-NonCommercial-NoDerivs 3.0 Spain
Abstract:
The main object of this Ph.D. Thesis is the identification, characterization and
study of new loss functions to address so-called cost-sensitive classification. Many
decision problems are intrinsically cost-sensitive. However, the dominating preference
for cost-insensitive methods in the machine learning literature is a natural consequence
of the fact that true costs in real applications are difficult to evaluate.
Since, in general, uncovering the correct class of the data is less costly than any
decision error, designing low-error decision systems is a reasonable (but suboptimal)
approach. For instance, consider the classification of credit applicants as either good customers (who will pay back the credit) or bad customers (who will fail to pay off part of the credit). The cost of classifying one risky borrower as good could be much higher than the cost of classifying a potentially good customer as bad.
Our proposal relies on Bayes decision theory, where the goal is to assign each instance
to the class with minimum expected cost. The decision combines the costs with the posterior probabilities of the classes. Obtaining calibrated probability
estimates at the classifier output requires a suitable learning machine, a sufficiently large
and representative data set, and an adequate loss function to be minimized during
learning. The design of the loss function can be aided by the costs: classical decision
theory shows that cost matrices define class boundaries determined by posterior class
probability estimates. Strictly speaking, in order to make optimal decisions, accurate
probability estimates are only required near the decision boundaries. It is key to
point out that the choice of the loss function becomes especially relevant when
the prior knowledge about the problem is limited or the available training examples
are somehow unsuitable. In those cases, different loss functions lead to dramatically
different posterior probability estimates. We focus our study on the set of Bregman
divergences. These divergences offer a rich family of proper losses that has recently
become very popular in the machine learning community [Nock and Nielsen, 2009,
Reid and Williamson, 2009a].
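As a concrete illustration of the minimum-expected-cost rule and of the generic Bregman form D_phi(p, q) = phi(p) - phi(q) - <grad phi(q), p - q>, the following sketch in Python uses hypothetical cost values inspired by the credit example; it is a minimal stand-in, not code or notation taken from the Thesis.

import numpy as np

# Hypothetical cost matrix for the credit example: rows = true class,
# columns = predicted class (0 = good payer, 1 = bad payer). Accepting a
# risky borrower is assumed to be ten times costlier than rejecting a good one.
C = np.array([[0.0, 1.0],
              [10.0, 0.0]])

def bayes_decision(posterior, cost_matrix):
    """Assign the class with minimum expected cost given posterior estimates."""
    expected_cost = posterior @ cost_matrix  # expected cost of each possible decision
    return int(np.argmin(expected_cost))

print(bayes_decision(np.array([0.85, 0.15]), C))  # -> 1: rejecting is cheaper here
print(bayes_decision(np.array([0.95, 0.05]), C))  # -> 0: accepting is cheaper here

def bregman_divergence(p, q, phi, grad_phi):
    """Generic Bregman divergence D_phi(p, q) = phi(p) - phi(q) - <grad phi(q), p - q>."""
    return phi(p) - phi(q) - grad_phi(q) @ (p - q)

# Taking the negative Shannon entropy as generator recovers the KL divergence,
# one member of the family of proper losses mentioned above.
neg_entropy = lambda p: float(np.sum(p * np.log(p)))
grad_neg_entropy = lambda p: np.log(p) + 1.0

With a 0/1 cost matrix the decision rule reduces to picking the most probable class; asymmetric costs shift the decision boundary, which is why accurate probability estimates are most valuable near it.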
The first part of the Thesis deals with the development of a novel parametric family of multiclass Bregman divergences which captures the information in the cost
matrix, so that the loss function is adapted to each specific problem. Multiclass cost-sensitive learning is one of the main challenges in cost-sensitive learning and, through this parametric family, we provide a natural framework to move beyond binary tasks. Following this idea, two lines are explored:
Cost-sensitive supervised classification: We derive several asymptotic results.
The first analysis guarantees that the proposed Bregman divergence has maximum sensitivity to changes in the probability vectors near the decision regions. Further analysis shows that the optimization of this Bregman divergence becomes equivalent to minimizing the overall cost regret in non-separable problems, and to maximizing a margin in separable problems.
Cost-sensitive semi-supervised classification: When labeled data is
scarce but unlabeled data is widely available, semi-supervised learning is a
useful tool to make the most of the unlabeled data. We discuss an optimization
problem relying on the minimization of our parametric family of Bregman divergences, using both labeled and unlabeled data, based on what is called the Entropy Minimization principle. We propose the first multiclass cost-sensitive semi-supervised algorithm, under the assumption that inter-class separation is stronger than intra-class separation.
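To make the Entropy Minimization idea of the second line above more tangible, here is a rough sketch of a composite objective on labeled and unlabeled data; the cost-weighted cross-entropy merely stands in for the Thesis's parametric Bregman family, and the class weights and trade-off parameter lam are hypothetical.

import numpy as np

def semi_supervised_objective(P_lab, y_lab, P_unlab, class_weights, lam=0.1):
    """Cost-weighted log loss on labeled data plus an entropy penalty on unlabeled data."""
    eps = 1e-12
    # Supervised term: weighted cross-entropy (stand-in for the Bregman family).
    log_loss = -np.log(P_lab[np.arange(len(y_lab)), y_lab] + eps)
    supervised = np.mean(class_weights[y_lab] * log_loss)
    # Entropy Minimization: favour confident predictions on unlabeled points, which
    # presumes that inter-class separation is stronger than intra-class separation.
    entropy = -np.sum(P_unlab * np.log(P_unlab + eps), axis=1).mean()
    return supervised + lam * entropy

Driving the unlabeled entropy term down pushes the decision boundary through low-density regions, which is the behaviour the inter-class separation assumption justifies.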
The second part of the Thesis deals with the transformation of this parametric family of Bregman divergences into a sequence of Bregman divergences. Work along this line can be further divided into two additional areas:
Foundations of sequences of Bregman divergences: We generalize some
previous results about the design and characterization of Bregman divergences
that are suitable for learning and their relationship with convexity. In addition,
we aim to broaden the subset of Bregman divergences that are interesting for
cost-sensitive learning. Under very general conditions, we find sequences of (cost-sensitive) Bregman divergences whose minimization provides minimum (cost-sensitive) risk for non-separable problems and some type of maximum-margin classifier in separable cases.
Learning with example-dependent costs: A strong assumption is widespread in most cost-sensitive learning algorithms: misclassification costs are the same for all examples. In many cases this assumption does not hold.
We claim that using example-dependent costs directly is more natural and leads to more accurate classifiers. For these reasons, we consider the extension of cost-sensitive sequences of Bregman losses to example-dependent cost scenarios to generate finely tuned posterior probability estimates.
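As a minimal illustration of the example-dependent setting at decision time (a generic sketch, not the Thesis's algorithm; the array shapes and names are assumptions):

import numpy as np

def example_dependent_decisions(posteriors, costs):
    """Minimum-expected-cost decisions when every example carries its own cost matrix.

    posteriors : (n, K)    posterior estimates P(class | x_i)
    costs      : (n, K, K) per-example cost matrices, costs[i, true, predicted]
    """
    # expected_cost[i, d] = sum_k posteriors[i, k] * costs[i, k, d]
    expected_cost = np.einsum('ik,ikd->id', posteriors, costs)
    return expected_cost.argmin(axis=1)

During training, the same per-example matrices can weight each example's contribution to the loss, which is the direction pursued with the cost-sensitive sequences of Bregman losses above.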