Publication: Clustering in high dimension for multivariate and functional data using extreme kurtosis projections
Publication date
2017-05
Defense date
2017-06-19
Abstract
Cluster analysis studies whether a multivariate sample contains clusters. It is
carried out by algorithms that differ significantly in their notion of what
constitutes a cluster and in how to find clusters efficiently. In this thesis we
are interested in large data problems, and therefore we consider algorithms that
use dimension reduction techniques to identify interesting structures in large
data sets, particularly those algorithms that use the kurtosis coefficient to
detect the clusters present in the data.
The thesis extends the work of Peña and Prieto (2001a), who identified clusters
in multivariate data using the univariate projections of the sample data onto the
directions that minimize and maximize the kurtosis coefficient of the projected
data, and of Peña et al. (2010), who used the eigenvalues of a kurtosis matrix to
reduce the dimension.
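As a rough illustration of why the kurtosis of a univariate projection carries cluster information, the sketch below computes the kurtosis coefficient of simulated data projected onto two fixed directions. It is not the algorithm of the thesis or of the cited papers: the function name `projection_kurtosis`, the simulated two-cluster sample, and the two candidate directions are all assumptions made for this example, and no optimization over directions is performed.

```python
import numpy as np

def projection_kurtosis(X, d):
    """Kurtosis coefficient of the rows of X (n x p) projected onto direction d."""
    d = np.asarray(d, dtype=float)
    d = d / np.linalg.norm(d)          # use a unit-length direction
    z = X @ d                          # univariate projection of each observation
    z = z - z.mean()                   # center the projected data
    return np.mean(z**4) / np.mean(z**2) ** 2

# Hypothetical sample: two balanced normal clusters separated along the first axis.
rng = np.random.default_rng(0)
n = 1000
cluster_a = rng.normal(size=(n, 2)) + np.array([-3.0, 0.0])
cluster_b = rng.normal(size=(n, 2)) + np.array([3.0, 0.0])
X = np.vstack([cluster_a, cluster_b])

k_sep = projection_kurtosis(X, [1.0, 0.0])   # direction that separates the clusters
k_orth = projection_kurtosis(X, [0.0, 1.0])  # direction along which they overlap
print(k_sep, k_orth)
```

In this balanced setting the projection along the separating direction is bimodal and its kurtosis falls well below the value of 3 expected for a single normal population, while the orthogonal projection stays close to 3. This is the intuition behind searching for directions of extreme kurtosis.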
This thesis has two main contributions:
First, we prove that the extreme kurtosis projections have some optimality
properties for mixtures of normal distributions and we propose an algorithm to
identify clusters when the data dimension and the number of clusters present in
the sample are high. The good performance of the algorithm is shown through a
simulation study in which it is compared with the MCLUST, K-means and CLARA
methods.
Second, we propose an extension of multivariate kurtosis to functional data, and
we analyze some of its properties for clustering. Additionally, we propose an
algorithm based on kurtosis projections for functional data. Its performance
is compared with the results obtained by the Functional Principal Components,
Functional K-means and FunClust methods.
The thesis is structured as follows. Chapter 1 is an introductory chapter in which
we review some theoretical concepts that are used throughout the thesis.
In Chapter 2 we review in detail the concept of kurtosis, study its properties,
and give a detailed description of some algorithms proposed in the literature
that use the kurtosis coefficient to detect the clusters present in the data.
In Chapter 3 we study the directions that may be interesting for the detection
of several clusters in the sample and we analyze how the extreme kurtosis directions
are related to these directions. In addition, we present a clustering algorithm for
high-dimensional data using extreme kurtosis directions.
In Chapter 4 we introduce an extension of the multivariate kurtosis to
functional data and we analyze the properties of this measure regarding the
identification of clusters. In addition, we present a clustering algorithm for
functional data using extreme kurtosis directions.
We finish with some remarks and conclusions in the final Chapter.
Keywords
Cluster analysis, Kurtosis coefficient, Multivariate data, Functional data