Publication:
Clustering in high dimension for multivariate and functional data using extreme kurtosis projections

Publication date
2017-05
Defense date
2017-06-19
Abstract
Cluster analysis is the problem of analyzing whether clusters exist in a multivariate sample. This analysis is performed by algorithms that differ significantly in their notion of what constitutes a cluster and of how to find clusters efficiently. In this thesis we are interested in large data problems, and therefore we consider algorithms that use dimension reduction techniques to identify interesting structures in large data sets, particularly those that use the kurtosis coefficient to detect the clusters present in the data. The thesis extends the work of Peña and Prieto (2001a), who identified clusters in multivariate data using the univariate projections of the sample data onto the directions that minimize and maximize the kurtosis coefficient of the projected data, and that of Peña et al. (2010), who used the eigenvalues of a kurtosis matrix to reduce the dimension. This thesis makes two main contributions. First, we prove that the extreme kurtosis projections have some optimality properties for mixtures of normal distributions, and we propose an algorithm to identify clusters when both the data dimension and the number of clusters present in the sample are high. The good performance of the algorithm is shown through a simulation study in which it is compared with the MCLUST, K-means and CLARA methods. Second, we propose an extension of multivariate kurtosis to functional data and analyze some of its properties for clustering. Additionally, we propose an algorithm based on kurtosis projections for functional data, and we show its good properties by comparing its results with those obtained by Functional Principal Components, Functional K-means and the FunClust method. The thesis is structured as follows. Chapter 1 is an introductory chapter that reviews some theoretical concepts used throughout the thesis. In Chapter 2 we review the concept of kurtosis in detail and study its properties.
We also give a detailed description of some algorithms proposed in the literature that use the kurtosis coefficient to detect the clusters present in the data. In Chapter 3 we study the directions that may be of interest for detecting several clusters in the sample, and we analyze how the extreme kurtosis directions relate to them. In addition, we present a clustering algorithm for high-dimensional data based on extreme kurtosis directions. In Chapter 4 we introduce an extension of multivariate kurtosis to functional data and analyze the properties of this measure for identifying clusters. In addition, we present a clustering algorithm for functional data based on extreme kurtosis directions. We finish with some remarks and conclusions in the final chapter.
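The abstract builds on two ideas: the kurtosis coefficient of univariate projections (Peña and Prieto, 2001a) and the eigenvectors of a kurtosis matrix (Peña et al., 2010). The following NumPy code is a minimal, simplified sketch of those ideas, not the algorithm proposed in the thesis. It whitens the data, takes the eigenvectors of the sample kurtosis matrix K = E[(z'z) z z'] as candidate directions, keeps the one whose projection has the smallest kurtosis, and splits the projected data at its widest gap. For standard normal data K equals (p + 2) times the identity, and, roughly, low projected kurtosis tends to signal two balanced groups while high kurtosis tends to isolate a small group. All function names, the largest-gap split, and the fixed-point ascent used for a maximum-kurtosis direction are illustrative assumptions.

```python
import numpy as np


def standardize(X):
    """Center X and whiten it so its sample covariance is the identity."""
    Xc = X - X.mean(axis=0)
    cov = np.cov(Xc, rowvar=False, bias=True)
    # Symmetric inverse square root of the covariance from its eigendecomposition.
    vals, vecs = np.linalg.eigh(cov)
    W = vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T
    return Xc @ W


def projection_kurtosis(Z, d):
    """Kurtosis coefficient (fourth standardized moment) of the projection Z @ d."""
    y = Z @ d
    y = y - y.mean()
    return np.mean(y**4) / np.mean(y**2) ** 2


def kurtosis_matrix(Z):
    """Sample version of K = E[(z'z) z z'] for whitened data Z (n x p)."""
    r2 = np.sum(Z**2, axis=1)
    return (Z * r2[:, None]).T @ Z / Z.shape[0]


def max_kurtosis_direction(Z, d0, n_iter=100):
    """Hypothetical helper: fixed-point ascent of mean((Z @ d)**4) over unit vectors d.

    The objective is convex in d, so replacing d by its normalized gradient
    never decreases it (a local, not necessarily global, maximum).
    """
    d = d0 / np.linalg.norm(d0)
    for _ in range(n_iter):
        g = np.mean(Z * ((Z @ d) ** 3)[:, None], axis=0)
        d = g / np.linalg.norm(g)
    return d


def split_on_largest_gap(y):
    """Hypothetical rule: two groups obtained by cutting the sorted projection at its widest spacing."""
    s = np.sort(y)
    i = int(np.argmax(np.diff(s)))
    return (y > (s[i] + s[i + 1]) / 2).astype(int)


def min_kurtosis_split(X):
    """Pick the eigenvector of K with the smallest projected kurtosis and split on it."""
    Z = standardize(X)
    _, vecs = np.linalg.eigh(kurtosis_matrix(Z))
    kurts = [projection_kurtosis(Z, vecs[:, j]) for j in range(vecs.shape[1])]
    d = vecs[:, int(np.argmin(kurts))]
    return d, split_on_largest_gap(Z @ d)


# Usage: two balanced Gaussian clusters in p = 4 dimensions, separated along axis 0.
rng = np.random.default_rng(0)
p, n = 4, 10000
X = rng.standard_normal((2 * n, p))
X[:n, 0] -= 8.0
X[n:, 0] += 8.0
truth = np.r_[np.zeros(n, dtype=int), np.ones(n, dtype=int)]

d, labels = min_kurtosis_split(X)
# Cluster labels are arbitrary, so score agreement up to swapping 0 and 1.
accuracy = max(np.mean(labels == truth), np.mean(labels != truth))
print("min-kurtosis direction (whitened coordinates):", np.round(d, 3))
print("agreement with true labels:", accuracy)

Z = standardize(X)
d_max = max_kurtosis_direction(Z, np.ones(p))
print("largest projected kurtosis found:", projection_kurtosis(Z, d_max))
```

On this synthetic example the selected direction should point mostly along the first whitened coordinate. The fixed-point routine only guarantees that the objective does not decrease, so in practice one would restart it from several initial directions and, for many clusters, apply the projection and split steps recursively.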
Keywords
Cluster analysis, Kurtosis coefficient, Multivariate data, Functional data