RT Dissertation/Thesis
T1 Scalable Outlier Detection Methods for Functional Data
A1 Ojo, Oluwasegun Taiwo
A2 IMDEA Networks Institute
AB Recent technological advances have led to an exponential growth in the volume of data generated. The quest to make sense of these data, some of which are usually complex, has led to recent interest in the development of statistical methods for analysing data with complex structures. One such field of interest is functional data analysis (FDA), which deals with the analysis of data that can be considered as functions, curves, or surfaces observed over a domain set. Outlier detection is a challenging but important part of the exploratory analysis process in FDA because functional observations can exhibit outlyingness in various ways compared to the bulk of the data. This thesis addresses the problem of detecting and classifying outliers in functional data with three main contributions. First, the fdaoutlier R package is presented in Chapter 2. The package contains implementations of some of the state-of-the-art functional outlier detection methods in the literature. Some of the methods implemented include directional outlyingness, magnitude-shape plot, sequential transformations, total variation depth, and modified shape similarity index. Detailed illustrations of the functions of the package are provided, using various simulated and real functional datasets curated from the functional outlier detection literature. Overviews of the functional outlier detection methods implemented in the package are also presented in Chapter 2. This chapter therefore serves as a review of some of the current literature in outlier detection for functional data. Next, two new methods, named ‘Semifast-MUOD’ and ‘Fast-MUOD’, are presented in Chapter 3. These methods work by computing for each curve three indices (magnitude, amplitude and shape index) that measure the outlyingness of that curve in terms of its magnitude, amplitude and shape. ‘Semifast-MUOD’ computes these indices with respect to (w.r.t.) a random sample of the dataset, while ‘Fast-MUOD’ computes these indices w.r.t. the point-wise or L1 median. The classical boxplot is then used as a cutoff on the three indices to identify curves that are outliers of different types. A byproduct of the methods is an unsupervised classification of the outliers into different types, without the need for visualisation. Performance evaluation of the methods, using various real and simulated datasets, shows that Fast-MUOD is the better of the two new proposed methods for outlier detection, in addition to being very scalable. Comparisons with the latest functional outlier detection methods in the literature also show superior or comparable outlier detection performance. In Chapter 4, some theoretical properties of the Fast-MUOD indices are presented. These include some definitions of the indices, as well as convergence proofs of the sample approximations. Some properties of the indices under simple transformations are also presented in this chapter. Finally, three techniques are presented in Chapter 5 for extending the Fast-MUOD indices to outlier detection in multivariate functional data observed on the same domain. These techniques include the use of random projections and identifying outliers on the marginal components of the multivariate functional data. The use of random projections showed the best result in performance evaluations with various real and simulated datasets. Chapter 6 contains some concluding remarks and possible future research work.
YR 2022
FD 2022-10-11
LK https://hdl.handle.net/10016/36521
UL https://hdl.handle.net/10016/36521
LA eng
NO International Mention in the doctoral degree
NO This work has been supported by IMDEA Networks Institute
DS e-Archivo
RD 20 May 2024