RT Journal Article
T1 Handling incomplete heterogeneous data using VAEs
A1 Nazabal Renteria, Alfredo
A1 Martínez Olmos, Pablo
A1 Ghahramani, Zoubin
A1 Valera Martínez, María Isabel
AB Variational autoencoders (VAEs), as well as other generative models, have been shown to be efficient and accurate for capturing the latent structure of vast amounts of complex high-dimensional data. However, existing VAEs still cannot directly handle data that are heterogeneous (mixed continuous and discrete) or incomplete (with data missing at random), both of which are common in real-world applications. In this paper, we propose a general framework to design VAEs suitable for fitting incomplete heterogeneous data. The proposed HI-VAE includes likelihood models for real-valued, positive real-valued, interval, categorical, ordinal and count data, and allows accurate estimation (and potentially imputation) of missing data. Furthermore, HI-VAE presents competitive predictive performance in supervised tasks, outperforming supervised models when trained on incomplete data.
PB Elsevier
SN 0031-3203
YR 2020
FD 2020-11
LK https://hdl.handle.net/10016/32743
UL https://hdl.handle.net/10016/32743
LA eng
NO The authors wish to thank Christopher K. I. Williams for fruitful discussions and helpful comments on the manuscript. Alfredo Nazabal would like to acknowledge the funding provided by the UK Government’s Defence & Security Programme in support of the Alan Turing Institute, EPSRC Grant EP/N510129/1. The work of Pablo M. Olmos is supported by the Spanish government MCI under grant RTI2018-099655-B-100, by Comunidad de Madrid under grants IND2017/TIC-7618, IND2018/TIC-9649, and Y2018/TCS-4705, by BBVA Foundation under the Deep-DARWiN project, and by the European Union (FEDER and the European Research Council (ERC) through the European Union's Horizon 2020 research and innovation program under Grant 714161). Zoubin Ghahramani acknowledges support from the Alan Turing Institute (EPSRC Grant EP/N510129/1) and EPSRC Grant EP/N014162/1, and donations from Google and Microsoft Research. We also gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan X Pascal GPU used for this research.
DS e-Archivo
RD 1 Sep. 2024