RT Journal Article
T1 Handling incomplete heterogeneous data using VAEs
A1 Nazabal Renteria, Alfredo
A1 Martínez Olmos, Pablo
A1 Ghahramani, Zoubin
A1 Valera Martínez, María Isabel
AB Variational autoencoders (VAEs), as well as other generative models, have been shown to be efficient and accurate for capturing the latent structure of vast amounts of complex high-dimensional data. However, existing VAEs still cannot directly handle data that are heterogeneous (mixed continuous and discrete) or incomplete (with missing data at random), which is indeed common in real-world applications. In this paper, we propose a general framework to design VAEs suitable for fitting incomplete heterogeneous data. The proposed HI-VAE includes likelihood models for real-valued, positive real-valued, interval, categorical, ordinal and count data, and allows accurate estimation (and potentially imputation) of missing data. Furthermore, HI-VAE presents competitive predictive performance in supervised tasks, outperforming supervised models when trained on incomplete data.
PB Elsevier
SN 0031-3203
YR 2020
FD 2020-11
LK https://hdl.handle.net/10016/32743
UL https://hdl.handle.net/10016/32743
LA eng
NO The authors wish to thank Christopher K. I. Williams for fruitful discussions and helpful comments on the manuscript. Alfredo Nazabal would like to acknowledge the funding provided by the UK Government’s Defence & Security Programme in support of the Alan Turing Institute, EPSRC Grant EP/N510129/1. The work of Pablo M. Olmos is supported by Spanish government MCI under grant RTI2018-099655-B-100, by Comunidad de Madrid under grants IND2017/TIC-7618, IND2018/TIC-9649, and Y2018/TCS-4705, by BBVA Foundation under the Deep-DARWiN project, and by the European Union (FEDER and the European Research Council (ERC) through the European Union’s Horizon 2020 research and innovation program under Grant 714161). Zoubin Ghahramani acknowledges support from the Alan Turing Institute (EPSRC Grant EP/N510129/1) and EPSRC Grant EP/N014162/1, and donations from Google and Microsoft Research. We also gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan X Pascal GPU used for this research.
DS e-Archivo
RD 1 sept. 2024