RT Journal Article
T1 Medical data wrangling with sequential variational autoencoders
A1 Barrejón Moreno, Daniel
A1 Martínez Olmos, Pablo
A1 Artés Rodríguez, Antonio
AB Medical data sets are usually corrupted by noise and missing data. These missing patterns are commonly assumed to be completely random, but in medical scenarios they tend to occur in bursts, due to sensors that are switched off for some time or data collected in a misaligned, uneven fashion, among other causes. This paper proposes to model medical data records with heterogeneous data types and bursty missing data using sequential variational autoencoders (VAEs). In particular, we propose a new methodology, the Shi-VAE, which extends the capabilities of VAEs to sequential streams of data with missing observations. We compare our model against state-of-the-art solutions on an intensive care unit (ICU) database and a dataset of passive human monitoring. Furthermore, we find that standard error metrics such as RMSE are not conclusive enough to assess temporal models, so we also include in our analysis the cross-correlation between the ground truth and the imputed signal. We show that Shi-VAE achieves the best performance in terms of both metrics, with lower computational complexity than the GP-VAE model, which is the state-of-the-art method for medical records.
PB IEEE
SN 2168-2194
SN 2168-2208 (online)
YR 2022
FD 2022-06
LK https://hdl.handle.net/10016/35008
UL https://hdl.handle.net/10016/35008
LA eng
NO This work was supported in part by Spanish Government MCI under Grants TEC2017-92552-EXP and RTI2018-099655-B-100, in part by Comunidad de Madrid under Grants IND2017/TIC-7618, IND2018/TIC-9649, IND2020/TIC-17372, and Y2018/TCS-4705, in part by BBVA Foundation under the Deep-DARWiN Project, and in part by the European Union (FEDER) and the European Research Council (ERC) through the European Union's Horizon 2020 research and innovation program under Grant 714161.
DS e-Archivo
RD 27 jul. 2024