RT Conference Proceedings
T1 Resilience of Parallel Applications
A1 Losada, Nuria
A1 Martín, María J.
A1 González, Patricia
A2 Carretero Pérez, Jesús
A2 García Blas, Javier
A2 Petcu, Dana
AB Future exascale systems are predicted to be formed by millions of cores. This is a great opportunity for HPC applications; however, it is also a hazard for the completion of their execution. Even if a single computation node fails only once per century, a machine with 100,000 nodes will encounter a failure about every 9 hours. Thus, HPC applications need to use fault tolerance techniques to ensure they successfully finish their execution. This PhD thesis focuses on fault tolerance solutions for generic parallel applications, more specifically on checkpointing solutions. We have extended CPPC, an MPI application-level portable checkpointing tool developed in our research group, to work with OpenMP applications and hybrid MPI-OpenMP applications. Currently, we are working on transparently obtaining resilient MPI applications, that is, applications that are able to recover from failures without stopping their execution.
SN 978-84-608-6309-0
YR 2016
FD 2016-02
LK https://hdl.handle.net/10016/22885
UL https://hdl.handle.net/10016/22885
LA eng
NO Proceedings of the First PhD Symposium on Sustainable Ultrascale Computing Systems (NESUS PhD 2016), Timisoara, Romania, February 8-11, 2016.
NO European Cooperation in Science and Technology. COST
NO This research was supported by the Ministry of Economy and Competitiveness of Spain and FEDER funds of the EU (Project TIN2013-42148-P, and the predoctoral grant of Nuria Losada ref. BES-2014-068066) and by the EU under the COST Program Action IC1305: Network for Sustainable Ultrascale Computing (NESUS).
DS e-Archivo
RD 17 Jul. 2024