Title: Resilience of Parallel Applications
Authors and editors: Losada, Nuria; Martín, María J.; González, Patricia; Carretero Pérez, Jesús; García Blas, Javier; Petcu, Dana
Dates: 2016-04-29; 2016-04-29; 2016-02
Citation: Carretero Pérez, Jesús; et al. (eds.). (2016). Proceedings of the First PhD Symposium on Sustainable Ultrascale Computing Systems (NESUS PhD 2016). Timisoara, Romania. Universidad Carlos III de Madrid, ARCOS. Pp. 29-32.
ISBN: 978-84-608-6309-0
URI: https://hdl.handle.net/10016/22885
Event: Proceedings of the First PhD Symposium on Sustainable Ultrascale Computing Systems (NESUS PhD 2016), Timisoara, Romania, February 8-11, 2016.
Pages: 29-32 (4 pages)
Format: application/pdf
Language: eng
License: Attribution-NonCommercial-NoDerivs 3.0 Spain
Type: conference paper
Subject: Computer Science
Access: open access

Abstract: Future exascale systems are predicted to comprise millions of cores. This is a great opportunity for HPC applications, but it also puts the completion of their execution at risk. Even if each computation node fails only once per century, a machine with 100,000 nodes will encounter a failure roughly every 9 hours. HPC applications therefore need fault tolerance techniques to ensure that they finish successfully. This PhD thesis focuses on fault tolerance solutions for generic parallel applications, specifically on checkpointing solutions. We have extended CPPC, a portable application-level checkpointing tool for MPI applications developed in our research group, to work with OpenMP applications and hybrid MPI-OpenMP applications. Currently, we are working on transparently obtaining resilient MPI applications, that is, applications that are able to recover from failures without stopping their execution.
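
The failure-rate estimate in the abstract can be checked with a short calculation. The sketch below is illustrative only and is not taken from the paper: it assumes that node failures are independent and occur at a constant rate, so that the mean time between failures of the whole machine is the per-node value divided by the number of nodes. The function name `system_mtbf_hours` and the 365-day year are assumptions made for this example.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years (assumption)


def system_mtbf_hours(node_mtbf_years: float, num_nodes: int) -> float:
    """Estimate the mean time between failures of a whole machine, in hours.

    Assumes independent nodes with a constant failure rate, so the
    system failure rate is the sum of the per-node failure rates.
    """
    if num_nodes <= 0:
        raise ValueError("num_nodes must be positive")
    return node_mtbf_years * HOURS_PER_YEAR / num_nodes


if __name__ == "__main__":
    # One failure per node per century, 100,000 nodes.
    hours = system_mtbf_hours(node_mtbf_years=100, num_nodes=100_000)
    print(f"{hours:.2f} hours between failures")
```

Under these assumptions the result is about 8.76 hours, consistent with the "every 9 hours" figure quoted in the abstract.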