Publication:
Resilience of Parallel Applications

dc.affiliation.dpto: UC3M. Departamento de Informática (es)
dc.affiliation.grupoinv: UC3M. Grupo de Investigación: Arquitectura de Computadores, Comunicaciones y Sistemas (es)
dc.contributor.author: Losada, Nuria
dc.contributor.author: Martín, María J.
dc.contributor.author: González, Patricia
dc.contributor.editor: Carretero Pérez, Jesús
dc.contributor.editor: García Blas, Javier
dc.contributor.editor: Petcu, Dana
dc.date.accessioned: 2016-04-29T07:51:47Z
dc.date.available: 2016-04-29T07:51:47Z
dc.date.issued: 2016-02
dc.description: Proceedings of the First PhD Symposium on Sustainable Ultrascale Computing Systems (NESUS PhD 2016), Timisoara, Romania, February 8-11, 2016. (en)
dc.description.abstract: Future exascale systems are predicted to comprise millions of cores. This is a great opportunity for HPC applications; however, it also threatens the completion of their execution. Even if each computation node fails only once per century, a machine with 100,000 nodes will encounter a failure roughly every 9 hours. Thus, HPC applications need fault tolerance techniques to ensure that they finish their execution successfully. This PhD thesis focuses on fault tolerance solutions for generic parallel applications, specifically checkpointing solutions. We have extended CPPC, an MPI application-level portable checkpointing tool developed in our research group, to work with OpenMP applications and hybrid MPI-OpenMP applications. Currently, we are working on transparently obtaining resilient MPI applications, that is, applications that can recover from failures without stopping their execution. (en)
dc.description.sponsorship: European Cooperation in Science and Technology. COST (en)
dc.description.sponsorship: This research was supported by the Ministry of Economy and Competitiveness of Spain and FEDER funds of the EU (Project TIN2013-42148-P, and the predoctoral grant of Nuria Losada, ref. BES-2014-068066) and by the EU under the COST Program Action IC1305: Network for Sustainable Ultrascale Computing (NESUS). (en)
dc.format.extent: 4
dc.format.mimetype: application/pdf
dc.identifier.bibliographicCitation: Carretero Pérez, Jesús; et al. (eds.). (2016). Proceedings of the First PhD Symposium on Sustainable Ultrascale Computing Systems (NESUS PhD 2016). Timisoara, Romania. Universidad Carlos III de Madrid, ARCOS. Pp. 29-32. (en)
dc.identifier.isbn: 978-84-608-6309-0
dc.identifier.publicationfirstpage: 29
dc.identifier.publicationlastpage: 32
dc.identifier.publicationtitle: Proceedings of the First PhD Symposium on Sustainable Ultrascale Computing Systems (NESUS PhD 2016) (en)
dc.identifier.uri: https://hdl.handle.net/10016/22885
dc.language.iso: eng (en)
dc.relation.eventdate: February 8-11, 2016 (en)
dc.relation.eventnumber: 1
dc.relation.eventplace: Timisoara, Romania (es)
dc.relation.eventtitle: PhD Symposium on Sustainable Ultrascale Computing Systems (NESUS PhD 2016) (en)
dc.relation.projectID: Gobierno de España. TIN2013-42148-P
dc.rights: Atribución-NoComercial-SinDerivadas 3.0 España
dc.rights.accessRights: open access
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/3.0/es/
dc.subject.eciencia: Informática (es)
dc.title: Resilience of Parallel Applications (en)
dc.type: conference paper (*)
dc.type.hasVersion: VoR (*)
dspace.entity.type: Publication
Files
Original bundle
Name: resilience_losada_nesus_2016.pdf
Size: 393.67 KB
Format: Adobe Portable Document Format