Applying big data paradigms to a large scale scientific workflow: lessons learned and future directions

e-Archivo Repository

dc.contributor.author Caino Lores, Silvina
dc.contributor.author Lapin, Andrei
dc.contributor.author Carretero Pérez, Jesús
dc.contributor.author Kropf, Peter
dc.date.accessioned 2021-12-15T10:40:36Z
dc.date.available 2021-12-15T10:40:36Z
dc.date.issued 2018-04-17
dc.identifier.bibliographicCitation Caíno-Lores, S., Lapin, A., Carretero, J., Kropf, P. (2020). Applying big data paradigms to a large scale scientific workflow: Lessons learned and future directions. Future Generation Computer Systems, 110, pp. 440-452. http://doi.org/10.1016/j.future.2018.04.014
dc.identifier.issn 0167-739X
dc.identifier.uri http://hdl.handle.net/10016/33770
dc.description.abstract The increasing amounts of data related to the execution of scientific workflows have raised awareness of their shift towards parallel data-intensive problems. In this paper, we report our experience combining the traditional high-performance computing and grid-based approaches with Big Data analytics paradigms, in the context of scientific ensemble workflows. Our goal was to assess and discuss the suitability of such data-oriented mechanisms for production-ready workflows, especially in terms of scalability. We focused on two key elements in the Big Data ecosystem: the data-centric programming model, and the underlying infrastructure that integrates storage and computation in each node. We experimented with a representative MPI-based iterative workflow from the hydrology domain, EnKF-HGS, which we re-implemented using the Spark data analysis framework. We conducted experiments on a local cluster, a private cloud running OpenNebula, and the Amazon Elastic Compute Cloud (Amazon EC2). We analysed the results to synthesize the lessons learned from this experience and to discuss promising directions for further research.
dc.description.sponsorship This work was supported by the Spanish Ministry of Economics and Competitiveness grant TIN-2013-41350-P, the IC1305 COST Action “Network for Sustainable Ultrascale Computing Platforms” (NESUS), and the FPU Training Program for Academic and Teaching Staff Grant FPU15/00422 by the Spanish Ministry of Education.
dc.language.iso eng
dc.publisher Elsevier
dc.rights © 2018 Elsevier B.V. All rights reserved.
dc.rights Attribution-NonCommercial-NoDerivs 3.0 Spain
dc.rights.uri http://creativecommons.org/licenses/by-nc-nd/3.0/es/
dc.subject.other scientific workflows
dc.subject.other big data
dc.subject.other cloud computing
dc.subject.other apache spark
dc.subject.other hydrology
dc.title Applying big data paradigms to a large scale scientific workflow: lessons learned and future directions
dc.type article
dc.subject.eciencia Computer Science
dc.identifier.doi https://doi.org/10.1016/j.future.2018.04.014
dc.rights.accessRights openAccess
dc.relation.projectID Gobierno de España. TIN-2013-41350-P
dc.relation.projectID Gobierno de España. FPU15/00422
dc.type.version acceptedVersion
dc.identifier.publicationfirstpage 440
dc.identifier.publicationlastpage 452
dc.identifier.publicationtitle Future Generation Computer Systems-The International Journal of eScience
dc.identifier.publicationvolume 110
dc.identifier.uxxi AR/0000021760
dc.contributor.funder Ministerio de Economía y Competitividad (España)
dc.contributor.funder Ministerio de Educación, Cultura y Deporte (España)