ASPIDE: Exascale programming models for extreme data processing

Recent Submissions

  • Publication
    Simplified workflow simulation on clouds based on computation and communication noisiness
    (IEEE, 2020-07-01) Mathá, Roland; Ristov, Sasko; Fahringer, Thomas; Prodan, Radu; European Commission
    Many researchers rely on simulations to analyze and validate their methods on Cloud infrastructures. However, determining relevant simulation parameters and instantiating them correctly to match real Cloud performance is a difficult and costly operation, as minor configuration changes can easily produce unreliable, inaccurate simulation results. Using legacy values experimentally determined by other researchers can reduce the configuration cost, but remains inaccurate, as the underlying public Clouds and the number of active tenants differ widely and change over time. To overcome these deficiencies, we propose a novel model that simulates dynamic Cloud performance by introducing noise into the computation and communication tasks, determined from a small set of runtime execution data. Although the estimation method appears costly, a comprehensive sensitivity analysis shows that the configuration parameters determined for one simulation setup can be reused for other simulations, reducing the tuning cost by up to 82.46 percent while lowering the simulation accuracy by only 1.98 percent on average. Extensive evaluation also shows that our novel model outperforms other state-of-the-art dynamic Cloud simulation models, achieving up to 22 percent lower makespan inaccuracy.
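The core idea of the abstract above — perturbing nominal task durations with noise calibrated from a small sample of real runtimes — can be sketched as follows. This is a minimal illustration of the general approach, not the paper's actual model; all function names and the choice of Gaussian noise are assumptions.

```python
import random
import statistics

def calibrate_noise(measured_runtimes):
    """Estimate a relative noise level from a small sample of real
    task runtimes, here as the coefficient of variation (stdev / mean)."""
    mean = statistics.mean(measured_runtimes)
    stdev = statistics.pstdev(measured_runtimes)
    return stdev / mean

def noisy_duration(nominal, noise_level, rng=random):
    """Perturb a nominal task duration with Gaussian noise whose
    spread is proportional to the calibrated noise level."""
    return max(0.0, rng.gauss(nominal, nominal * noise_level))

def simulate_makespan(task_durations, noise_level, rng=random):
    """Simulate a trivial sequential workflow: the makespan is the
    sum of the noise-perturbed task durations."""
    return sum(noisy_duration(d, noise_level, rng) for d in task_durations)
```

With `noise_level = 0` the simulation degenerates to the static case; calibrating it once from runtime data and reusing it across setups mirrors the tuning-cost reduction the paper reports.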
  • Publication
    Monitoring of exascale data processing
    (IEEE, 2020-01-02) Iuhasz, Gabriel; Petcu, Dana; European Commission
    Exascale systems are a hot topic of research in computer science. In contrast to current Cloud, Big Data and HPC systems, these systems will routinely contain hundreds of thousands of nodes generating millions of events. At this scale, hardware faults and anomalous behaviour are not only more likely but to be expected. In this paper we describe the architecture of an Exascale monitoring solution coupled with an event detection component. The latter component is extremely important for handling the multitude of potential events. We also describe the major missing research that needs to be done to make event detection feasible in real-world Exascale systems.
  • Publication
    Multi-objective scheduling of extreme data scientific workflows in Fog
    (Elsevier B.V., 2020-05) De Maio, Vincenzo; Kimovski, Dragi; European Commission
    The concept of “extreme data” is a recent re-incarnation of the “big data” problem, which is distinguished by the massive amounts of information that must be analyzed with strict time requirements. In the past decade, Cloud data centers have been envisioned as the essential computing architecture for enabling extreme data workflows. However, Cloud data centers are often geographically distributed. Such geographical distribution increases offloading latency, making them unsuitable for processing workflows with strict latency requirements, as the data transfer times can be very high. Fog computing emerged as a promising solution to this issue, as it allows partial workflow processing in lower network layers. Performing data processing on the Fog significantly reduces data transfer latency, allowing the workflows’ strict latency requirements to be met. However, the Fog layer is highly heterogeneous and loosely connected, which affects the reliability and response time of task offloading. In this work, we investigate the potential of Fog for scheduling extreme data workflows with strict response time requirements. Moreover, we propose a novel Pareto-based approach for task offloading in Fog, called Multi-objective Workflow Offloading (MOWO). MOWO considers three optimization objectives, namely response time, reliability, and financial cost. We evaluate the MOWO workflow scheduler on a set of real-world biomedical, meteorological and astronomy workflows representing examples of extreme data applications with strict latency requirements.
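The Pareto-based selection that underlies an approach like MOWO can be sketched with a standard dominance test over the three stated objectives (response time and cost minimized, reliability maximized). The class and function names below are illustrative, not MOWO's actual implementation.

```python
from typing import NamedTuple

class Offloading(NamedTuple):
    """One candidate task placement, scored on the three objectives
    named in the abstract (field names here are illustrative)."""
    response_time: float  # minimize
    cost: float           # minimize
    reliability: float    # maximize

def dominates(a, b):
    """True if candidate `a` Pareto-dominates `b`: no worse on every
    objective and strictly better on at least one."""
    no_worse = (a.response_time <= b.response_time
                and a.cost <= b.cost
                and a.reliability >= b.reliability)
    strictly_better = (a.response_time < b.response_time
                       or a.cost < b.cost
                       or a.reliability > b.reliability)
    return no_worse and strictly_better

def pareto_front(candidates):
    """Keep only the non-dominated candidates; a scheduler then picks
    one of these according to some policy or user preference."""
    return [c for c in candidates
            if not any(dominates(o, c) for o in candidates if o is not c)]
```

A scheduler working this way never has to collapse the three objectives into one weighted score; it exposes the whole trade-off front and defers the final choice.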
  • Publication
    A scalable platform for monitoring data intensive applications
    (Springer Nature B.V., 2019-09) Drăgan, Ioan; Iuhasz, Gabriel; Petcu, Dana; European Commission
    Recent advances in information technology and widespread growth across many areas are producing large amounts of data. Consequently, in the past decade a large number of distributed platforms for storing and processing large datasets have been proposed. Whether in development or in production, monitoring the applications running on these platforms is not an easy task. Dedicated tools and platforms have been proposed for this purpose, yet none are specifically designed for Big Data frameworks. In this paper we present a distributed, scalable, highly available platform able to collect, store, query and process monitoring data obtained from multiple Big Data frameworks. Alongside the architecture, we experimentally show that the proposed solution is scalable and can handle a substantial quantity of monitoring data.
  • Publication
    Perspectives on anomaly and event detection in exascale systems
    (IEEE, 2019-08-29) Iuhasz, Gabriel; Petcu, Dana; European Commission
    The design and implementation of exascale systems is nowadays an important challenge. Such a system is expected to combine HPC with Big Data methods and technologies to allow the execution of scientific workloads that are not tractable at the present time. In this paper we focus on an event and anomaly detection framework, which is crucial for obtaining a global overview of an exascale system (which in turn is necessary for the successful implementation and exploitation of the system). We propose an architecture for such a framework and show how it can be used to handle failures during job execution.
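As a deliberately simple illustration of the kind of per-metric anomaly detection such a framework performs, a sliding-window z-score detector flags samples that deviate sharply from recent history. This is a generic textbook technique, not the paper's detection method; the class name and thresholds are assumptions.

```python
from collections import deque
import statistics

class ZScoreDetector:
    """Flag a metric sample as anomalous when it deviates from the
    mean of a recent window by more than `threshold` standard
    deviations. A simple stand-in for a monitoring-side detector."""

    def __init__(self, window=30, threshold=3.0):
        self.samples = deque(maxlen=window)
        self.threshold = threshold

    def observe(self, value):
        """Record `value` and return True if it is anomalous with
        respect to the samples already in the window."""
        anomalous = False
        if len(self.samples) >= 2:
            mean = statistics.mean(self.samples)
            stdev = statistics.pstdev(self.samples)
            if stdev > 0 and abs(value - mean) / stdev > self.threshold:
                anomalous = True
        self.samples.append(value)
        return anomalous
```

At exascale, the interesting engineering is not this test itself but running millions of such detectors cheaply and correlating their events, which is what the described architecture addresses.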
  • Publication
    Exploring stream parallel patterns in distributed MPI environments
    (Elsevier B.V., 2019-05) López Gómez, Javier; Fernández Muñoz, Javier; Río Astorga, David del; Dolz Zaragoza, Manuel Francisco; García Sánchez, José Daniel; European Commission; Ministerio de Economía y Competitividad (España)
    In recent years, the large volumes of stream data and the near real-time requirements of data streaming applications have exacerbated the need for new scalable algorithms and programming interfaces for distributed and shared-memory platforms. To contribute in this direction, this paper presents a new distributed MPI back end for GrPPI, a C++ high-level generic interface of data-intensive and stream processing parallel patterns. This back end, as a new execution policy, supports distributed and hybrid (distributed+shared-memory) parallel executions of the Pipeline and Farm patterns, where the hybrid mode combines the MPI policy with a GrPPI shared-memory one. These patterns internally leverage distributed queues, which can be configured to use two-sided or one-sided MPI primitives to communicate items among nodes. A detailed analysis of the GrPPI MPI execution policy reports considerable benefits from the programmability, flexibility and readability points of view. The experimental evaluation of two different streaming applications with different distributed and shared-memory scenarios reports considerable performance gains with respect to the sequential versions at the expense of negligible GrPPI overheads.
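The Pipeline pattern discussed above can be sketched in a language-neutral way with one worker thread per stage connected by queues. This is an illustration of the pattern itself, not GrPPI's C++ API or its MPI back end; the function names are assumptions.

```python
import queue
import threading

SENTINEL = object()  # marks end-of-stream between stages

def pipeline(source, stages):
    """Run `stages` (a list of one-item-in, one-item-out functions) as
    a pipeline: each stage runs in its own thread and consumes from
    the previous stage through a FIFO queue. Returns the items emitted
    by the last stage, in order."""
    queues = [queue.Queue() for _ in range(len(stages) + 1)]

    def run_stage(fn, q_in, q_out):
        while True:
            item = q_in.get()
            if item is SENTINEL:
                q_out.put(SENTINEL)
                return
            q_out.put(fn(item))

    threads = [threading.Thread(target=run_stage,
                                args=(fn, queues[i], queues[i + 1]))
               for i, fn in enumerate(stages)]
    for t in threads:
        t.start()
    for item in source:       # feed the first stage
        queues[0].put(item)
    queues[0].put(SENTINEL)
    results = []
    while True:               # drain the last stage
        item = queues[-1].get()
        if item is SENTINEL:
            break
        results.append(item)
    for t in threads:
        t.join()
    return results
```

Replacing the in-process queues with queues backed by two-sided or one-sided message passing is, conceptually, what a distributed back end for such patterns does: the stage code stays the same while the communication policy changes.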
  • Publication
    Detecting semantic violations of lock-free data structures through C++ contracts
    (Springer Nature, 2019-03-26) López Gómez, Javier; Río Astorga, David del; Dolz Zaragoza, Manuel Francisco; Fernández Muñoz, Javier; Garcia Sanchez, Jose Daniel; European Commission; Ministerio de Economía y Competitividad (España)
    The use of synchronization mechanisms in multithreaded applications is essential on shared-memory multi-core architectures. However, debugging parallel applications to avoid potential failures, such as data races or deadlocks, can be challenging. Race detectors are key to spot such concurrency bugs; nevertheless, if lock-free data structures are used, these may emit a significant number of false positives. In this paper, we present a framework for semantic violation detection of lock-free data structures which makes use of contracts, a novel feature of the upcoming C++20, and a customized version of the ThreadSanitizer race detector. We evaluate the detection accuracy of the framework in terms of false positives and false negatives leveraging some synthetic benchmarks which make use of the SPSC and MPMC lock-free queue structures from the Boost C++ library. Thanks to this framework, we are able to check the correct use of lock-free data structures, thus reducing the number of false positives.
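The semantic rule being checked — for example, that a single-producer single-consumer (SPSC) queue is never pushed to by a second thread — can be illustrated with contract-style precondition checks. The sketch below uses Python assertions as a stand-in for the C++ contract annotations the paper relies on; the class and its internals are hypothetical.

```python
import threading

class SPSCQueueChecker:
    """Wraps push/pop of an SPSC queue with contract-style checks:
    each end may only ever be used by one thread. A Python analogue
    of annotating the C++ operations with preconditions."""

    def __init__(self):
        self.items = []
        self.producer_id = None
        self.consumer_id = None

    def push(self, item):
        tid = threading.get_ident()
        if self.producer_id is None:
            self.producer_id = tid  # first caller becomes the producer
        # precondition: single producer only
        assert tid == self.producer_id, "contract violation: second producer thread"
        self.items.append(item)

    def pop(self):
        tid = threading.get_ident()
        if self.consumer_id is None:
            self.consumer_id = tid  # first caller becomes the consumer
        # precondition: single consumer only
        assert tid == self.consumer_id, "contract violation: second consumer thread"
        return self.items.pop(0)
```

A race detector alone would report the unsynchronized accesses of a lock-free queue as false positives; encoding the usage contract explicitly, as here, turns only genuine misuse into a reported violation.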
  • Publication
    Exploiting stream parallelism of MRI reconstruction using GrPPI over multiple back-ends
    (IEEE, 2019-07-04) García Blas, Francisco Javier; Río Astorga, David del; García Sánchez, José Daniel; Carretero Pérez, Jesús; European Commission; Ministerio de Economía y Competitividad (España)
    In recent years, on-line processing of data streams has been established as a major computing paradigm. This is mainly due to two reasons: first, more and more data are generated in near real-time and need to be processed; second, efficient parallel applications are needed to keep pace with these data rates. These trends pose a tough challenge to traditional data-analysis techniques, which have been forced to evolve towards a stream perspective. In this work we present a comparative study of a stream-aware multi-staged application implemented using GrPPI, a generic and reusable parallel pattern interface for C++ applications. We demonstrate the benefits of using this interface in terms of programmability, performance, and scalability.
  • Publication
    Performance-aware scheduling of parallel applications on non-dedicated clusters
    (MDPI, 2019-09-02) Cascajo García, Alberto; Expósito Singh, David; Carretero Pérez, Jesús; European Commission; Ministerio de Economía y Competitividad (España)
    This work presents an HPC framework that provides new strategies for resource management and job scheduling, based on executing different applications on shared compute nodes to maximize platform utilization. The framework includes a scalable monitoring tool that is able to analyze the platform's compute node utilization. We also introduce an extension of CLARISSE, a middleware for data-staging coordination and control on large-scale HPC platforms, which uses the information provided by the monitor in combination with application-level analysis to detect performance degradation in the running applications. This degradation, caused by the fact that the applications share the compute nodes and may compete for their resources, is avoided by means of dynamic application migration. A description of the architecture, as well as a practical evaluation of the proposal, shows significant performance improvements of up to 20% in the makespan and 10% in energy consumption compared to a non-optimized execution.
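The migration decision described above — detect degradation from monitored progress, then move the application to a less contended node — can be sketched as below. Function names, the metric (a progress rate), and the 20% tolerance are illustrative assumptions, not the paper's exact policy.

```python
def should_migrate(baseline_rate, observed_rate, tolerance=0.2):
    """Decide whether an application sharing a node should migrate:
    trigger when observed progress (e.g. iterations/s) falls more
    than `tolerance` below its dedicated-node baseline."""
    if baseline_rate <= 0:
        raise ValueError("baseline_rate must be positive")
    slowdown = 1.0 - observed_rate / baseline_rate
    return slowdown > tolerance

def pick_target_node(nodes, required_free):
    """Pick the least-loaded node with enough free capacity, using
    per-node readings as the monitoring tool would report them
    (the dict structure here is illustrative)."""
    candidates = [n for n in nodes if n["free"] >= required_free]
    if not candidates:
        return None  # no migration possible; keep running in place
    return min(candidates, key=lambda n: n["load"])
```

Keeping the degradation test separate from the placement choice mirrors the paper's split between the monitoring layer and the scheduling/migration layer.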