Área de Arquitectura y Tecnología de Computadores
http://hdl.handle.net/10016/851

A Comparative study and evaluation of parallel programming models for shared-memory parallel architectures
http://hdl.handle.net/10016/25767 (2013-07-01)
Sánchez García, Luis Miguel; Fernández Muñoz, Javier; Sotomayor Fernández, Rafael; Escolar Díaz, María Soledad; García Sánchez, José Daniel
Nowadays, shared-memory parallel architectures have evolved and new programming frameworks that exploit them have appeared: OpenMP, TBB, Cilk Plus, ArBB and OpenCL. This article focuses on the most widespread of these frameworks in commercial and scientific areas, presenting a comparative study and an evaluation. The study covers several capabilities, such as task deployment, scheduling techniques, and programming language abstractions. The evaluation measures three dimensions: code development complexity, performance, and efficiency, measured as speedup per watt. For this evaluation, several parallel benchmarks have been implemented with each framework, designed to cover specific scenarios such as regular memory access or irregular computation. The conclusions highlight that some frameworks (OpenMP, Cilk Plus) are better suited for quickly transforming a sequential code, others (TBB) have a small footprint that makes them ideal for small problems, and others (OpenCL) are suited for heterogeneous architectures but require a very complex development process. The conclusions also show that vectorization support is more critical than multitasking for achieving efficiency in those problems where this approach fits.
On Parallel Numerical Algorithms for Fractional Diffusion Problems
http://hdl.handle.net/10016/24219 (2016-12-01)
Ciegi, Raimondas; Staricovicius, Vadimas; Margenov, Svetozar
Editors: Carretero Pérez, Jesús; García Blas, Javier; Margenov, Svetozar
In this work, we consider the parallel numerical solution of problems depending on fractional powers of an elliptic operator. Three different state-of-the-art approaches are used to transform the original non-local problem into well-known local PDE problems. Parallel numerical algorithms for all three approaches are developed and discussed. Results of their parallel performance tests are presented and analysed.
Proceedings of: Third International Workshop on Sustainable Ultrascale Computing Systems (NESUS 2016). Sofia (Bulgaria), October, 6-7, 2016.
A Data-Aware Scheduling Strategy for DMCF workflows over Hercules
http://hdl.handle.net/10016/24234 (2016-10-06)
Marozzo, Fabrizio; Carretero Pérez, Jesús; Rodrigo Duro, Francisco José; García Blas, Javier; Talia, Domenico; Trunfio, Paolo
Editors: Carretero Pérez, Jesús; García Blas, Javier; Margenov, Svetozar
As data-intensive scientific computing becomes increasingly prevalent, there is a need to simplify the development, deployment, and execution of complex data analysis applications. The Data Mining Cloud Framework (DMCF) is a service-oriented system that allows users to design and execute data analysis applications, defined as workflows, on cloud platforms, relying on cloud-provided storage services for I/O operations. Hercules is an in-memory I/O solution that can be deployed as an alternative to cloud storage services, providing additional performance and flexibility features. This work extends the DMCF-Hercules cooperation by applying novel data placement and task scheduling techniques that expose and exploit data locality in data-intensive workflows.
Proceedings of: Third International Workshop on Sustainable Ultrascale Computing Systems (NESUS 2016). Sofia (Bulgaria), October, 6-7, 2016.
Heterogeneous computation of matrix products
http://hdl.handle.net/10016/24233 (2016-12-01)
Alonso, Pedro; Manumachu, Ravy Reddy; Lastovetsky, Alexey
Editors: Carretero Pérez, Jesús; García Blas, Javier; Margenov, Svetozar
The work presented here is an experimental study of the execution time and energy consumption of matrix multiplications on a heterogeneous server. The server features three different devices: a multicore CPU, an NVIDIA Tesla GPU, and an Intel Xeon Phi coprocessor. Matrix multiplication is one of the most used linear algebra kernels and, consequently, applications that make intensive use of this operation can greatly benefit from efficient implementations. This is the case of the evaluation of matrix polynomials, a core operation used to calculate many matrix functions, which involves a very large number of products of square matrices. Although many proposals exist for efficient implementations of matrix multiplication in heterogeneous environments, it is still difficult to find packages providing a matrix multiplication routine as easy to use, efficient, and versatile as its homogeneous counterparts. Our approach here is based on a simple implementation using OpenMP sections. We have also devised a functional model of the execution time that has been successfully applied to the evaluation of matrix polynomials of large degree, allowing the workload to be balanced and the runtime cost minimized.
Proceedings of: Third International Workshop on Sustainable Ultrascale Computing Systems (NESUS 2016). Sofia (Bulgaria), October, 6-7, 2016.