Second International Workshop on Sustainable Ultrascale Computing Systems (NESUS 2015)

  • Proceedings of the Second International Workshop on Sustainable Ultrascale Computing Systems (NESUS 2015)


  • Index



  • Preface [Proceedings of the Second International Workshop on Sustainable Ultrascale Computing Systems (NESUS 2015)]

  • Parallel Processing For Gravity Inversion, Neki Frasheri

  • Solution of Bi-objective Competitive Facility Location Problem Using Parallel Stochastic Search Algorithm, Algirdas Lancinskas, Pilar M. Ortigosa and Julius Zilinskas

  • A Scheduler for Cloud Bursting of Map-Intensive Traffic Analysis Jobs, Ricardo Morla, Pedro Gonçalves and Jorge Barbosa

  • Distributed Parallel Computing for Visual Cryptography Algorithms, Raimondas Ciegis, Vadimas Starikovicius, Natalija Tumanova, Minvydas Ragulskis and Rita Palivonaite

  • On Autonomic HPC Clouds, Dana Petcu

  • Labeling connected components in binary images based on cellular automata, Biljana Stamatovic

  • Nature-Inspired Algorithm for Solving NP-Complete Problems, Atanas Hristov

  • Log File Analysis in Cloud with Apache Hadoop and Apache Spark, Ilias Mavridis and Eleni Karatza

  • NekBone with Optimized OpenACC directives, Jing Gong, Stefano Markidis, Michael Schliephake, Erwin Laure, Luis Cebamanos, Alistair Hart, Misun Min and Paul Fischer

  • Scheduler hierarchies for enabling peta-scale cloud simulations with DISSECT-CF, Gabor Kecskemeti

  • NUMA impact on network storage protocols over high-speed raw Ethernet, Pilar González-Férez and Angelos Bilas

  • Evaluating data caching techniques in DMCF workflows using Hercules, Francisco Rodrigo Duro, Fabrizio Marozzo, Javier García Blas, Jesús Carretero, Domenico Talia and Paolo Trunfio

  • Analyzing power consumption of I/O operations in HPC applications, Pablo Llopis Sanmillan, Manuel Dolz, Javier García Blas, Florin Isaila, Jesús Carretero, Mohammad Reza Heidari and Michael Kuhn

  • FriendComputing: Organic application centric distributed computing, Beat Wolf, Loic Monney and Pierre Kuonen

  • Multilevel parallelism in sequence alignment using a streaming approach, Beat Wolf, Pierre Kuonen and Thomas Dandekar

  • Exploiting Heterogeneous Compute Resources for Optimizing Lightweight Structures, Robert Dietze, Michael Hofmann and Gudula Ruenger

  • Chameleon C2HDL Design Tool In Self-Configurable Ultrascale Computer Systems Based On Partially Reconfigurable FPGAs, Anatoliy Melnyk, Viktor Melnyk and Lyubomyr Tsyhylyk

  • HPC in Computational Micromechanics of Composite Materials, Radim Blaheta, Alexej Kolcun, Ondrej Jakl, Kamil Soucek, Jiri Stary and Ivan Georgiev

    Recent Submissions

    • Publication
      Proceedings of the Second International Workshop on Sustainable Ultrascale Computing Systems (NESUS 2015) Krakow, Poland
      (2015-10) García Blas, Francisco Javier; Wyrzykowski, Roman; Jeannot, Emmanuel; Carretero Pérez, Jesús; Universidad Carlos III de Madrid. Computer Architecture, Communications and Systems Group (ARCOS)
    • Publication
      Preface [Proceedings of the Second International Workshop on Sustainable Ultrascale Computing Systems (NESUS 2015)]
      (2015-10) Carretero Pérez, Jesús; García Blas, Javier; Wyrzykowski, Roman; Jeannot, Emmanuel; Universidad Carlos III de Madrid. Computer Architecture, Communications and Systems Group (ARCOS)
    • Publication
      Analyzing Power Consumption of I/O Operations in HPC Applications
      (2015-10) Llopis Sanmillán, Pablo; Dolz Zaragoza, Manuel Francisco; García Blas, Francisco Javier; Isaila, Florin Daniel; Carretero Pérez, Jesús; Heidari, Mohammad Reza; Kuhn, Michael
      Data movement is becoming a key issue in terms of performance and energy consumption in high performance computing (HPC) systems, in general, and Exascale systems, in particular. A preliminary step to perform I/O optimization and face the Exascale challenges is to deepen our understanding of energy consumption across the I/O stacks. In this paper, we analyze the power draw of different I/O operations using a new fine-grained internal wattmeter while simultaneously collecting system metrics. Based on correlations between the recorded metrics and the instantaneous internal power consumption, our methodology identifies the significant metrics with respect to power consumption and decides which ones should contribute directly or in a derivative manner. This approach has the advantage of building I/O power models based on a previous set of identified utilization metrics. This technique will be validated using write operations on an Intel Xeon Nehalem server system, as writes exhibit interesting patterns and distinct power regimes.
    • Publication
      Evaluating data caching techniques in DMCF workflows using Hercules
      (2015-10) Rodrigo Duro, Francisco José; Marozzo, Fabrizio; García Blas, Javier; Carretero Pérez, Jesús; Talia, Domenico; Trunfio, Paolo
      The Data Mining Cloud Framework (DMCF) is an environment for designing and executing data analysis workflows in cloud platforms. Currently, DMCF relies on the default storage of the public cloud provider for any I/O related operation. This implies that the I/O performance of DMCF is limited by the performance of the default storage. In this work we propose the usage of the Hercules system within DMCF as an ad-hoc storage system for temporary data produced inside workflow-based applications. Hercules is a highly scalable, easy-to-deploy distributed in-memory storage system. The proposed solution takes advantage of the scalability of Hercules to avoid the bandwidth limits of the default storage. The early experimental results presented in this paper show promising performance, particularly for write operations, compared to the performance obtained using the default storage services.
    • Publication
      HPC in Computational Micromechanics of Composite Materials
      (2015-10) Blaheta, Radim; Kolcun, Alexej; Jakl, Ondrej; Soucek, Kamil; Starý, Jirí; Georgiev, Ivan; Carretero Pérez, Jesús; García Blas, Javier; Wyrzykowski, Roman; Jeannot, Emmanuel; Universidad Carlos III de Madrid. Computer Architecture, Communications and Systems Group (ARCOS)
    • Publication
      Chameleon C2HDL Design Tool In Self-Configurable Ultrascale Computer Systems Based On Partially Reconfigurable FPGAs
      (2015-10) Melnyk, Anatoliy; Melnyk, Viktor; Tsyhylyk, Lyubomyr; Carretero Pérez, Jesús; García Blas, Javier; Wyrzykowski, Roman; Jeannot, Emmanuel; Universidad Carlos III de Madrid. Computer Architecture, Communications and Systems Group (ARCOS)
      FPGA-based accelerators, and the reconfigurable computer systems built on them, require the design of application-specific processor soft-cores and are effective only for the classes of problems for which these soft-cores were previously developed. In Self-Configurable FPGA-based Computer Systems, the challenge of designing application-specific processor soft-cores is solved with C2HDL tools, which allow them to be generated automatically. In this paper, we study how to increase the efficiency of self-configurable computer systems using partially reconfigurable FPGAs and the Chameleon C2HDL design tool, corresponding to the goals of the project entitled "Improvement of heterogeneous systems efficiency using self-configurable FPGA-based computing", which is a part of the NESUS action. One of the features of the Chameleon C2HDL design tool is its ability to generate a number of application-specific processor soft-cores that execute the same algorithm but differ in the amount of FPGA resources required for their implementation. If self-configurable computer systems are based on partially reconfigurable FPGAs, this feature allows them to acquire, at every moment of their operation, a configuration that provides optimal use of their reconfigurable logic at a given level of hardware multitasking.
    • Publication
      Exploiting Heterogeneous Compute Resources for Optimizing Lightweight Structures
      (2015-10) Dietze, Robert; Hofmann, Michael; Rünger, Gudula; Carretero Pérez, Jesús; García Blas, Javier; Wyrzykowski, Roman; Jeannot, Emmanuel; Universidad Carlos III de Madrid. Computer Architecture, Communications and Systems Group (ARCOS)
      Optimizing lightweight structures with numerical simulations leads to the development of complex simulation codes with high computational demands. The optimization approach for lightweight structures consisting of fiber-reinforced plastics is considered. During the simulated optimization, independent simulation tasks have to be executed efficiently on the heterogeneous computing resources. In this article, several scheduling methods for distributing parallel simulation tasks among compute nodes are presented. Performance results are shown for the scheduling and execution of synthetic benchmark tasks, matrix multiplication tasks, as well as FEM simulation tasks on a heterogeneous compute cluster.
    • Publication
      Multilevel parallelism in sequence alignment using a streaming approach
      (2015-10) Wolf, Beat; Kuonen, Pierre; Dandekar, Thomas; Carretero Pérez, Jesús; García Blas, Javier; Wyrzykowski, Roman; Jeannot, Emmanuel; Universidad Carlos III de Madrid. Computer Architecture, Communications and Systems Group (ARCOS)
      Ultrascale computing and bioinformatics are two rapidly growing fields with a big impact right now and even more so in the future. The introduction of next generation sequencing pushes current bioinformatics tools and workflows to their limits in terms of performance. This forces the tools to become increasingly performant to keep up with the growing speed at which sequencing data is created. Ultrascale computing can greatly benefit bioinformatics in the challenges it faces today, especially in terms of scalability, data management and reliability. But before this is possible, the algorithms and software used in the field of bioinformatics need to be prepared to be used in a heterogeneous distributed environment. For this paper we choose to look at sequence alignment, which has been an active topic of research to speed up next generation sequence analysis, as it is ideally suited for parallel processing. We present a multilevel stream based parallel architecture to transparently distribute sequence alignment over multiple cores of the same machine, multiple machines and cloud resources. The same concepts are used to achieve multithreaded and distributed parallelism, making the architecture simple to extend and adapt to new situations. A prototype of the architecture has been implemented using an existing commercial sequence aligner. We demonstrate the flexibility of the implementation by running it on different configurations, combining local and cloud computing resources.
    • Publication
      FriendComputing: Organic application centric distributed computing
      (2015-10) Wolf, Beat; Monney, Loic; Kuonen, Pierre; Carretero Pérez, Jesús; García Blas, Javier; Wyrzykowski, Roman; Jeannot, Emmanuel; Universidad Carlos III de Madrid. Computer Architecture, Communications and Systems Group (ARCOS)
      Building Ultrascale computer systems is a hard problem, not yet solved and fully explored. Combining the computing resources of multiple organizations, often in different administrative domains with heterogeneous hardware and diverse demands on the system, requires new tools and frameworks to be put in place. In previous work we developed POP-Java, a Java programming language extension that makes it easy to develop distributed applications in a heterogeneous environment. We now present an extension to the POP-Java language that allows the creation of application-centered networks in which any member can benefit from the computing power and storage capacity of the other members. An accounting system is integrated, allowing the different members of the network to bill the usage of their resources to the other members, if so desired. The system is expanded through a process similar to that seen in social networks, making it possible to use the resources of friends and friends of friends. Parts of the proposed system have been implemented as a prototype inside the POP-Java programming language.
    • Publication
      NUMA impact on network storage protocols over high-speed raw Ethernet
      (2015-10) González-Férez, Pilar; Bilas, Angelos; Carretero Pérez, Jesús; García Blas, Javier; Wyrzykowski, Roman; Jeannot, Emmanuel; Universidad Carlos III de Madrid. Computer Architecture, Communications and Systems Group (ARCOS)
      Current storage trends dictate placing fast storage devices in all servers and using them as a single distributed storage system. In this converged model where storage and compute resources co-exist in the same server, the role of the network is becoming more important: network overhead is becoming a main limitation to improving storage performance. In our previous work we have designed Tyche, a network protocol for converged storage that bundles multiple 10GigE links transparently and reduces protocol overheads over raw Ethernet without hardware support. However, current technology trends and server consolidation dictate building servers with large amounts of resources (CPU, memory, network, storage). Such servers need to employ Non-Uniform Memory Architectures (NUMA) to scale memory performance. NUMA introduces significant problems with the placement of data and buffers at all software levels. In this paper, we first use Tyche to examine the performance implications of NUMA servers on end-to-end network storage performance. Our results show that NUMA effects have a significant negative impact and can reduce throughput by almost 2x on servers with as few as 8 cores (16 hyper-threads). Then, we propose extensions to network protocols that can mitigate this impact. We use information about the location of data, cores, and NICs to properly align data transfers and minimize the impact of NUMA servers. Our design almost entirely eliminates NUMA effects by encapsulating all protocol structures in a “channel” concept and then carefully mapping channels and their resources to NICs and NUMA nodes.
    • Publication
      Scheduler hierarchies to aid peta-scale cloud simulations with DISSECT-CF
      (2015-10) Kecskemeti, Gabor; Carretero Pérez, Jesús; García Blas, Javier; Wyrzykowski, Roman; Jeannot, Emmanuel; Universidad Carlos III de Madrid. Computer Architecture, Communications and Systems Group (ARCOS)
      IaaS cloud simulators are frequently used for evaluating new scheduling practices. Unfortunately, most of these simulators scarcely allow the evaluation of larger-scale cloud infrastructures (i.e., with physical machine counts over a few thousand). Thus, they are seldom applicable for evaluating infrastructures available in commercial cloud settings (e.g., users mostly do not wait for simulations to complete in such settings). DISSECT-CF was shown to be better scaling than several other simulators, but peta-scale infrastructures with often millions of CPU cores were out of scope for DISSECT-CF as well. This paper reveals a hierarchical scheduler extension of DISSECT-CF that not only allows its users to evaluate peta-scale infrastructure behaviour, but also opens possibilities for analysing new multi-cloud scheduling techniques. The paper then analyses the performance of the extended simulator through large-scale synthetic workloads and compares its performance to DISSECT-CF’s past behaviour. Based on the analysis, the paper concludes with recommended simulation setups that will allow the evaluation of new schedulers for peta-scale clouds in a timely fashion (e.g., within minutes).
    • Publication
      NekBone with Optimized OpenACC directives
      (2015-10) Gong, Jing; Markidis, Stefano; Schliephake, Michael; Laure, Erwin; Cebamanos, Luis; Hart, Alistair; Min, Misun; Fischer, Paul; Carretero Pérez, Jesús; García Blas, Javier; Wyrzykowski, Roman; Jeannot, Emmanuel; Universidad Carlos III de Madrid. Computer Architecture, Communications and Systems Group (ARCOS)
      Accelerators and, in particular, Graphics Processing Units (GPUs) have emerged as promising computing technologies which may be suitable for the future Exascale systems. Here, we present performance results of NekBone, a benchmark of the Nek5000 code, implemented with optimized OpenACC directives and GPUDirect communications. Nek5000 is a computational fluid dynamics code based on the spectral element method used for the simulation of incompressible flow. Results of an optimized NekBone version lead to 78 Gflops performance on a single node. In addition, a performance result of 609 Tflops has been reached on 16,384 GPUs of the Titan supercomputer at Oak Ridge National Laboratory.
    • Publication
      Log File Analysis in Cloud with Apache Hadoop and Apache Spark
      (2015-10) Mavridis, Ilias; Karatza, Eleni; Carretero Pérez, Jesús; García Blas, Javier; Wyrzykowski, Roman; Jeannot, Emmanuel; Universidad Carlos III de Madrid. Computer Architecture, Communications and Systems Group (ARCOS)
      Log files are a very important set of data that can lead to useful information through proper analysis. Due to the high production rate and the number of devices and software that generate logs, the use of cloud services for log analysis is almost necessary. This paper reviews the cloud computational framework Apache™ Hadoop®, highlights the differences and similarities between Hadoop MapReduce and Apache Spark™, and evaluates their performance. Log file analysis applications were developed in both frameworks and used to perform SQL-type queries on real Apache Web Server log files. Various measurements were taken for each application and query with different parameters in order to extract safe conclusions about the performance of the two frameworks.
    • Publication
      Nature-Inspired Algorithm for Solving NP-Complete Problems
      (2015-10) Hristov, Atanas; Carretero Pérez, Jesús; García Blas, Javier; Wyrzykowski, Roman; Jeannot, Emmanuel; Universidad Carlos III de Madrid. Computer Architecture, Communications and Systems Group (ARCOS)
      High-Performance Computing has become an essential tool in numerous natural sciences. Modern high-performance computing systems are composed of hundreds of thousands of computational nodes, as well as deep memory hierarchies and complex interconnect topologies. Existing high performance algorithms and tools already require courageous programming and optimization efforts to achieve high efficiency on current supercomputers. On the other hand, these efforts are platform-specific and non-portable. A core challenge while solving NP-complete problems is the need to process these data with highly effective algorithms and tools where the computational costs grow exponentially. This paper investigates the efficiency of a nature-inspired optimization algorithm for solving NP-complete problems, based on the Artificial Bee Colony (ABC) metaheuristic. A parallel version of the algorithm has been proposed, based on the flat parallel programming model with message passing for communication between the computational nodes in the platform and the parallel programming model with multithreading for communication between the cores inside a computational node. Parallel communications profiling is made and parallel performance parameters are evaluated on the basis of experimental results.
    • Publication
      Labeling connected components in binary images based on cellular automata
      (2015-10) Stamatovic, Biljana; Carretero Pérez, Jesús; García Blas, Javier; Wyrzykowski, Roman; Jeannot, Emmanuel; Universidad Carlos III de Madrid. Computer Architecture, Communications and Systems Group (ARCOS)
      This short paper introduces an algorithm, based on cellular automata, for labeling connected components in n-dimensional binary images, n >= 2. Here the algorithm for three-dimensional binary images is presented. The algorithm code was implemented in the NetLogo programming environment. The algorithm is local and can be efficiently implemented on data-flow parallel platforms with an asymptotic complexity of O(L) on an L × L × L binary image.
    • Publication
      On Autonomic HPC Clouds
      (2015-10) Petcu, Dana; Carretero Pérez, Jesús; García Blas, Javier; Wyrzykowski, Roman; Jeannot, Emmanuel; Universidad Carlos III de Madrid. Computer Architecture, Communications and Systems Group (ARCOS)
      The long tail of science using HPC facilities is nowadays looking to instantly available HPC Clouds as a viable alternative to the long waiting queues of supercomputing centers. While the name HPC Cloud suggests a Cloud service, the current HPC-as-a-Service is mainly an offer of bare metal, better named cluster-on-demand. The elasticity and virtualization benefits of the Clouds are not exploited by HPC-as-a-Service. In this paper we discuss how the HPC Cloud offer can be improved from one particular point of view, that of automation. After a reminder of the characteristics of the Autonomic Cloud, we project the requirements and expectations onto what we name Autonomic HPC Clouds. Finally, we point towards the expected results of the latest research and development activities related to the topics that were identified.
    • Publication
      Distributed Parallel Computing for Visual Cryptography Algorithms
      (2015-10) Ciegis, Raimondas; Starikovicius, Vadimas; Tumanova, Natalija; Ragulskis, Minvydas; Palivonaite, Rita; Carretero Pérez, Jesús; García Blas, Javier; Wyrzykowski, Roman; Jeannot, Emmanuel; Universidad Carlos III de Madrid. Computer Architecture, Communications and Systems Group (ARCOS)
      The recent activities to construct exascale and ultrascale distributed computational systems open the possibility of applying parallel and distributed computing techniques to applied problems that were previously considered unsolvable with standard computational resources. In this paper we consider one global optimization problem where the set of feasible solutions is discrete and very large. It is not possible to apply a priori estimation techniques, such as branch-and-bound methods, to exclude a substantial part of these elements from the computational analysis. Thus a full search is required in order to solve such global optimization problems. The considered problem describes visual cryptography algorithms. The main goal is to find optimal perfect gratings, which can guarantee high quality and security of the visual cryptography method. The full search parallel algorithm is based on the master-slave paradigm. We present a library of C++ templates that allow developers to implement parallel master-slave algorithms for their applications without any parallel programming or knowledge of a parallel programming API. These templates automatically give parallel solvers tailored for clusters of computers using the MPI API and distributed computing applications using the BOINC API. Results of some computational experiments are presented.
    • Publication
      A Scheduler for Cloud Bursting of Map-Intensive Traffic Analysis Jobs
      (2015-10) Morla, Ricardo; Gonçalves, Pedro; Barbosa, Jorge; Carretero Pérez, Jesús; García Blas, Javier; Wyrzykowski, Roman; Jeannot, Emmanuel; Universidad Carlos III de Madrid. Computer Architecture, Communications and Systems Group (ARCOS)
      Network traffic analysis is important for detecting intrusions and managing application traffic. Low cost, cluster-based traffic analysis solutions have been proposed for bulk processing of large blocks of traffic captures, scaling out the processing capability of a single network analysis node. Because of traffic intensity variations owing to the natural burstiness of network traffic, a network analysis cluster may have to be severely over-dimensioned to support 24/7 continuous packet block capture and processing. Bursting the analysis of some of the packet blocks to the cloud may attenuate the need for over-dimensioning the local cluster. In fact, existing solutions for network traffic analysis in the cloud are already providing the traditional benefits of cloud-based services to network traffic analysts and opening the door to cloud-based Elastic MapReduce-style traffic analysis solutions. In this paper we propose a scheduler of packet block network analysis jobs that chooses between sending the job to a local cluster versus sending it to a network analysis service on the cloud. We focus on map-intensive jobs such as string matching-based virus and malware detection. We present an architecture for a Hadoop-based network analysis solution including our scheduler, report on using this approach in a small cluster, and show scheduling performance results obtained through simulation. We achieve a reduction of up to more than 50% in the amount of network traffic we need to burst out using our scheduler compared to a simple traffic threshold scheduler and a full resource availability scheduler. Finally we discuss scaling out issues for our network analysis solution.
    • Publication
      Solution of Bi-objective Competitive Facility Location Problem Using Parallel Stochastic Search Algorithm
      (2015-10) Lancinskas, Algirdas; Martínez Ortigosa, Pilar; Zilinskas, Julius; Carretero Pérez, Jesús; García Blas, Javier; Wyrzykowski, Roman; Jeannot, Emmanuel; Universidad Carlos III de Madrid. Computer Architecture, Communications and Systems Group (ARCOS)
    • Publication
      Parallel Processing For Gravity Inversion
      (2015-10) Frasheri, Neki; Carretero Pérez, Jesús; García Blas, Javier; Wyrzykowski, Roman; Jeannot, Emmanuel; Universidad Carlos III de Madrid. Computer Architecture, Communications and Systems Group (ARCOS)
      In this paper, results of recent updates of a simple algorithm for the inversion of gravity anomalies for 3D geosections in parallel computer systems are presented. An iterative relaxation principle was used, updating the mass density distribution of the geosection step by step. Updates were selected on the basis of the least squares error match between the update effect and the observed anomaly. Locally weighted least squares combined with the linear trend were used to obtain good inversion results for two-body geosections.