DI - ATC - Articles in International Conferences (Artículos en Congresos Internacionales)

Permanent URI for this collection

https://hdl.handle.net/10016/853

Spark-DIY: A framework for interoperable Spark Operations with high performance Block-Based Data Models
(IEEE Computer Society, 2018-12-17) Caino Lores, Silvina; Carretero Pérez, Jesús; Nicolae, Bogdan; Yildiz, Orcun; Peterka, Tom; Ministerio de Economía, Industria y Competitividad (España); Ministerio de Educación, Cultura y Deporte (España)
A data integrity verification service for cloud storage based on building blocks
(IEEE, 2018-07-11) Reyes Anastacio, Hugo G.; Gonzalez-Compean, Jose Luis; Morales Sandoval, Miguel; Carretero Pérez, Jesús; European Commission
Cloud storage is a popular solution for organizations and users to store data in a ubiquitous and cost-effective manner. However, violations of confidentiality and integrity are still issues associated with this technology. In this context, organizations and users need tools that enable them to verify the integrity of their information stored in cloud services. In this paper, we present the design and implementation of an efficient service based on the provable data possession cryptographic model, which enables organizations to verify the integrity of their data on demand without retrieving files from the cloud. The storage and cryptographic components have been developed as building blocks, which are deployed on the user side using the Manager/Worker pattern, a pattern that favors exploiting parallelism when executing data possession challenges. An experimental evaluation in a private cloud revealed the efficacy of launching integrity verification challenges to cloud storage services and the feasibility of applying a containerized task-parallel scheme, which significantly improves the performance of the data possession proof service in real-world scenarios compared with the implementation of the original provable data possession scheme.
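A minimal sketch of how a Manager/Worker layout can parallelize integrity challenges, assuming blocks are checked independently. It deliberately swaps the provable data possession proof for a plain SHA-256 comparison, which (unlike PDP) requires reading the blocks, so it illustrates only the task distribution, not the paper's cryptographic model. All names (`digest`, `worker`, `manager`) are hypothetical.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def digest(block: bytes) -> str:
    """SHA-256 digest of a block; a simplified stand-in for a PDP tag (assumption)."""
    return hashlib.sha256(block).hexdigest()


def worker(task):
    """Worker: check one stored block against the digest recorded before upload."""
    block_id, stored_block, expected = task
    return block_id, digest(stored_block) == expected


def manager(stored_blocks, expected_digests, n_workers=4):
    """Manager: split verification into per-block tasks and farm them out in parallel."""
    tasks = [(i, blk, expected_digests[i]) for i, blk in enumerate(stored_blocks)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = dict(pool.map(worker, tasks))
    # Report the ids of blocks whose digest no longer matches.
    return sorted(i for i, ok in results.items() if not ok)


if __name__ == "__main__":
    original = [b"block-0", b"block-1", b"block-2"]
    expected = [digest(b) for b in original]
    cloud_copy = [b"block-0", b"tampered", b"block-2"]
    print(manager(cloud_copy, expected))  # [1]
```

Because each task carries everything it needs, the same manager could hand tasks to containers instead of threads without changing the worker logic.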
Exposing data locality in HPC-based systems by using the HDFS backend
(IEEE, 2020-12-16) Rivadeneira López-Bravo, José; García Carballeira, Félix; Carretero Pérez, Jesús; García Blas, Francisco Javier; Comunidad de Madrid; European Commission
A containerized service for clustering and categorization of weather records in the cloud
(IEEE, 2018-07-11) Sánchez Gallegos, Dante D.; González Compean, J.L.; Alvarado Barrientos, Susana; Sosa Sosa, Victor Jesus; Vargas Tuxpan, Jose; Carretero Pérez, Jesús
This paper presents a containerized service for clustering and categorization of weather records in the cloud. The service follows a scheme of microservices and containers that lets organizations and end users manage and process weather records from acquisition, through the preprocessing and processing stages, to the presentation of results. In this service, a specialized crawler acquires records and delivers them to a microservice for distributed categorization of weather records, which clusters the acquired data (temperature and precipitation) by spatiotemporal parameters. The resulting clusters are displayed on a map by a geoportal, where a statistics microservice also produces regression graphs of the results on the fly. To evaluate the feasibility of this service, a case study based on 33 years of daily records captured by the Mexican weather station network (EMAS-CONAGUA) was conducted. Lessons learned in this study about the performance of record acquisition, clustering, and map display are described in this paper. Usage examples of this service revealed that end users can analyze weather parameters in an efficient, flexible, and automatic manner.
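The abstract does not name the clustering algorithm, so the following is only a sketch assuming plain k-means over (temperature, precipitation) pairs, with records already grouped by station and period beforehand. The function name, the fixed initial centroids, and the sample values are hypothetical.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Plain k-means over (temperature, precipitation) pairs (assumed algorithm)."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            # Assign each record to its nearest centroid (Euclidean distance).
            nearest = min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its members; keep it if the cluster is empty.
        centroids = [
            tuple(sum(v) / len(members) for v in zip(*members)) if members else centroids[c]
            for c, members in enumerate(clusters)
        ]
    return centroids, clusters


if __name__ == "__main__":
    # Hypothetical daily records: (temperature in °C, precipitation in mm).
    records = [(30.0, 0.0), (31.0, 1.0), (29.0, 0.5), (12.0, 20.0), (11.0, 22.0), (13.0, 18.0)]
    centroids, clusters = kmeans(records, centroids=[(30.0, 0.0), (12.0, 20.0)])
    print(centroids)  # [(30.0, 0.5), (12.0, 20.0)]
```

Fixing the initial centroids keeps the result deterministic; a distributed version would split the assignment step across workers and merge per-cluster sums before updating centroids.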
Mapping and scheduling HPC applications for optimizing I/O
(Association for Computing Machinery (ACM), 2020-06-29) Carretero Pérez, Jesús; Jeannot, Emmanuel; Pallez, Guillaume; Expósito Singh, David; Vidal, Nicolas
On HPC platforms, concurrent applications share the same file system. This can lead to conflicts, especially as applications become more and more data intensive, and I/O contention can become a performance bottleneck. Access to bandwidth can be split into two complementary yet distinct problems: the mapping problem and the scheduling problem. The mapping problem consists in selecting the set of applications that compete for the I/O resource. Given I/O requests on the same resource, the scheduling problem then consists in determining the order of these accesses so as to minimize I/O time. In this work, we propose coupling a novel bandwidth-aware mapping algorithm with I/O list-scheduling policies to develop a cross-layer optimization solution. We study this solution experimentally using an I/O middleware, CLARISSE. We show that naive policies such as FIFO perform relatively well at scheduling I/O movements, and that the key to reducing congestion lies mostly in the mapping. We evaluate the proposed algorithm using a simulator that we validated experimentally. This evaluation shows significant gains for our simple bandwidth-aware mapping solution compared with its non-bandwidth-aware counterpart. The gains are in terms of both machine efficiency (makespan) and application efficiency (stretch). This further stresses the importance of designing efficient, bandwidth-aware mapping strategies to alleviate the cost of I/O congestion.
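To make the two sub-problems concrete, here is a hedged sketch of each: FIFO list scheduling of requests on one shared resource, and a greedy bandwidth-aware mapping that spreads applications across I/O resources by aggregate demand. The greedy rule (largest demand first, onto the least-loaded resource) is a stand-in chosen for illustration; it is not the algorithm proposed in the paper, and all names are hypothetical.

```python
from collections import deque


def fifo_schedule(requests, bandwidth):
    """FIFO list scheduling of I/O requests (app, arrival_time, size) on one shared
    resource. Returns the completion time of each application's last request."""
    queue = deque(sorted(requests, key=lambda r: r[1]))  # stable: ties keep input order
    clock, done = 0.0, {}
    while queue:
        app, arrival, size = queue.popleft()
        # The resource idles until the request arrives, then transfers at full bandwidth.
        clock = max(clock, arrival) + size / bandwidth
        done[app] = clock
    return done


def bandwidth_aware_mapping(demands, n_resources):
    """Greedy mapping (illustrative): place the most bandwidth-hungry applications
    first, each on the I/O resource whose aggregate demand is currently lowest."""
    load = [0.0] * n_resources
    mapping = {}
    for app, demand in sorted(demands.items(), key=lambda kv: -kv[1]):
        target = min(range(n_resources), key=lambda r: load[r])
        mapping[app] = target
        load[target] += demand
    return mapping, load


if __name__ == "__main__":
    print(fifo_schedule([("A", 0, 100), ("B", 0, 50), ("C", 1, 50)], bandwidth=50))
    # {'A': 2.0, 'B': 3.0, 'C': 4.0}
    print(bandwidth_aware_mapping({"A": 8, "B": 5, "C": 4, "D": 3}, n_resources=2))
    # ({'A': 0, 'B': 1, 'C': 1, 'D': 0}, [11.0, 9.0])
```

In this toy setting the makespan of a resource is its last completion time (4.0 above), so balancing aggregate demand across resources during mapping directly limits how much any FIFO queue can grow.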
LIMITLESS - Light-weight monitoring tool for large scale systems
(IEEE, 2021-03-10) Cascajo García, Alberto; Expósito Singh, David; Carretero Pérez, Jesús; European Commission
A federated content distribution system to build health data synchronization services
(IEEE, 2021-03-10) Carrizales Espinoza, Diana; Sánchez Gallegos, Dante D.; González Compean, J.L.; Carretero Pérez, Jesús; Comunidad de Madrid; Agencia Estatal de Investigación (España)
In organizational environments such as hospitals, data have to be processed, preserved, and shared with other organizations in a cost-efficient manner. Moreover, organizations must comply with different mandatory non-functional requirements imposed by the laws, protocols, and norms of each country. In this context, this paper presents a Federated Content Distribution System to build infrastructure-agnostic health data synchronization services. In this federation, each hospital manages local and federated services based on a pub/sub model. The local services manage users and contents (i.e., medical imagery) inside the hospital, whereas the federated services allow different hospitals to cooperate by sharing resources and data. Data preparation schemes were implemented to apply non-functional requirements to the data. Moreover, data published in the content distribution system are automatically synchronized to all users subscribed to the catalog where the content was published.
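The catalog-based synchronization described above follows the usual publish/subscribe shape. A minimal in-memory sketch, assuming a single process and synchronous delivery (a real federation would route messages between hospitals over a network broker); the class, user ids, and file name are hypothetical.

```python
from collections import defaultdict


class CatalogHub:
    """Hypothetical in-memory pub/sub: users subscribe to catalogs, and every
    content published to a catalog is pushed to all of that catalog's subscribers."""

    def __init__(self):
        self.subscribers = defaultdict(set)  # catalog name -> subscribed user ids
        self.synced = defaultdict(list)      # user id -> contents delivered to that user

    def subscribe(self, user, catalog):
        self.subscribers[catalog].add(user)

    def publish(self, catalog, content):
        # Synchronize the new content to every current subscriber of the catalog.
        for user in self.subscribers[catalog]:
            self.synced[user].append(content)


if __name__ == "__main__":
    hub = CatalogHub()
    hub.subscribe("user1@hospital-a", "radiology")
    hub.subscribe("user2@hospital-b", "radiology")
    hub.publish("radiology", "scan-0001.dcm")
    print(hub.synced["user2@hospital-b"])  # ['scan-0001.dcm']
```

Publishers never address recipients directly; they only name a catalog, which is what lets hospitals join or leave the federation without changing publisher code.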
A data preparation approach for cloud storage based on containerized parallel patterns
(Springer, 2019-10-10) Carrizales, Diana; Sánchez Gallegos, Dante D.; Reyes, Hugo; González Compean, J.L.; Morales Sandoval, Miguel; Carretero Pérez, Jesús; Galaviz Mosqueda, Alejandro
In this paper, we present the design, implementation, and evaluation of an efficient data preparation and retrieval approach for cloud storage. The approach includes a deduplication subsystem that indexes the hash of each content to identify duplicated data. As a consequence, avoiding duplicated content reduces reprocessing time during uploads and other costs related to outsourced data management tasks. Our data preparation scheme enables organizations to add properties such as security, reliability, and cost-efficiency to their contents before sending them to the cloud. It also creates recovery schemes that allow organizations to share preprocessed contents with partners and end users. The approach also includes an engine that encapsulates preprocessing applications into virtual containers (VCs) to create parallel patterns that improve the efficiency of the data preparation and retrieval process. In a case study, real repositories of satellite images and organizational files were prepared for migration to the cloud using processes such as compression, encryption, encoding for fault tolerance, and access control. The experimental evaluation revealed the feasibility of a data preparation approach that lets organizations mitigate risks that could still arise in the cloud. It also revealed the efficiency of the deduplication process in reducing data preparation tasks and the efficacy of parallel patterns in improving the end-user service experience.
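A sketch of hash-indexed deduplication in front of a preparation pipeline, assuming SHA-256 as the content hash and using `zlib.compress` as a one-stage placeholder for the compression, encryption, and encoding steps mentioned above. The `DedupIndex` class is hypothetical.

```python
import hashlib
import zlib


class DedupIndex:
    """Content-hash index: a content goes through the (expensive) preparation
    pipeline only the first time its hash is seen; duplicates reuse the result."""

    def __init__(self, prepare):
        self.prepare = prepare  # preparation pipeline, e.g. compress/encrypt/encode
        self.index = {}         # SHA-256 hex digest -> prepared content
        self.prepared_count = 0

    def upload(self, content: bytes) -> str:
        key = hashlib.sha256(content).hexdigest()
        if key not in self.index:  # new content: prepare it once
            self.index[key] = self.prepare(content)
            self.prepared_count += 1
        return key


if __name__ == "__main__":
    idx = DedupIndex(prepare=zlib.compress)  # compression as a one-stage pipeline
    for f in [b"satellite-tile-1", b"report.pdf bytes", b"satellite-tile-1"]:
        idx.upload(f)
    print(idx.prepared_count)  # 2
```

The returned hash doubles as a retrieval key, so a duplicate upload costs one hash computation instead of a full pass through the pipeline.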
Adaptive application deployment of priority services in virtual environments
(Springer, 2019-10-10) Carretero Pérez, Jesús; Vasile Cabezas, Mario; Sosa Sosa, Victor Jesus; Ministerio de Economía, Industria y Competitividad (España)
This paper introduces an adaptive application deployment service for virtualized environments, named DECIDE. This service facilitates the definition of customized cluster/cloud environments and the adaptive integration of scheduling policies for testing and deploying containerized applications. The service-based design of DECIDE and the use of a virtualized environment make it possible to easily change the cluster/cloud configuration and its scheduling policy. It provides a differentiated application deployment service based on priorities, according to user requirements. A prototype of this service was implemented using Apache Mesos and Docker. As a proof of concept, a federated application for electronic identification (eIDAS) was deployed using the DECIDE approach, which allows users to evaluate different deployment scenarios and scheduling policies, providing useful information for decision making. Experiments were carried out to validate the service functionality and the feasibility of testing and deploying applications that require different scheduling policies.
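Priority-based differentiation can be sketched as a priority queue of deployment requests. This is an illustrative assumption, not DECIDE's implementation and not the Apache Mesos API: lower numbers mean higher priority, and a sequence counter keeps submission order among equal priorities. The class and application names are hypothetical.

```python
import heapq
import itertools


class PriorityDeployer:
    """Hypothetical deployment queue: lower number = higher priority; requests
    with equal priority are deployed in submission order."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # tie-breaker preserving submission order

    def submit(self, app, priority):
        heapq.heappush(self._heap, (priority, next(self._seq), app))

    def next_deployment(self):
        _priority, _seq, app = heapq.heappop(self._heap)
        return app

    def __len__(self):
        return len(self._heap)


if __name__ == "__main__":
    d = PriorityDeployer()
    for app, prio in [("batch-analytics", 2), ("eidas-node", 0), ("web-frontend", 1), ("eidas-proxy", 0)]:
        d.submit(app, prio)
    print([d.next_deployment() for _ in range(len(d))])
    # ['eidas-node', 'eidas-proxy', 'web-frontend', 'batch-analytics']
```

Swapping the ordering key (for example, deadline instead of a fixed priority) is one way such a service could plug in a different scheduling policy without changing callers.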
A Highly Available Cluster of Web Servers with Increased Storage Capacity
(Universidad de Castilla-La Mancha, 2006-09) García Sánchez, José Daniel; Carretero Pérez, Jesús; García Carballeira, Félix; Singh, David E.; Fernández Muñoz, Javier
Web server scalability has traditionally been addressed by improving software elements or increasing the hardware resources of the server machine. Another approach has been the use of distributed architectures. In such architectures, the file allocation strategy has usually been either full replication or full distribution. In previous work we showed that partial replication offers a good balance between storage capacity and reliability: it offers much higher storage capacity, while reliability can be kept at a level equivalent to that of fully replicated solutions. In this paper, we present the architectural details of Web cluster solutions adapted to partial replication. We also show that partial replication does not imply a performance penalty compared with classical fully replicated architectures. For evaluation purposes, we used a simulation model built on the OMNeT++ framework, with mean service time as the performance comparison metric.
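The capacity side of the trade-off is simple arithmetic: with N servers of capacity C and every file stored k times, the cluster holds N·C/k of distinct data, so full replication (k = N) gives C and full distribution (k = 1) gives N·C. The sketch below assumes a chained placement (file f on nodes f to f+k-1, modulo N) purely for illustration; the paper's allocation strategy may differ, and all names are hypothetical.

```python
def place(n_files, n_nodes, k):
    """Chained placement (assumed): file f is stored on nodes f, f+1, ..., f+k-1 (mod n_nodes)."""
    return {f: [(f + j) % n_nodes for j in range(k)] for f in range(n_files)}


def effective_capacity(n_nodes, node_capacity, k):
    """Amount of distinct data the cluster can hold when every file is stored k times."""
    return n_nodes * node_capacity / k


def available_files(placement, failed_nodes):
    """Files that can still be served: at least one replica on a surviving node."""
    return [f for f, nodes in placement.items() if any(n not in failed_nodes for n in nodes)]


if __name__ == "__main__":
    n, cap = 8, 100  # hypothetical: 8 Web servers with 100 GB each
    print(effective_capacity(n, cap, k=n))  # full replication: 100.0
    print(effective_capacity(n, cap, k=2))  # partial replication: 400.0
    print(effective_capacity(n, cap, k=1))  # full distribution: 800.0
    layout = place(n_files=8, n_nodes=n, k=2)
    print(len(available_files(layout, failed_nodes={3})))     # 8
    print(len(available_files(layout, failed_nodes={3, 4})))  # 7 (file 3 lived on nodes 3 and 4)
```

With k = 2, any single server failure leaves every file reachable, while capacity is four times that of full replication in this example; losing two adjacent servers is the first case where a file becomes unavailable.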
