RT Dissertation/Thesis
T1 High-performance and fault-tolerant techniques for massive data distribution in online communities
A1 Higuero Alonso-Mardones, Daniel
AB The amount of digital information produced and consumed is increasing every day. This rapid growth is driven by advances in computing power, hardware technologies, and the popularization of user-generated content networks. New hardware is able to process larger quantities of data, which makes it possible to obtain finer results, and as a consequence more data is generated. In this respect, scientific applications have evolved to benefit from the new hardware capabilities. This type of application is characterized by requiring large amounts of information as input and generating a significant amount of intermediate data, resulting in large files. This increase appears not only in terms of volume, but also in terms of file size, so we need to provide efficient and reliable data access mechanisms. Producing such a mechanism is a challenging task due to the number of aspects involved. However, we can leverage the knowledge found in social networks to improve the distribution process. In this respect, the advent of Web 2.0 has popularized the concept of the social network, which provides valuable knowledge about the relationships among users, and between users and data. However, extracting this knowledge and defining ways to actively use it to increase the performance of a system remains an open research direction. Additionally, we must also take into account other existing limitations. In particular, the interconnection between the different elements of the system is one of the key aspects. The availability of new technologies such as mass-produced multicore chips, large storage media, better sensors, etc. has contributed to the increase in data being produced. However, the underlying interconnection technologies have not improved at the same pace.
This leads to a situation where vast amounts of data can be produced and need to be consumed by a large number of geographically distributed users, but the interconnection between both ends does not match the required needs. In this thesis, we address the problem of efficient and reliable data distribution in geographically distributed systems. In this respect, we focus on providing a solution that 1) optimizes the use of existing resources, 2) does not require changes in the underlying interconnection, and 3) provides fault-tolerant capabilities. In order to achieve these objectives, we define a generic data distribution architecture composed of three main components: a community detection module, a transfer scheduling module, and a distribution controller. The community detection module leverages the information found in the social network formed by the users requesting files and produces a set of virtual communities grouping entities with similar interests. The transfer scheduling module produces a plan to efficiently distribute all requested files, improving resource utilization. For this purpose, we model the distribution problem using linear programming and offer a method that permits distributed solving of the problem. Finally, the distribution controller manages the distribution process using the aforementioned schedule, controls the available server infrastructure, and launches new on-demand resources when necessary.
YR 2013
FD 2013-06
LK https://hdl.handle.net/10016/18074
UL https://hdl.handle.net/10016/18074
LA eng
DS e-Archivo
RD 18 Jul. 2024