RT Dissertation/Thesis
T1 Breadth analysis of Online Social Networks
A1 Chiroque Núñez, Luis Felipe
AB This thesis is mainly motivated by the analysis, understanding, and prediction of human behaviour by means of the study of people's digital fingerprints. Unlike a classical PhD thesis, where you choose a research topic and analyze it in depth, we carried out a breadth analysis on the research topic of complex networks, such as those that humans create themselves with their relationships and interactions. These kinds of digital communities where humans interact and create relationships are commonly called Online Social Networks. First, (i) we collected their interactions, as text messages they share among each other, in order to analyze the sentiment and topic of such messages. We applied state-of-the-art techniques for Natural Language Processing, widely developed and tested on English texts, to a collection of Spanish Tweets and compared the results. Next, (ii) we focused on Topic Detection, creating our own classifier and applying it to the former Tweets dataset. The breakthroughs are twofold: our classifier relies on text graphs derived from the input text, and we achieved 70% accuracy, outperforming previous results. After that, (iii) we moved to analyze the network structure (or topology) and its data values to detect outliers. We hypothesize that in social networks there is a large mass of users that behaves similarly, while a reduced set of them behaves in a different way. Within this last group in particular, we try to separate users by high activity, low activity, or any other parameter or feature that makes them belong to a different kind of outlier. We aim to detect influential users in one of these outlier sets. We propose a new unsupervised method, Massive Unsupervised Outlier Detection (MUOD), which labels the detected outliers as shape, magnitude, amplitude, or a combination of those. We applied this method to a subset of roughly 400 million Google+ users, automatically identifying and discriminating sets of outlier users.
Finally, (iv) we found it interesting to address the monitoring of real complex networks. We created a framework to dynamically adapt the temporality of large-scale dynamic networks, reducing compute overhead by at least 76%, data volume by 60%, and overall cloud costs by at least 54%, while always maintaining accuracy above 88%.
YR 2021
FD 2021-09
LK https://hdl.handle.net/10016/34753
UL https://hdl.handle.net/10016/34753
LA eng
DS e-Archivo
RD 22 May 2024