Publication: Estudio de gestión de sistemas basados en grafos
Publication date
2015-10
Defense date
2015-10-06
Abstract
In recent years, NoSQL database management systems have grown steadily in popularity.
This growth is driven by the constant increase in the amount of data that organizations
must handle every day and by the new challenges posed by the Internet. These systems make
it possible to handle large volumes of data (what is known as Big Data) and meet the
challenges of the Internet better than traditional database management systems.
There are several types of NoSQL databases, such as document-based databases, graph-based
databases, those that store information in columns, and those that store key-value pairs.
This Final Degree Project focuses on graph-based databases. It aims to analyze the main
management systems for this type of database, reviewing the most important features of
each one. In addition, it aims to compare the RSHP model with Neo4j, to determine which of
the two systems better retrieves the information relevant to specific terms.
To this end, the main features of systems such as Neo4j, KnowledgeMANAGER, Sparksee,
GraphBase, and Infinite Graph, among others, are analyzed first. Greater attention is paid
to Neo4j and KnowledgeMANAGER, since these are the two systems used in the comparison.
After analyzing several graph-oriented databases, different ways of storing information in
the Neo4j graph are described, compared with one another, and the most suitable one is
chosen for the subsequent comparison. Another important aspect of this work is the
creation of an automated process capable of generating queries at random and executing
them programmatically in Neo4j. This process yields the natural-language queries to be
executed in RSHP and the Cypher queries, together with the results obtained from executing
them in Neo4j.
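The query-generation step described above can be pictured with a minimal sketch. It is only an illustration under stated assumptions: the `TERMS` vocabulary, the `Term` label, the `name` property, and the `generate_query` helper are all hypothetical, since the abstract does not describe the actual graph schema or generator. The sketch only builds each natural-language query and its Cypher counterpart; executing them against Neo4j is left out.

```python
import random

# Hypothetical vocabulary; the abstract does not list the terms actually used.
TERMS = ["engine", "wheel", "brake", "sensor"]

# Hypothetical schema: nodes labelled Term with a `name` property.
# `$name` is the parameter syntax of recent Neo4j versions.
CYPHER_TEMPLATE = "MATCH (t:Term {name: $name})--(related) RETURN related.name"


def generate_query(rng):
    """Pick a random term and return the paired natural-language and Cypher query."""
    term = rng.choice(TERMS)
    return {
        "natural_language": term,       # query text for the natural-language system
        "cypher": CYPHER_TEMPLATE,      # parameterized query for Neo4j
        "params": {"name": term},       # values bound to the Cypher parameters
    }


# A seeded generator makes the "random" batch reproducible between runs.
rng = random.Random(42)
queries = [generate_query(rng) for _ in range(3)]
```

Keeping the term as a query parameter rather than splicing it into the Cypher string avoids quoting problems and lets the same query text be reused for every generated term.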
Finally, after the queries have been executed in both systems, the relevant results for
each query are identified, checking which of the two systems retrieves relevant data
better. To analyze these results, the precision and recall measures are calculated for
each query.
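As an illustration of the two measures named above, the following sketch uses the standard information-retrieval definitions: precision is the fraction of retrieved results that are relevant, and recall is the fraction of relevant results that were retrieved. The function name and the sample identifiers are hypothetical and are not taken from the thesis.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: results returned by the system under test.
    relevant:  results judged relevant for that query.
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant  # relevant results that were actually retrieved
    # Convention assumed here: an empty denominator yields 0.0.
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: 4 results retrieved, 3 results judged relevant, 2 in common.
p, r = precision_recall(["a", "b", "c", "d"], ["a", "b", "e"])
```

In this example precision is 2/4 and recall is 2/3. Computing both per query, as the abstract describes, allows the two systems to be compared query by query.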
Some years ago, new types of databases known as NoSQL began to appear. In recent years, their use has grown because of the high volume of data that organizations must handle every day. These databases make it possible to handle large volumes of data (known as Big Data) and meet the challenges of the Internet better than relational databases. There are different types of NoSQL databases, such as document-based databases, graph-based databases, databases that store information in columns, and others that store information as key-value pairs. This final degree project focuses on graph-based databases. It analyses the main management systems for this kind of database, reviewing the most important features of each one. In addition, it compares the RSHP model with Neo4j, to see which of the two systems better retrieves the information relevant to specific terms. To this end, the main features of systems such as Neo4j, KnowledgeMANAGER, Sparksee, GraphBase, and Infinite Graph, among others, are analysed first. Neo4j and KnowledgeMANAGER are studied in greater detail because they are the two systems used in the comparison. After analysing different graph-based databases, different ways of storing information in the Neo4j graph are compared, and the most suitable alternative is chosen for the subsequent comparison. Another important part of this project is the creation of an automated process that generates random queries and runs them in Neo4j. This process yields the Cypher queries, their execution results in Neo4j, and the natural-language queries to be executed in RSHP. Finally, after the queries have been executed in both systems, the relevant results for each query are identified, and it is checked whether one system retrieves relevant data better than the other. To analyse these results, precision and recall are calculated for each query.
Keywords
Databases, Graph theory, Information technology, Neo4j, RSHP, KnowledgeMANAGER