Hierarchical clustering of bipartite data sets based on the statistical significance of coincidences

e-Archivo Repository

dc.contributor.author Tamarit, Ignacio
dc.contributor.author Pereda, María
dc.contributor.author Cuesta, José A.
dc.date.accessioned 2021-02-15T15:48:26Z
dc.date.available 2021-02-15T15:48:26Z
dc.date.issued 2020-10-06
dc.identifier.bibliographicCitation Physical Review E, 102(4), 042304, Oct. 2020, 12 pp.
dc.identifier.issn 2470-0045
dc.identifier.issn 2470-0053 (online)
dc.identifier.uri http://hdl.handle.net/10016/31930
dc.description.abstract When some 'entities' are related by the 'features' they share, they are amenable to a bipartite network representation. Plant-pollinator ecological communities, co-authorship of scientific papers, customers and purchases, or answers in a poll are but a few examples. Analyzing the clustering of such entities in the network is a useful tool with applications in many fields, such as internet technology, recommender systems, or the detection of diseases. The algorithms most widely applied to find clusters in bipartite networks are variants of modularity optimization. Here, we provide a hierarchical clustering algorithm based on a dissimilarity between entities that quantifies the probability that the features shared by two entities are due to mere chance. The algorithm performance is O(n^2) when applied to a set of n entities, and its outcome is a dendrogram exhibiting the connections of those entities. Through the introduction of a 'susceptibility' measure we can provide an 'optimal' choice for the clustering as well as quantify its quality. The dendrogram reveals further useful structural information, though, such as the existence of subclusters within clusters or of nodes that do not fit in any cluster. We illustrate the algorithm by applying it first to a set of synthetic networks, and then to a selection of examples. We also illustrate how to transform our algorithm into a valid alternative for one-mode networks as well, and show that it performs at least as well as the standard modularity-based algorithms, with a higher numerical performance. We provide an implementation of the algorithm in Python, freely accessible from GitHub.
dc.description.sponsorship This research has been funded by the Spanish Ministerio de Ciencia e Innovación, with support from FEDER funds of the European Union, under project PGC2018-098186-B-I00.
dc.format.extent 12
dc.language.iso eng
dc.publisher American Physical Society (APS)
dc.rights © 2020 American Physical Society
dc.subject.other Community detection
dc.subject.other Graph clustering
dc.subject.other Modularity
dc.title Hierarchical clustering of bipartite data sets based on the statistical significance of coincidences
dc.type article
dc.subject.eciencia Matemáticas
dc.identifier.doi https://doi.org/10.1103/PhysRevE.102.042304
dc.rights.accessRights openAccess
dc.relation.projectID Gobierno de España. PGC2018-098186-B-I00
dc.type.version publishedVersion
dc.identifier.publicationfirstpage 1
dc.identifier.publicationissue 4, 042304
dc.identifier.publicationlastpage 12
dc.identifier.publicationtitle PHYSICAL REVIEW E
dc.identifier.publicationvolume 102
dc.identifier.uxxi AR/0000026259
dc.contributor.funder Ministerio de Ciencia e Innovación (España)
dc.affiliation.dpto UC3M. Departamento de Matemáticas
dc.affiliation.grupoinv UC3M. Grupo de Investigación: Interdisciplinar de Sistemas Complejos (GISC)
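
Illustrative note: the abstract describes a dissimilarity that quantifies the probability that the features shared by two entities are due to mere chance, followed by hierarchical clustering. The sketch below is one possible reading of that idea, not the authors' implementation. It assumes a hypergeometric null model for the number of shared features and uses average linkage for merging; neither choice is stated in this record. All function and variable names (`coincidence_pvalue`, `dissimilarity_matrix`, `average_linkage`, `entities`) are hypothetical, and the naive merge loop is written for readability, so it does not achieve the O(n^2) performance quoted in the abstract.

```python
from itertools import combinations
from math import comb


def coincidence_pvalue(a, b, n_features):
    """Probability of sharing at least |a & b| features by chance.

    Assumed null model: the features of b are drawn uniformly at random
    from n_features, so the overlap with a is hypergeometric.
    """
    ka, kb, shared = len(a), len(b), len(a & b)
    tail = sum(
        comb(ka, x) * comb(n_features - ka, kb - x)
        for x in range(shared, min(ka, kb) + 1)
    )
    return tail / comb(n_features, kb)


def dissimilarity_matrix(entities, n_features):
    """Symmetric dict mapping (u, v) to the chance-coincidence probability."""
    d = {}
    for u, v in combinations(sorted(entities), 2):
        d[u, v] = d[v, u] = coincidence_pvalue(entities[u], entities[v], n_features)
    return d


def average_linkage(names, d):
    """Naive agglomerative clustering; returns the list of merges in order."""
    def dist(p, q):
        # Mean pairwise dissimilarity between two clusters (average linkage).
        return sum(d[u, v] for u in p for v in q) / (len(p) * len(q))

    clusters = [(name,) for name in names]
    merges = []
    while len(clusters) > 1:
        p, q = min(combinations(clusters, 2), key=lambda pq: dist(*pq))
        merges.append((p, q, dist(p, q)))
        clusters = [c for c in clusters if c not in (p, q)] + [p + q]
    return merges


# Toy bipartite data: four entities, ten possible features (made up for illustration).
entities = {
    "A": {0, 1, 2, 3},
    "B": {0, 1, 2, 4},
    "C": {6, 7, 8, 9},
    "D": {5, 7, 8, 9},
}
d = dissimilarity_matrix(entities, n_features=10)
merges = average_linkage(sorted(entities), d)
for left, right, height in merges:
    print(left, right, round(height, 3))
```

A small probability means the overlap is unlikely to be accidental, so the two entities are treated as close. The sequence of merges and their heights is the information a dendrogram would display.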