Grupo de Tratamiento de Señal y Aprendizaje
http://hdl.handle.net/10016/9041
2016-12-05T14:30:02Z
Bayesian Nonparametric Crowdsourcing
http://hdl.handle.net/10016/23918
García Moreno, Pablo; Artés-Rodríguez, Antonio; Teh, Yee Whye; Pérez Cruz, Fernando
Crowdsourcing has been proven to be an effective and efficient tool to annotate large data-sets. User annotations are often noisy, so methods to combine the annotations to produce reliable estimates of the ground truth are necessary. We claim that considering the existence of clusters of users in this combination step can improve the performance. This is especially important in early stages of crowdsourcing implementations, where the number of annotations is low. At this stage there is not enough information to accurately estimate the bias introduced by each annotator separately, so we have to resort to models that consider the statistical links among them. In addition, finding these clusters is interesting in itself as knowing the behavior of the pool of annotators allows implementing efficient active learning strategies. Based on this, we propose in this paper two new fully unsupervised models based on a Chinese restaurant process (CRP) prior and a hierarchical structure that allows inferring these groups jointly with the ground truth and the properties of the users. Efficient inference algorithms based on Gibbs sampling with auxiliary variables are proposed. Finally, we perform experiments, both on synthetic and real databases, to show the advantages of our models over state-of-the-art algorithms.
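As a rough illustration of the Chinese restaurant process (CRP) prior mentioned in the abstract above, the following minimal Python sketch draws a random clustering of annotators from a CRP. It is not the authors' model or inference algorithm: the function name `sample_crp`, the concentration parameter `alpha`, and the reading of "customers" as annotators and "tables" as clusters are assumptions made only for this example.

```python
import random


def sample_crp(num_customers, alpha, seed=0):
    """Draw a partition of customers 0..num_customers-1 from a CRP(alpha).

    Hypothetical helper for illustration; alpha must be positive.
    """
    rng = random.Random(seed)
    assignments = []   # table (cluster) index of each customer
    table_sizes = []   # number of customers seated at each table
    for n in range(num_customers):
        # With n customers already seated, an existing table k is chosen
        # with probability size_k / (n + alpha) and a new table with
        # probability alpha / (n + alpha).
        weights = table_sizes + [alpha]
        table = rng.choices(range(len(weights)), weights=weights)[0]
        if table == len(table_sizes):
            table_sizes.append(1)      # open a new table
        else:
            table_sizes[table] += 1    # join an existing table
        assignments.append(table)
    return assignments, table_sizes


if __name__ == "__main__":
    clusters, sizes = sample_crp(num_customers=20, alpha=1.0, seed=42)
    print(clusters)
    print(sizes)
```

The number of clusters is not fixed in advance; larger values of `alpha` tend to produce more clusters for the same number of annotators.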
2015-08-01T00:00:00Z
Bayesian nonparametric comorbidity analysis of psychiatric disorders
http://hdl.handle.net/10016/23916
Rodríguez Ruiz, Francisco Jesús; Valera Martínez, María Isabel; Blanco, Carlos; Pérez Cruz, Fernando
The analysis of comorbidity is an open and complex research field in the branch of psychiatry, where clinical experience and several studies suggest that the relation among the psychiatric disorders may have etiological and treatment implications. In this paper, we are interested in applying latent feature modeling to find the latent structure behind the psychiatric disorders that can help to examine and explain the relationships among them. To this end, we use the large amount of information collected in the National Epidemiologic Survey on Alcohol and Related Conditions (NESARC) database and propose to model these data using a nonparametric latent model based on the Indian Buffet Process (IBP). Due to the discrete nature of the data, we first need to adapt the observation model for discrete random variables. We propose a generative model in which the observations are drawn from a multinomial-logit distribution given the IBP matrix. The implementation of an efficient Gibbs sampler is accomplished using the Laplace approximation, which allows integrating out the weighting factors of the multinomial-logit likelihood model. We also provide a variational inference algorithm for this model, which provides a complementary (and less expensive in terms of computational complexity) alternative to the Gibbs sampler, allowing us to deal with larger amounts of data. Finally, we use the model to analyze comorbidity among the psychiatric disorders diagnosed by experts from the NESARC database.
2014-04-01T00:00:00Z
Supporting scientific knowledge discovery with extended, generalized Formal Concept Analysis
http://hdl.handle.net/10016/23662
Valverde Albacete, Francisco José; González Calabozo, Jose María; Peñas, Anselmo; Peláez Moreno, Carmen
In this paper we fuse together Wille's Landscapes of Knowledge and Exploratory Data Analysis by leveraging Formal Concept Analysis (FCA) to support data-induced scientific enquiry and discovery. We use extended FCA first by allowing K-valued entries in the incidence to accommodate other, non-binary types of data, and second with different modes of creating formal concepts to accommodate diverse conceptualizing phenomena. With these extensions we demonstrate the versatility of the Landscapes of Knowledge metaphor to help in creating new scientific and engineering knowledge by providing several successful use cases of our techniques that support scientific hypothesis-making and discovery in a range of domains: semiring theory, perceptual studies, natural language semantics, and gene expression data analysis. While doing so, we also capture the affordances that justify the use of FCA and its extensions in scientific discovery.
2016-02-01T00:00:00Z
Two adaptive rejection sampling schemes for probability density functions with log-convex tails
http://hdl.handle.net/10016/17200
Martino, Luca; Míguez Arenas, Joaquín
Monte Carlo methods are often necessary for the implementation of optimal Bayesian estimators. A fundamental technique that can be used to generate samples from virtually any target probability distribution is the so-called rejection sampling method, which generates candidate samples from a proposal distribution and then accepts them or not by testing the ratio of the target and proposal densities. The class of adaptive rejection sampling (ARS) algorithms is particularly interesting because they can achieve high acceptance rates. However, the standard ARS method can only be used with log-concave target densities. For this reason, many generalizations have been proposed. In this work, we investigate two different adaptive schemes that can be used to draw exactly from a large family of univariate probability density functions (pdfs), not necessarily log-concave, possibly multimodal and with tails of arbitrary concavity. These techniques are adaptive in the sense that every time a candidate sample is rejected, the acceptance rate is improved. The two proposed algorithms can work properly when the target pdf is multimodal, with first and second derivatives analytically intractable, and when the tails are log-convex in an infinite domain. Therefore, they can be applied in a number of scenarios in which the other generalizations of the standard ARS fail. Two illustrative numerical examples are shown.
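The basic (non-adaptive) rejection sampling step described in the abstract above can be sketched in Python as follows. This is only an illustration of the accept/reject test, not either of the adaptive schemes proposed in the paper. The target (an unnormalized standard normal), the proposal (a standard Cauchy), the envelope constant `M`, and all function names are assumptions chosen for the example.

```python
import math
import random


def target_pdf(x):
    """Unnormalized standard normal density (illustrative target)."""
    return math.exp(-0.5 * x * x)


def proposal_pdf(x):
    """Standard Cauchy density (illustrative heavy-tailed proposal)."""
    return 1.0 / (math.pi * (1.0 + x * x))


def sample_proposal(rng):
    """Draw from the standard Cauchy by inverting its CDF."""
    return math.tan(math.pi * (rng.random() - 0.5))


# Envelope constant with target_pdf(x) <= M * proposal_pdf(x) for all x.
# The ratio pi * (1 + x^2) * exp(-x^2 / 2) attains its maximum at x = +/-1.
M = 2.0 * math.pi * math.exp(-0.5)


def rejection_sample(n, seed=0):
    """Return n accepted samples and the observed acceptance rate."""
    rng = random.Random(seed)
    samples = []
    proposed = 0
    while len(samples) < n:
        x = sample_proposal(rng)
        proposed += 1
        # Accept x with probability target_pdf(x) / (M * proposal_pdf(x)).
        if rng.random() * M * proposal_pdf(x) <= target_pdf(x):
            samples.append(x)
    return samples, n / proposed


if __name__ == "__main__":
    draws, rate = rejection_sample(10000, seed=1)
    print(sum(draws) / len(draws), rate)
```

Here the proposal is fixed, so the acceptance rate stays constant. In an adaptive scheme, the proposal would instead be refined using rejected candidates so that the acceptance rate improves over time.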
Document deposited in the arXiv.org repository. Version: arXiv:1111.4942v1 [stat.CO]
2011-11-21T00:00:00Z