Heterogeneity and model uncertainty in Bayesian regression models

Data heterogeneity appears when the sample comes from at least two different populations. We analyze three types of situations. The first and simplest case is the one in which the majority of the data come from a central model and a few isolated observations come from a contaminating distribution. The data from the contaminating distribution are then called outliers, and they have been studied in depth in the statistical literature. The second case is the one in which we still have a central model, but the heterogeneous data may appear in clusters of outliers that mask each other. This is the multiple outlier problem, which is much more difficult to handle and has been understood and analyzed only in the last few years. The few Bayesian contributions to this problem are presented. The third case is the one in which we do not have a central model; instead, different groups of data have been generated by different models. When the data are multivariate normal, this problem has been analyzed with mixture models under the name of cluster analysis, but a challenging area of research is to develop a general methodology for applying this multiple model approach to other statistical problems. Heterogeneity implies, in general, an increase in the uncertainty of predictions, and in this paper a procedure to measure this effect is proposed.
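As a rough illustration of the first case and of the claim that heterogeneity increases predictive uncertainty, the sketch below uses a scale-contamination model: with probability 1 - alpha an observation comes from a central normal model, and with probability alpha from a normal distribution with the same mean and an inflated standard deviation. This is not the procedure proposed in the paper; the function names and the parameters `alpha` and `k` are assumptions chosen only for this example.

```python
import random
import statistics


def mixture_variance(sigma, alpha, k):
    """Variance of (1 - alpha) * N(0, sigma^2) + alpha * N(0, (k * sigma)^2).

    Both components share the mean 0, so the mixture variance is the
    weighted average of the two component variances.
    """
    return (1 - alpha) * sigma**2 + alpha * (k * sigma) ** 2


def sample_contaminated(n, sigma, alpha, k, seed=0):
    """Draw n observations from the contaminated model (hypothetical helper)."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        # With probability alpha the observation is an outlier with
        # standard deviation k * sigma; otherwise it is from the central model.
        scale = k * sigma if rng.random() < alpha else sigma
        draws.append(rng.gauss(0.0, scale))
    return draws


if __name__ == "__main__":
    sigma, alpha, k = 1.0, 0.05, 5.0
    data = sample_contaminated(20000, sigma, alpha, k)
    print("central model variance:", sigma**2)
    print("mixture variance (exact):", mixture_variance(sigma, alpha, k))
    print("mixture variance (simulated):", statistics.variance(data))
```

With 5% contamination and a fivefold scale inflation, the exact mixture variance is 0.95 + 0.05 × 25 = 2.2 times the central variance, so even a small share of outliers widens predictions noticeably.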
Cluster analysis, influential data, masking, mixture model, outliers, predictive distributions, robust estimation