Abstract
A Bayesian model-based clustering method is proposed for clustering objects on the basis of dissimilarities. The method combines two basic ideas. The first is that the objects have latent positions in a Euclidean space, and that the observed dissimilarities are measurements of the Euclidean distances with error. The second is that the latent positions are generated from a mixture of multivariate normal distributions, each one corresponding to a cluster. We estimate the resulting model in a Bayesian way using Markov chain Monte Carlo. The method carries out multidimensional scaling and model-based clustering simultaneously, and yields good object configurations and good clustering results with reasonable measures of clustering uncertainty. In the examples we study, the clustering results based on low-dimensional configurations were almost as good as those based on high-dimensional ones. Thus, the method can be used as a tool for dimension reduction when clustering high-dimensional objects, which may be especially useful for visual inspection of clusters. We also propose a Bayesian criterion for choosing the dimension of the object configuration and the number of clusters simultaneously. This criterion is easy to compute and works reasonably well in simulations and real examples.
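The two ideas in the abstract can be illustrated with a minimal simulation sketch of the generative side of the model. This is not the authors' implementation and it performs no MCMC inference. The function name `simulate_dissimilarities`, all parameter values, the identity covariance within clusters, and the folded Gaussian measurement error are assumptions made for illustration only, since the abstract does not specify the error distribution.

```python
import numpy as np


def simulate_dissimilarities(n=60, dim=2, n_clusters=3, noise_sd=0.1, seed=0):
    """Draw latent positions from a normal mixture and return noisy distances.

    All defaults are hypothetical values chosen for illustration.
    """
    rng = np.random.default_rng(seed)

    # Idea 2: latent positions come from a mixture of multivariate normals,
    # one component per cluster (identity covariance assumed here).
    means = rng.normal(scale=4.0, size=(n_clusters, dim))
    labels = rng.integers(n_clusters, size=n)
    positions = means[labels] + rng.normal(size=(n, dim))

    # Idea 1: observed dissimilarities are Euclidean distances plus error.
    diff = positions[:, None, :] - positions[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    # Assumed error model: symmetric Gaussian noise on off-diagonal entries,
    # folded with abs() so that dissimilarities stay nonnegative.
    noise = np.triu(rng.normal(scale=noise_sd, size=(n, n)), k=1)
    noise = noise + noise.T
    dissim = np.abs(dist + noise)
    np.fill_diagonal(dissim, 0.0)

    return positions, labels, dissim


positions, labels, dissim = simulate_dissimilarities()
```

A matrix such as `dissim` is the only input the method described above would observe. The latent `positions` and cluster `labels` are the quantities it would need to recover jointly.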
| Original language | English |
|---|---|
| Pages (from-to) | 559-585 |
| Number of pages | 27 |
| Journal | Journal of Computational and Graphical Statistics |
| Volume | 16 |
| Issue number | 3 |
| State | Published - Sep 2007 |
Keywords
- Hierarchical model
- Markov chain Monte Carlo
- Mixture models
- Multidimensional scaling