Abstract
Clustering algorithms are used in the analysis of gene expression data to identify groups of genes with similar expression patterns. These algorithms group genes according to a predefined dissimilarity measure, without using any prior classification of the data. Most clustering algorithms require the number of clusters as input, and they usually assign every object in the dataset to one of the clusters. We propose a clustering algorithm that finds clusters sequentially and allows for sporadic objects, that is, objects that are not assigned to any cluster. The proposed sequential clustering algorithm has two steps. First, it finds candidates for the centers of clusters; multiple candidates are used to make the search for clusters more efficient. Second, it conducts a local search around each candidate center to find the set of objects that defines a cluster. The candidate clusters are compared using a predefined score, the best cluster is removed from the data, and the procedure is repeated. We investigate the performance of this algorithm using simulated data, and we apply the method to analyze gene expression profiles in a study on the plasticity of dendritic cells.
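The two-step loop described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the dissimilarity measure, how candidate centers are chosen, how the local search works, or what the score is, so everything concrete here is an assumption made for illustration: Euclidean distance, randomly drawn candidate centers, a fixed-radius local search that alternates membership and center updates, and cluster size as the score. The function name `sequential_clustering` and its parameters (`n_candidates`, `radius`, `min_size`, `max_clusters`) are hypothetical and are not taken from the paper.

```python
import numpy as np

def sequential_clustering(X, n_candidates=5, radius=1.0, min_size=3,
                          max_clusters=10, seed=0):
    """Illustrative sequential clustering that leaves sporadic objects unassigned.

    Assumptions (not from the paper): Euclidean dissimilarity, random candidate
    centers, a fixed-radius local search, and a score equal to cluster size.
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    labels = np.full(len(X), -1)       # -1 marks sporadic (unassigned) objects
    remaining = np.arange(len(X))      # indices of objects not yet clustered

    for k in range(max_clusters):
        if len(remaining) < min_size:
            break
        pts = X[remaining]

        # Step 1: draw several candidate centers among the remaining objects.
        n_cand = min(n_candidates, len(remaining))
        candidates = rng.choice(len(remaining), size=n_cand, replace=False)

        best_members, best_score = None, -np.inf
        for c in candidates:
            center = pts[c]
            # Step 2: local search, alternating membership and center updates.
            for _ in range(20):
                members = np.linalg.norm(pts - center, axis=1) <= radius
                if not members.any():
                    break
                new_center = pts[members].mean(axis=0)
                if np.allclose(new_center, center):
                    break
                center = new_center
            score = members.sum()      # assumed score: number of members
            if score > best_score:
                best_members, best_score = members, score

        # Stop when even the best candidate cluster is too small to keep.
        if best_score < min_size:
            break

        # Remove the best cluster from the data and repeat.
        labels[remaining[best_members]] = k
        remaining = remaining[~best_members]

    return labels
```

Objects that never fall into an accepted cluster keep the label `-1`, which is how this sketch represents sporadic objects. The number of clusters is not an input; `max_clusters` only caps the number of iterations.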
Original language | English |
---|---|
Pages (from-to) | 175-184 |
Number of pages | 10 |
Journal | Journal of the Korean Statistical Society |
Volume | 38 |
Issue number | 2 |
State | Published - Jun 2009 |
Bibliographical note
Funding Information: This research was supported in part by Burroughs Wellcome Fund Interfaces grant 1001774 (Song) and by the National Science Foundation grant DMS-0072510 (Nicolae).
Keywords
- Primary 62L12
- Secondary 91C20
- Clustering algorithm
- Clustering score
- Microarrays
- Sequential clustering