Canonical covariance analysis (CCA) has gained popularity as a method for the analysis of two sets of high-dimensional genomic data. However, the results are often difficult to interpret because canonical vectors are linear combinations of all variables, and the coefficients are typically nonzero. Several sparse CCA methods have recently been proposed for reducing the number of nonzero coefficients, but these existing methods are not satisfactory because they still give too many nonzero coefficients. In this paper, we propose a new random-effect model approach for sparse CCA; the proposed algorithm can adapt arbitrary penalty functions to CCA without much computational demand. Through simulation studies, we compare various penalty functions in terms of how well they identify the correct model. We also develop an extension of sparse CCA to address more than two sets of variables on the same set of observations. We illustrate the method with an analysis of the NCI cancer dataset.
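To make the idea of sparse canonical vectors concrete, the following is a minimal sketch of a generic L1-penalized canonical covariance analysis that alternates soft-thresholded updates of the two canonical vectors. It is not the random-effect algorithm proposed in the paper, whose details the abstract does not give. The function names (`soft_threshold`, `unit`, `sparse_cca_l1`) and the relative threshold parameters `lam_a` and `lam_b` are assumptions introduced here for illustration only.

```python
import numpy as np


def soft_threshold(v, lam):
    """Shrink each entry of v toward zero by lam (the L1 proximal step)."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)


def unit(v):
    """Scale v to unit Euclidean length; leave an all-zero vector unchanged."""
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v


def sparse_cca_l1(X, Y, lam_a=0.5, lam_b=0.5, n_iter=100, tol=1e-8):
    """Illustrative sparse canonical covariance analysis (hypothetical helper).

    Seeks unit vectors a, b that give a large sample covariance a' X'Y b
    while soft-thresholding sets small coefficients exactly to zero.
    lam_a and lam_b lie in [0, 1) and are fractions of the largest
    absolute entry at each update, so at least one coefficient survives.
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    K = Xc.T @ Yc / (X.shape[0] - 1)  # p x q sample cross-covariance

    # Start from the leading right singular vector (the non-sparse solution).
    _, _, vt = np.linalg.svd(K, full_matrices=False)
    b = vt[0]
    a = np.zeros(K.shape[0])

    for _ in range(n_iter):
        u = K @ b
        a_new = unit(soft_threshold(u, lam_a * np.abs(u).max()))
        v = K.T @ a_new
        b_new = unit(soft_threshold(v, lam_b * np.abs(v).max()))
        change = np.linalg.norm(a_new - a) + np.linalg.norm(b_new - b)
        a, b = a_new, b_new
        if change < tol:
            break
    return a, b
```

Calling `sparse_cca_l1(X, Y)` on two data matrices with the same number of rows returns one pair of canonical vectors; larger values of `lam_a` and `lam_b` yield fewer nonzero coefficients. Other penalty functions would replace the soft-thresholding step with a different shrinkage rule.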
|Journal||Statistical Applications in Genetics and Molecular Biology|
|State||Published - 2011|
Bibliographical note
Funding Information:
KEYWORDS: canonical covariance analysis, sparsity, random-effect model, high-dimensional genomic data
Author Notes: Woojoo Lee, Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden. Donghwan Lee, Department of Statistics, College of Natural Sciences, Seoul National University, NS40, San56-1, Shin Lim-Dong, Kwan-Ak-Ku, Seoul, Korea. Youngjo Lee, Department of Statistics, College of Natural Sciences, Seoul National University, NS40, San56-1, Shin Lim-Dong, Kwan-Ak-Ku, Seoul, Korea. Yudi Pawitan, Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden.
This research is supported by grants from the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education, Science and Technology (2010-0011372) and the European Union under the Chemores project and the Swedish Science Council.
- canonical covariance analysis
- high-dimensional genomic data
- random-effect model