TY - JOUR
T1 - GEMiCCL
T2 - Mining genotype and expression data of cancer cell lines with elaborate visualization
AU - Jeong, Inhae
AU - Yu, Namhee
AU - Jang, Insu
AU - Jun, Yukyung
AU - Kim, Min Seo
AU - Choi, Jinhyuk
AU - Lee, Byungwook
AU - Lee, Sanghyuk
N1 - Publisher Copyright: © The Author(s) 2018. Published by Oxford University Press.
PY - 2018/1/1
Y1 - 2018/1/1
N2 - Cancer cell lines are essential components for biomedical research. However, proper choice of cell lines for experimental purposes is often difficult because genotype and/or expression data are missing or scattered in diverse resources. Here, we report Gene Expression and Mutations in Cancer Cell Lines (GEMiCCL), an online database of human cancer cell lines that provides genotype and expression information. We have collected mutation, gene expression and copy number variation (CNV) data from three representative databases on cell lines - Cancer Cell Line Encyclopedia, Catalogue of Somatic Mutations in Cancer and NCI60. In total, GEMiCCL includes 1406 cell lines from 185 cancer types and 29 tissues. Gene expression, mutation and CNV information are available for 1304, 1334 and 1365 cell lines, respectively. We removed batch effects due to different microarray platforms using the ComBat software and re-processed the entire gene expression and SNP chip data. Cell line names and clinical information were standardized using Cellosaurus from ExPASy. Our user interface supports cell line search, gene search, browsing for specific molecular characteristics and complex queries based on Boolean logic rules. We also implemented many interactive features and user-friendly visualizations. Providing molecular characteristics and clinical information, we believe that GEMiCCL would be a valuable resource for biomedical research for functional or screening studies.
AB - Cancer cell lines are essential components for biomedical research. However, proper choice of cell lines for experimental purposes is often difficult because genotype and/or expression data are missing or scattered in diverse resources. Here, we report Gene Expression and Mutations in Cancer Cell Lines (GEMiCCL), an online database of human cancer cell lines that provides genotype and expression information. We have collected mutation, gene expression and copy number variation (CNV) data from three representative databases on cell lines - Cancer Cell Line Encyclopedia, Catalogue of Somatic Mutations in Cancer and NCI60. In total, GEMiCCL includes 1406 cell lines from 185 cancer types and 29 tissues. Gene expression, mutation and CNV information are available for 1304, 1334 and 1365 cell lines, respectively. We removed batch effects due to different microarray platforms using the ComBat software and re-processed the entire gene expression and SNP chip data. Cell line names and clinical information were standardized using Cellosaurus from ExPASy. Our user interface supports cell line search, gene search, browsing for specific molecular characteristics and complex queries based on Boolean logic rules. We also implemented many interactive features and user-friendly visualizations. Providing molecular characteristics and clinical information, we believe that GEMiCCL would be a valuable resource for biomedical research for functional or screening studies.
UR - http://www.scopus.com/inward/record.url?scp=85054777509&partnerID=8YFLogxK
U2 - 10.1093/database/bay041
DO - 10.1093/database/bay041
M3 - Article
C2 - 29726944
AN - SCOPUS:85054777509
SN - 1758-0463
VL - 2018
JO - Database
JF - Database
IS - 2018
ER -