TY - JOUR
T1 - MLACP
T2 - Machine-learning-based prediction of anticancer peptides
AU - Manavalan, Balachandran
AU - Basith, Shaherin
AU - Shin, Tae Hwan
AU - Choi, Sun
AU - Kim, Myeong Ok
AU - Lee, Gwang
N1 - Funding Information:
This work was supported by the Basic Science Research Program through the National Research Foundation (NRF) of Korea funded by the Ministry of Education, Science and Technology (2015R1D1A1A09060192), Priority Research Centers Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education, Science and Technology (2009-0093826), Mid-Career Researcher Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Science, ICT and Future Planning (2017R1A2B4010084) (to S. Choi) and the Brain Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Science, ICT & Future Planning (2016M3C7A1904392). The authors would like to thank Dr. Sathiyamoorthy Subramaniyam for his assistance in web server development.
Publisher Copyright:
© Manavalan et al.
PY - 2017
Y1 - 2017
N2 - Cancer is the second leading cause of death globally, and use of therapeutic peptides to target and kill cancer cells has received considerable attention in recent years. Identification of anticancer peptides (ACPs) through wet-lab experimentation is expensive and often time-consuming; therefore, development of an efficient computational method is essential to identify potential ACP candidates prior to in vitro experimentation. In this study, we developed support vector machine- and random forest-based machine-learning methods for the prediction of ACPs using the features calculated from the amino acid sequence, including amino acid composition, dipeptide composition, atomic composition, and physicochemical properties. We trained our methods using the Tyagi-B dataset and determined the machine parameters by 10-fold cross-validation. Furthermore, we evaluated the performance of our methods on two benchmarking datasets, with our results showing that the random forest-based method outperformed the existing methods with an average accuracy and Matthews correlation coefficient value of 88.7% and 0.78, respectively. To assist the scientific community, we also developed a publicly accessible web server at www.thegleelab.org/MLACP.html.
AB - Cancer is the second leading cause of death globally, and use of therapeutic peptides to target and kill cancer cells has received considerable attention in recent years. Identification of anticancer peptides (ACPs) through wet-lab experimentation is expensive and often time-consuming; therefore, development of an efficient computational method is essential to identify potential ACP candidates prior to in vitro experimentation. In this study, we developed support vector machine- and random forest-based machine-learning methods for the prediction of ACPs using the features calculated from the amino acid sequence, including amino acid composition, dipeptide composition, atomic composition, and physicochemical properties. We trained our methods using the Tyagi-B dataset and determined the machine parameters by 10-fold cross-validation. Furthermore, we evaluated the performance of our methods on two benchmarking datasets, with our results showing that the random forest-based method outperformed the existing methods with an average accuracy and Matthews correlation coefficient value of 88.7% and 0.78, respectively. To assist the scientific community, we also developed a publicly accessible web server at www.thegleelab.org/MLACP.html.
KW - Anticancer peptides
KW - Hybrid model
KW - Machine-learning parameters
KW - Random forest
KW - Support vector machine
UR - http://www.scopus.com/inward/record.url?scp=85030318795&partnerID=8YFLogxK
U2 - 10.18632/oncotarget.20365
DO - 10.18632/oncotarget.20365
M3 - Article
C2 - 29100375
AN - SCOPUS:85030318795
SN - 1949-2553
VL - 8
SP - 77121
EP - 77136
JO - Oncotarget
JF - Oncotarget
IS - 44
ER -