Comprehensive ensemble in QSAR prediction for drug discovery

Sunyoung Kwon, Ho Bae, Jeonghee Jo, Sungroh Yoon

Research output: Contribution to journal › Article › peer-review

110 Scopus citations


Background: Quantitative structure-activity relationship (QSAR) is a computational modeling method for revealing relationships between the structural properties of chemical compounds and their biological activities. QSAR modeling is essential for drug discovery, but it has many constraints. Ensemble-based machine learning approaches have been used to overcome these constraints and obtain reliable predictions. Ensemble learning builds a set of diversified models and combines them. However, the most prevalent approach, random forest, and other ensemble approaches in QSAR prediction limit their model diversity to a single subject.

Results: The proposed ensemble method consistently outperformed 13 individual models on 19 bioassay datasets and demonstrated superiority over other ensemble approaches that are limited to a single subject. The comprehensive ensemble method is publicly available at

Conclusions: We propose a comprehensive ensemble method that builds multi-subject diversified models and combines them through second-level meta-learning. In addition, we propose an end-to-end neural network-based individual classifier that can automatically extract sequential features from a simplified molecular-input line-entry system (SMILES). The proposed individual model did not show impressive results as a single model, but it was considered the most important predictor when combined, according to the interpretation of the meta-learning.
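
The second-level meta-learning step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the meta-learner, so logistic regression fitted by gradient descent is an assumption here, and the prediction matrix `P`, the labels `y`, and the helper names `fit_meta_learner` and `predict` are hypothetical. Each column of `P` stands for the held-out predicted probabilities of one individual model, and the learned weights give a rough indication of how much the combiner relies on each model.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_meta_learner(P, y, lr=0.5, epochs=2000):
    """Fit a logistic-regression combiner on first-level predictions.

    P: rows are compounds, columns are individual models (assumed to be
       held-out probabilities, so the combiner does not see leaked labels).
    y: binary activity labels (0 or 1).
    """
    n, m = len(P), len(P[0])
    w, b = [0.0] * m, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * m, 0.0
        for row, label in zip(P, y):
            err = sigmoid(sum(wj * pj for wj, pj in zip(w, row)) + b) - label
            for j in range(m):
                grad_w[j] += err * row[j]
            grad_b += err
        # Batch gradient-descent update on the mean log-loss gradient.
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(P, w, b):
    # Second-level probability for each compound.
    return [sigmoid(sum(wj * pj for wj, pj in zip(w, row)) + b) for row in P]

# Hypothetical first-level predictions from three individual models.
# Column 0 tracks the label closely; columns 1 and 2 are much weaker.
P = [
    [0.90, 0.6, 0.5],
    [0.80, 0.7, 0.4],
    [0.70, 0.4, 0.6],
    [0.85, 0.5, 0.5],
    [0.20, 0.5, 0.5],
    [0.30, 0.6, 0.4],
    [0.10, 0.4, 0.6],
    [0.25, 0.3, 0.5],
]
y = [1, 1, 1, 1, 0, 0, 0, 0]

w, b = fit_meta_learner(P, y)
probs = predict(P, w, b)
print("meta-learner weights:", [round(v, 2) for v in w])
print("combined predictions:", [round(p, 2) for p in probs])
```

In this toy setting the largest weight falls on the first column, the model whose predictions separate the two classes. That is the kind of reading the abstract refers to as interpretation of the meta-learning, although the paper's own procedure may differ.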

Original language: English
Article number: 521
Journal: BMC Bioinformatics
Issue number: 1
State: Published - 26 Oct 2019

Bibliographical note

Publisher Copyright:
© 2019 The Author(s).


  • Drug-prediction
  • Ensemble-learning
  • Meta-learning

