Available online, doi: 10.1016/j.jpha.2025.101263
Abstract:
Cardiotoxicity is a critical issue in drug development that poses serious health risks, including potentially fatal arrhythmias. The human ether-à-go-go related gene (hERG) potassium channel, as one of the primary targets of cardiotoxicity, has garnered widespread attention. Traditional cardiotoxicity testing methods are expensive and time-consuming, making computational virtual screening a suitable alternative. In this study, we employed machine learning techniques utilizing molecular fingerprints and descriptors to predict the cardiotoxicity of compounds, with the aim of improving prediction accuracy and efficiency. We combined four types of molecular fingerprints and descriptors with machine learning and deep learning algorithms, including Gaussian naive Bayes (NB), random forest (RF), support vector machine (SVM), K-nearest neighbors (KNN), eXtreme gradient boosting (XGBoost), and Transformer models, to build predictive models. Our models demonstrated strong predictive performance. The best machine learning model, XGBoost_Morgan, achieved an accuracy (ACC) value of 0.84, and the deep learning model, Transformer_Morgan, achieved the best ACC value of 0.85, showing a high ability to distinguish between toxic and non-toxic compounds. On an external independent validation set, it achieved the best area under the curve (AUC) value of 0.93, surpassing ADMETlab3.0, Cardpred, and CardioDPi. In addition, we explored the integration of molecular descriptors and fingerprints to enhance model performance and found that ensemble methods, such as voting and stacking, provided slight improvements in model stability. Furthermore, SHapley Additive exPlanations (SHAP) analysis revealed the relationship between cardiotoxicity and benzene rings, fluorine-containing groups, NH groups, and oxygen in ether groups, highlighting the importance of these features.
This study not only improved the predictive accuracy of cardiotoxicity models but also promoted a more reliable and safer method for drug safety assessment. By using computational methods, this study facilitates a more efficient drug development process, reduces costs, and improves the safety of new drug candidates, ultimately benefiting medicine and public health.
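As an illustration of the evaluation metrics and the voting ensemble mentioned above, the following is a minimal sketch in plain Python, not the authors' implementation. It computes ACC at a fixed threshold, computes AUC as the probability that a toxic compound is scored above a non-toxic one, and averages the predicted probabilities of two models (soft voting). The function names, the 0.5 threshold, and the toy labels and scores are all assumptions made for this example; they are not taken from the study.

```python
def accuracy(y_true, y_score, threshold=0.5):
    """ACC: fraction of compounds whose thresholded score matches the label.

    The 0.5 threshold is an assumption for this sketch.
    """
    preds = [1 if s >= threshold else 0 for s in y_score]
    return sum(p == t for p, t in zip(preds, y_true)) / len(y_true)


def roc_auc(y_true, y_score):
    """AUC as the chance that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, t in zip(y_score, y_true) if t == 1]
    neg = [s for s, t in zip(y_score, y_true) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def soft_vote(*model_scores):
    """Soft voting: average the per-compound probabilities across models."""
    return [sum(scores) / len(scores) for scores in zip(*model_scores)]


# Hypothetical toy data: 1 = hERG blocker (toxic), 0 = non-blocker.
labels = [1, 1, 1, 0, 0, 0]
scores_a = [0.9, 0.6, 0.4, 0.7, 0.2, 0.1]  # made-up probabilities, model A
scores_b = [0.8, 0.3, 0.7, 0.4, 0.3, 0.2]  # made-up probabilities, model B
ensemble = soft_vote(scores_a, scores_b)

for name, scores in [("A", scores_a), ("B", scores_b), ("vote", ensemble)]:
    print(name, round(accuracy(labels, scores), 3), round(roc_auc(labels, scores), 3))
```

ACC depends on the chosen threshold, whereas AUC depends only on how the compounds are ranked, which is one reason the two metrics are often reported side by side.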