a Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, 100871, China;
b Infinite Intelligence Pharma, Beijing, 100083, China;
c Mudi Meng Honors College, China Pharmaceutical University, Nanjing, 210009, China;
d State Key Laboratory of Chemical Biology, Shanghai Institute of Organic Chemistry, Shanghai, 200032, China;
e Interdisciplinary Institute for Medical Engineering, Fuzhou University, Fuzhou, 350108, China
Funds:
This work was supported in part by the National Key R&D Program of China (Grant No.: 2023YFF1205103), the National Natural Science Foundation of China (Grant No.: 220330010), and the Anhui’s Plans for Major Provincial Science & Technology Projects, China (Grant No.: 202303a07020009).
We developed MaxQsaring, a novel universal framework that integrates molecular descriptors, fingerprints, and deep-learning pretrained representations to predict the properties of compounds. In a case study of human ether-à-go-go-related gene (hERG) blockage prediction, MaxQsaring achieved state-of-the-art performance on two challenging external datasets through automatically optimized feature combinations, and identified the top 10 important interpretable features, which could be used to build a high-accuracy decision tree. The models’ predictions align well with empirical hERG optimization strategies, demonstrating their interpretability and practical utility. Deep-learning pretrained representations moderately improved the performance of the predictive models, but their contribution to generalizability, particularly for compounds with novel scaffolds, appeared to be comparatively minimal. MaxQsaring also excelled in the Therapeutics Data Commons (TDC) benchmarks, ranking first in 19 out of 22 tasks, which showcases its potential for universal, accurate compound property prediction. Such prediction could help raise the success rate of early drug discovery, which remains a formidable challenge.