Citation: | Xiang Zhang, Chenliang Qian, Bochao Yang, Hongwei Jin, Song Wu, Jie Xia, Fan Yang, Liangren Zhang. Geometry-based BERT: An experimentally validated deep learning model for molecular property prediction in drug discovery[J]. Journal of Pharmaceutical Analysis. doi: 10.1016/j.jpha.2025.101465 |
[1] |
D. Vemula, P. Jayasurya, V. Sushmitha, et al., CADD, AI and ML in drug discovery: A comprehensive review, Eur. J. Pharm. Sci. 181 (2023) 106324.
|
[2] |
J.A. DiMasi, H.G. Grabowski, R.W. Hansen, Innovation in the pharmaceutical industry: new estimates of R&D costs, J. Health Econ. 47 (2016) 20-33.
|
[3] |
Q. Liu, Y. Jiang, L. Zhang, et al., A computational toolbox for molecular property prediction based on quantum mechanics and quantitative structure-property relationship, Front. Chem. Sci. Eng. (2022) 1-16.
|
[4] |
Z. Li, M. Jiang, S. Wang, et al., Deep learning methods for molecular representation and property prediction, Drug Discov. Today 27 (2022) 103373.
|
[5] |
D. Rogers, M. Hahn, Extended-connectivity fingerprints, J. Chem. Inf. Model. 50 (2010) 742-754.
|
[6] |
J.L. Durant, B.A. Leland, D.R. Henry, et al., Reoptimization of MDL keys for use in drug discovery, J. Chem. Inf. Comput. Sci. 42 (2002) 1273-1280.
|
[7] |
J. Yang, Y. Cai, K. Zhao, et al., Concepts and applications of chemical fingerprint for hit and lead screening, Drug Discov. Today 27 (2022) 103356.
|
[8] |
A. Cereto-Massague, M.J. Ojeda, C. Valls, et al., Molecular fingerprint similarity search in virtual screening, Methods 71 (2015) 58-63.
|
[9] |
H.Y. Choo, J. Wee, C. Shen, et al., Fingerprint-enhanced graph attention network (FinGAT) model for antibiotic discovery, J. Chem. Inf. Model. 63 (2023) 2928-2935.
|
[10] |
X. Lu, L. Xie, L. Xu, et al., Multimodal fused deep learning for drug property prediction: Integrating chemical language and molecular graph, Comput. Struct. Biotechnol. J. 23 (2024) 1666-1679.
|
[11] |
Y. Hou, S. Wang, B. Bai, et al., Accurate physical property predictions via deep learning, Molecules 27 (2022) 1668.
|
[12] |
Z. Guo, P. Sharma, A. Martinez, et al., Multilingual molecular representation learning via contrastive pre-training, arXiv preprint arXiv:2109.08830 (2021).
|
[13] |
R. Ma, Y. Zhang, X. Wang, et al., Proceedings of the 31st ACM International Conference on Information & Knowledge Management, 2022, pp. 1461-1470.
|
[14] |
J. Xia, Y. Zhu, Y. Du, et al., A systematic survey of chemical pre-trained models, arXiv preprint arXiv:2210.16484 (2022).
|
[15] |
J. Devlin, M.-W. Chang, K. Lee, et al., BERT: Pre-training of deep bidirectional transformers for language understanding, arXiv preprint arXiv:1810.04805 (2018).
|
[16] |
Z. Wu, D. Jiang, J. Wang, et al., Knowledge-based BERT: a method to extract molecular features like computational chemists, Brief. Bioinform. 23 (2022) bbac131.
|
[17] |
J. Xia, C. Zhao, B. Hu, et al., Mole-BERT: Rethinking pre-training graph neural networks for molecules, Int. Conf. Learn. Represent. 2023.
|
[18] |
Y. Liu, R. Zhang, T. Li, et al., MolRoPE-BERT: An enhanced molecular representation with Rotary Position Embedding for molecular property prediction, J. Mol. Graph. Model. 118 (2023) 108344.
|
[19] |
B. Li, M. Lin, T. Chen, et al., FG-BERT: a generalized and self-supervised functional group-based molecular representation learning framework for properties prediction, Brief. Bioinform. 24 (2023) bbad398.
|
[20] |
H. Ma, Y. Bian, Y. Rong, et al., Cross-dependent graph neural networks for molecular property prediction, Bioinformatics 38 (2022) 2003-2009.
|
[21] |
X. Fang, L. Liu, J. Lei, et al., Geometry-enhanced molecular representation learning for property prediction, Nat. Mach. Intell. 4 (2022) 127-134.
|
[22] |
A. Gaulton, L.J. Bellis, A.P. Bento, et al., ChEMBL: a large-scale bioactivity database for drug discovery, Nucleic Acids Res. 40 (2012) D1100-D1107.
|
[23] |
Z. Wu, B. Ramsundar, E.N. Feinberg, et al., MoleculeNet: a benchmark for molecular machine learning, Chem. Sci. 9 (2018) 513-530.
|
[24] |
A. Gere, A. Racz, D. Bajusz, et al., Multicriteria decision making for evergreen problems in food science by sum of ranking differences, Food Chem. 344 (2021) 128617.
|
[25] |
N. Qiu, C. Qian, T. Guo, et al., Discovery of a novel chemotype as DYRK1A inhibitors against Alzheimer's disease: Computational modeling and biological evaluation, Int. J. Biol. Macromol. 269 (2024) 132024.
|
[26] |
L. Van der Maaten, G. Hinton, Visualizing data using t-SNE, J. Mach. Learn. Res. 9 (2008) 2579-2605.
|
[27] |
X. Tong, D. Wang, X. Ding, et al., Blood-brain barrier penetration prediction enhanced by uncertainty estimation, J. Cheminform. 14 (2022) 44.
|
[28] |
J.B. Baell, G.A. Holloway, New substructure filters for removal of pan assay interference compounds (PAINS) from screening libraries and for their exclusion in bioassays, J. Med. Chem. 53 (2010) 2719-2740.
|
[29] |
K. Héberger, Sum of ranking differences compares methods or models fairly, Trends Anal. Chem. 29 (2010) 101-109.
|
[30] |
E. Heid, K.P. Greenman, Y. Chung, et al., Chemprop: a machine learning package for chemical property prediction, J. Chem. Inf. Model. 64 (2023) 9-17.
|
[31] |
X.-C. Zhang, C.-K. Wu, Z.-J. Yang, et al., MG-BERT: leveraging unsupervised atomic representation learning for molecular property prediction, Brief. Bioinform. 22 (2021) bbab152.
|
[32] |
F.-Y. Sun, J. Hoffmann, V. Verma, et al., InfoGraph: Unsupervised and semi-supervised graph-level representation learning via mutual information maximization, arXiv preprint arXiv:1908.01000 (2019).
|
[33] |
Z. Hu, Y. Dong, K. Wang, et al., GPT-GNN: Generative Pre-Training of Graph Neural Networks, arXiv. 2020. https://arxiv.org/abs/2006.15437.
|
[34] |
W. Hamilton, Z. Ying, J. Leskovec, Inductive representation learning on large graphs, arXiv. 2017. https://arxiv.org/abs/1706.02216.
|
[35] |
W. Hu, B. Liu, J. Gomes, et al., Strategies for pre-training graph neural networks, arXiv preprint arXiv:1905.12265 (2019).
|
[36] |
M. Xu, H. Wang, B. Ni, et al., Self-supervised Graph-level Representation Learning with Local and Global Structure, arXiv. 2021. https://arxiv.org/abs/2106.04113.
|
[37] |
Y. Rong, Y. Bian, T. Xu, et al., Self-supervised graph transformer on large-scale molecular data, arXiv. 2020. https://arxiv.org/abs/2007.02835.
|
[38] |
S. Suresh, P. Li, C. Hao, et al., Adversarial graph augmentation to improve graph contrastive learning, arXiv. 2021. https://arxiv.org/abs/2106.05819.
|
[39] |
Y. You, T. Chen, Y. Shen, et al., Graph Contrastive Learning Automated, arXiv. 2021. https://arxiv.org/abs/2106.07594.
|
[40] |
J. Xia, L. Wu, J. Chen, et al., SimGRACE: A simple framework for graph contrastive learning without data augmentation, Proceedings of the ACM Web Conference 2022, 2022, pp. 1070-1079.
|
[41] |
Y. You, T. Chen, Y. Sui, et al., Graph contrastive learning with augmentations, arXiv. 2020. https://arxiv.org/abs/2010.13902.
|
[42] |
Z. Hou, X. Liu, Y. Cen, et al., GraphMAE: Self-Supervised Masked Graph Autoencoders, Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2022, pp. 594-604.
|
[43] |
H. Stark, D. Beaini, G. Corso, et al., 3D Infomax improves GNNs for molecular property prediction, Proc. Mach. Learn. Res. 2022, pp. 20479-20502.
|
[44] |
S. Liu, H. Wang, W. Liu, et al., Pre-training molecular graph representation with 3D geometry, arXiv. 2021. https://arxiv.org/abs/2110.07728.
|
[45] |
Z. Zhang, Q. Liu, H. Wang, et al., Motif-based graph self-supervised learning for molecular property prediction, arXiv. 2021. https://arxiv.org/abs/2110.00987.
|