Geometry-based BERT: An experimentally validated deep learning model for molecular property prediction in drug discovery

Xiang Zhang; Chenliang Qian; Bochao Yang; Hongwei Jin; Song Wu; Jie Xia; Fan Yang; Liangren Zhang

doi:10.1016/j.jpha.2025.101465

Journal of Pharmaceutical Analysis, 2025, Accepted Manuscript

Xiang Zhang, Chenliang Qian, Bochao Yang, Hongwei Jin, Song Wu, Jie Xia, Fan Yang, Liangren Zhang. Geometry-based BERT: An experimentally validated deep learning model for molecular property prediction in drug discovery[J]. Journal of Pharmaceutical Analysis. doi: 10.1016/j.jpha.2025.101465

a. Department of Automation, National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen, 361005, China;
b. State Key Laboratory of Bioactive Substance and Function of Natural Medicine, Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, 100050, China;
c. State Key Laboratory of Natural and Biomimetic Drugs, School of Pharmaceutical Sciences, Peking University, Beijing, 100191, China

Funds:

This study was supported by the National Natural Science Foundation of China (Grant Nos.: 62173282, 62472363, 62573367), the CAMS Innovation Fund for Medical Sciences (Grant No.: 2021-I2M-1-069), the 2024 China Industrial Technology Infrastructure Public Service Platform Project (Grant No.: GN2024-31-4700), and the Foreign Expert Program of the State Administration of Foreign Experts Affairs (Grant No.: H20240802). We acknowledge the Information Center of the Institute of Materia Medica, Chinese Academy of Medical Sciences, for free access to its computing facilities. We also thank Dr. Xuehui Zhang (Shandong First Medical University) for technical support with the DYRK1A activity assay.

Received Date: Dec. 20, 2024
Revision Received Date: Sep. 30, 2025
Accepted Date: Oct. 07, 2025
Available Online: Oct. 13, 2025

Abstract

Deep learning methods have had a significant impact on drug discovery, and developing deep learning methods that can identify active compounds of novel structural types has become an urgent challenge. In this paper, we introduce Geometry-based BERT (GEO-BERT), a self-supervised representation learning framework. GEO-BERT takes the atoms and chemical bonds of a chemical structure as input and incorporates positional information from the molecule's three-dimensional (3D) conformation during training. Specifically, GEO-BERT strengthens its characterization of molecular structures by introducing three positional relationships: atom-atom, bond-bond, and atom-bond. In benchmarking studies, GEO-BERT achieved the best performance on multiple benchmarks. We also validated GEO-BERT prospectively, using the screening of DYRK1A inhibitors as a case study, and ultimately discovered two potent and novel DYRK1A inhibitors (IC₅₀ < 1 μM). Taken together, we have developed an open-source geometry-based BERT model for molecular property prediction (https://github.com/drug-designer/GEO-BERT) and demonstrated its practical utility in early-stage drug discovery.

Keywords:
- Drug discovery
- Chemical pre-trained model
- Self-supervised learning
- BERT
- DYRK1A inhibitor
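
The abstract names three positional relationships (atom-atom, bond-bond, atom-bond) but does not define them. The sketch below shows one plausible way to derive such relationships from a 3D conformation. It is not the authors' implementation; the GEO-BERT repository is the authority on that. Several choices here are assumptions made for illustration only: each bond token is placed at the midpoint of its two atoms, the "relationship" is the Euclidean distance, the three blocks are stacked into one matrix over atom and bond tokens, and that matrix is subtracted from the attention logits with a hypothetical `scale` parameter. All function names are likewise hypothetical.

```python
import numpy as np


def token_positions(atom_xyz, bonds):
    """Return 3D positions of atom tokens and of bond tokens.

    Assumption (not from the paper): a bond token sits at the midpoint
    of the two atoms it connects.
    """
    atom_xyz = np.asarray(atom_xyz, dtype=float)
    bond_xyz = np.array(
        [(atom_xyz[i] + atom_xyz[j]) / 2.0 for i, j in bonds], dtype=float
    ).reshape(-1, 3)
    return atom_xyz, bond_xyz


def pairwise_distances(x, y):
    """Euclidean distance between every row of x and every row of y."""
    # (len(x), 1, 3) - (1, len(y), 3) broadcasts to (len(x), len(y), 3).
    return np.linalg.norm(x[:, None, :] - y[None, :, :], axis=-1)


def relation_matrices(atom_xyz, bonds):
    """Distance matrices for the atom-atom, bond-bond and atom-bond relations."""
    a, b = token_positions(atom_xyz, bonds)
    return {
        "atom-atom": pairwise_distances(a, a),
        "bond-bond": pairwise_distances(b, b),
        "atom-bond": pairwise_distances(a, b),
    }


def joint_distance_matrix(rel):
    """Stack the three blocks into one symmetric matrix: atoms first, then bonds."""
    aa, bb, ab = rel["atom-atom"], rel["bond-bond"], rel["atom-bond"]
    return np.block([[aa, ab], [ab.T, bb]])


def attention_with_distance_bias(x, dist, scale=1.0):
    """Single-head self-attention whose logits are penalised by distance.

    Hypothetical choice: subtract scale * dist so nearer tokens attend more.
    Queries, keys and values are all x (no learned projections) for brevity.
    """
    logits = x @ x.T / np.sqrt(x.shape[-1]) - scale * dist
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ x, weights


# Toy C-C-O fragment; coordinates (in angstroms) are made up for illustration.
xyz = [[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [2.0, 1.4, 0.0]]
bonds = [(0, 1), (1, 2)]
rel = relation_matrices(xyz, bonds)
dist = joint_distance_matrix(rel)  # 3 atom tokens + 2 bond tokens
features = np.random.default_rng(0).normal(size=(dist.shape[0], 8))
out, attn = attention_with_distance_bias(features, dist)
print(dist.shape, out.shape)  # prints (5, 5) (5, 8)
```

Folding all three relation types into a single matrix lets one attention layer mix atom and bond tokens while still seeing their spatial arrangement. In practice the 3D coordinates would come from a conformer generator, and the way GEO-BERT actually encodes these relations may differ from this distance-penalty sketch.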
