Volume 15 Issue 6
Jun. 2025
Jianmin Wang, Peng Zhou, Zixu Wang, Wei Long, Yangyang Chen, Kyoung Tai No, Dongsheng Ouyang, Jiashun Mao, Xiangxiang Zeng. Diffusion-based generative drug-like molecular editing with chemical natural language[J]. Journal of Pharmaceutical Analysis, 2025, 15(6): 101137. doi: 10.1016/j.jpha.2024.101137

Diffusion-based generative drug-like molecular editing with chemical natural language

doi: 10.1016/j.jpha.2024.101137
Funds:

This research was supported by the Yonsei University graduate school Department of Integrative Biotechnology.

  • Received Date: May 14, 2024
  • Revised Date: Oct. 22, 2024
  • Accepted Date: Oct. 29, 2024
  • Publish Date: Nov. 02, 2024

Diffusion models have recently emerged as a promising paradigm for molecular design and optimization. However, most diffusion-based molecular generative models focus on 2D graphs or 3D geometries, and molecular sequence diffusion models remain little studied. For organic compounds, International Union of Pure and Applied Chemistry (IUPAC) names are closer to chemical natural language than simplified molecular input line entry system (SMILES) strings. In this work, we apply an IUPAC-guided conditional diffusion model to molecular editing from chemical natural language (IUPAC names) to chemical language (SMILES), and explore whether the generative performance of pre-trained diffusion models can be transferred to chemical natural language. We propose DiffIUPAC, a controllable molecular editing diffusion model that converts IUPAC names to SMILES strings. Evaluation results show that our model outperforms existing methods and captures the semantic rules of both chemical languages. Chemical space and scaffold analyses show that the model can generate similar compounds with diverse scaffolds within the specified constraints. To illustrate the model’s applicability in drug design, we conducted case studies in functional group editing, analogue design, and linker design.
