Journal of Pharmaceutical Analysis

Multi-scale information fusion and decoupled representation learning for robust microbe-disease interaction prediction

Wentao Wang, Qiaoying Yan, Qingquan Liao, Xinyuan Jin, Yinyin Gong, Linlin Zhuo, Xiangzheng Fu, Dongsheng Cao

2025, 15(8): 101134. doi: 10.1016/j.jpha.2024.101134

Abstract:
Research indicates that microbial activity within the human body strongly influences health and is closely linked to various diseases. Accurately predicting microbe-disease interactions (MDIs) offers critical insights for disease intervention and pharmaceutical research. Current AI-based technologies automatically generate robust representations of microbes and diseases, enabling effective MDI prediction. However, these models still face significant challenges. A major issue is their reliance on complex feature extractors and classifiers, which substantially diminishes their generalizability. To address this, we introduce a novel graph autoencoder framework that uses decoupled representation learning and multi-scale information fusion to efficiently infer potential MDIs. First, we randomly mask portions of the input microbe-disease graph according to a Bernoulli distribution to strengthen self-supervised training and reduce noise-related performance degradation. Second, we employ decoupled representation learning, compelling the graph neural network (GNN) to learn the weights of each feature subspace independently, thus enhancing its expressive power. Finally, we apply multi-scale information fusion to combine the multi-layer outputs of the GNN, reducing the information loss caused by masking. Extensive experiments on public datasets demonstrate that our model significantly outperforms existing state-of-the-art MDI prediction models, indicating that it can accurately predict unknown MDIs and is likely to aid disease discovery and precision pharmaceutical research. Code and data are accessible at: https://github.com/shmildsj/MDI-IFDRL.
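The Bernoulli masking and multi-layer fusion steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' MDI-IFDRL code: the edge-list input, the `mask_rate` value, and mean fusion of layer outputs are all hypothetical choices.

```python
import random


def bernoulli_mask_edges(edges, mask_rate, seed=None):
    """Split edges into (visible, masked) lists via independent Bernoulli trials.

    Each edge is hidden with probability `mask_rate`; the hidden edges would
    serve as reconstruction targets for a graph autoencoder.
    """
    if not 0.0 <= mask_rate <= 1.0:
        raise ValueError("mask_rate must lie in [0, 1]")
    rng = random.Random(seed)
    visible, masked = [], []
    for edge in edges:
        # One Bernoulli(mask_rate) draw per edge decides whether it is hidden.
        if rng.random() < mask_rate:
            masked.append(edge)
        else:
            visible.append(edge)
    return visible, masked


def fuse_layer_outputs(layer_outputs):
    """Mean-fuse node embeddings from several GNN layers (assumed operator).

    `layer_outputs` is a list of layers, each a list of per-node vectors
    with identical shapes.
    """
    n_layers = len(layer_outputs)
    n_nodes = len(layer_outputs[0])
    dim = len(layer_outputs[0][0])
    return [
        [sum(layer[i][j] for layer in layer_outputs) / n_layers for j in range(dim)]
        for i in range(n_nodes)
    ]


# Hypothetical toy graph: 4 microbes x 3 diseases, every pair linked.
edges = [(f"m{i}", f"d{j}") for i in range(4) for j in range(3)]
visible, masked = bernoulli_mask_edges(edges, mask_rate=0.3, seed=42)
print(len(visible), len(masked))
```

Averaging is only one way to combine layer outputs; concatenation or learned attention weights over layers are common alternatives.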

Artificial intelligence-aided endoscopic in-line particle size analysis during the pellet layering process

Orsolya Péterfi, Nikolett Kállai-Szabó, Kincső Renáta Demeter, Ádám Tibor Barna, István Antal, Edina Szabó, Emese Sipos, Zsombor Kristóf Nagy, Dorián László Galata

2025, 15(8): 101227. doi: 10.1016/j.jpha.2025.101227

Abstract:
In this study, an artificial intelligence-based machine vision system was developed for in-line particle size analysis during the pellet layering process. Drug-layered pellets were produced by coating microcrystalline cellulose cores with an ibuprofen-containing layering liquid until the target drug content was achieved. Drug content increases with pellet size; therefore, particle size monitoring can ensure product safety and quality. The direct imaging system, consisting of a rigid endoscope, a light source, and a high-speed camera, provides real-time information about pellet size and layer uniformity, enabling timely intervention in the case of out-of-spec products. A convolutional neural network-based instance segmentation algorithm was employed to detect particles in focus, ensuring that pellet size could be accurately determined despite the dense flow of the particles. After training the model, the performance of the developed system was assessed by analysing the particle size distribution of pellet cores with variable sizes within the 250–850 μm size range. The endoscopic system was tested in-line at a larger scale during the drug layering of inert pellet cores. The particle size data acquired in real time with the endoscopic imaging system corresponded with the reference methods, demonstrating the feasibility of the proposed machine vision-based method as a process analytical technology tool for in-line process monitoring.
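Once in-focus pellets are segmented, each mask still has to be converted into a size. A minimal sketch, assuming the equivalent circle diameter as the size metric and a hypothetical µm-per-pixel calibration (the abstract does not state how the authors derive sizes from masks):

```python
import math
import statistics


def equivalent_circle_diameter_um(mask_area_px, um_per_px):
    """Diameter (um) of a circle whose area equals the segmented mask area."""
    area_um2 = mask_area_px * um_per_px ** 2  # convert pixel count to um^2
    return 2.0 * math.sqrt(area_um2 / math.pi)


def size_summary(mask_areas_px, um_per_px):
    """Summarize a particle size distribution from in-focus instance masks."""
    diameters = sorted(
        equivalent_circle_diameter_um(a, um_per_px) for a in mask_areas_px
    )
    return {
        "n": len(diameters),
        "d50_um": statistics.median(diameters),  # number-based, not volume-weighted
        "min_um": diameters[0],
        "max_um": diameters[-1],
    }


# Hypothetical mask areas (pixels) at an assumed 5 um/pixel calibration.
print(size_summary([7854, 17671, 31416], um_per_px=5.0))
```

A volume-weighted distribution, as reported by sieve or laser diffraction references, would weight each diameter by its cubed value before taking percentiles.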

LocPro: A deep learning-based prediction of protein subcellular localization for promoting multi-directional pharmaceutical research

Yintao Zhang, Lingyan Zheng, Nanxin You, Wei Hu, Wanghao Jiang, Mingkun Lu, Hangwei Xu, Haibin Dai, Tingting Fu, Ying Zhou

2025, 15(8): 101255. doi: 10.1016/j.jpha.2025.101255

Abstract:
Drug development encompasses multiple processes in which protein subcellular localization is essential: it supports target identification, treatment development, and the design of drug delivery systems. In this research, a deep learning framework called LocPro is presented for predicting protein subcellular localization. Specifically, LocPro is unique in (a) combining protein representations from the pre-trained large language model (LLM) ESM2 and the expert-driven tool PROFEAT, (b) implementing a hybrid deep neural network architecture that integrates convolutional neural network (CNN), fully connected (FC) layer, and bidirectional long short-term memory (BiLSTM) blocks, and (c) developing a multi-label framework for predicting protein subcellular localization at multiple granularity levels. Additionally, a dataset was curated and divided using a homology-based strategy for training and validation. Comparative analyses show that LocPro outperforms existing methods in sequence-based multi-label protein subcellular localization prediction. The practical utility of this framework is further demonstrated through case studies on drug target subcellular localization. Overall, LocPro serves as a valuable complement to existing protein localization prediction tools. The web server is freely accessible at https://idrblab.org/LocPro/.
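The multi-label setting means one protein can be assigned to several compartments at once. A minimal sketch of a multi-label decision rule with independent sigmoid outputs; the compartment names, logits, and the 0.5 threshold are hypothetical and not taken from LocPro:

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def predict_locations(logits, labels, threshold=0.5):
    """Multi-label decision: each compartment is scored independently.

    Unlike a softmax over compartments, several labels can pass the
    threshold at once, so a protein may receive more than one location.
    """
    return [label for label, z in zip(labels, logits) if sigmoid(z) >= threshold]


compartments = ["Nucleus", "Cytoplasm", "Cell membrane"]  # hypothetical label set
print(predict_locations([2.0, -1.0, 0.5], compartments))
```

With these logits, both "Nucleus" and "Cell membrane" pass the threshold, which a single-label softmax classifier could not express.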

Predicting cardiotoxicity in drug development: A deep learning approach

Kaifeng Liu, Huizi Cui, Xiangyu Yu, Wannan Li, Weiwei Han

2025, 15(8): 101263. doi: 10.1016/j.jpha.2025.101263

Abstract:
Cardiotoxicity is a critical issue in drug development that poses serious health risks, including potentially fatal arrhythmias. The human ether-à-go-go related gene (hERG) potassium channel, one of the primary targets of cardiotoxicity, has garnered widespread attention. Traditional cardiotoxicity testing methods are expensive and time-consuming, making computational virtual screening a suitable alternative. In this study, we employed machine learning techniques based on molecular fingerprints and descriptors to predict the cardiotoxicity of compounds, with the aim of improving prediction accuracy and efficiency. We combined four types of molecular fingerprints and descriptors with machine learning and deep learning algorithms, including Gaussian naive Bayes (NB), random forest (RF), support vector machine (SVM), K-nearest neighbors (KNN), eXtreme gradient boosting (XGBoost), and Transformer models, to build predictive models. Our models showed strong predictive performance. The best machine learning model, XGBoost_Morgan, achieved an accuracy (ACC) of 0.84, and the best deep learning model, Transformer_Morgan, achieved an ACC of 0.85, showing a high ability to distinguish between toxic and non-toxic compounds. On an external independent validation set, it achieved the best area under the curve (AUC) value of 0.93, surpassing ADMETlab3.0, Cardpred, and CardioDPi. In addition, we explored the integration of molecular descriptors and fingerprints to enhance model performance and found that ensemble methods, such as voting and stacking, provided slight improvements in model stability. Furthermore, SHapley Additive exPlanations (SHAP) analysis revealed the relationships between cardiotoxicity and benzene rings, fluorine-containing groups, NH groups, and ether oxygen atoms, highlighting the importance of these features.
This study not only improved the predictive accuracy of cardiotoxicity models but also provides a more reliable and scientifically interpretable method for drug safety assessment. By using computational methods, this study facilitates a more efficient drug development process, reduces costs, and improves the safety of new drug candidates, ultimately benefiting medicine and public health.
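The voting ensembles mentioned above can be illustrated with soft voting, which averages the toxicity probabilities of several base models. A minimal sketch; the probabilities, equal weights, and 0.5 threshold are hypothetical, and stacking (training a meta-model on base-model outputs) is not shown:

```python
def soft_vote(prob_lists, weights=None):
    """Weighted average of per-model toxicity probabilities (soft voting)."""
    if weights is None:
        weights = [1.0] * len(prob_lists)
    total = sum(weights)
    n_samples = len(prob_lists[0])
    return [
        sum(w * probs[i] for w, probs in zip(weights, prob_lists)) / total
        for i in range(n_samples)
    ]


def accuracy(y_true, probs, threshold=0.5):
    """Fraction of compounds whose thresholded prediction matches the label."""
    preds = [1 if p >= threshold else 0 for p in probs]
    return sum(int(p == t) for p, t in zip(preds, y_true)) / len(y_true)


# Hypothetical probabilities from three base models for three compounds
# (1 = cardiotoxic, 0 = non-toxic).
model_probs = [[0.9, 0.2, 0.6], [0.7, 0.4, 0.3], [0.8, 0.1, 0.4]]
ensemble = soft_vote(model_probs)
print(accuracy([1, 0, 1], ensemble))
```

Averaging probabilities tends to smooth out individual models' errors, which is consistent with the modest stability gains the abstract reports rather than large accuracy jumps.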

Prioritization of potential drug targets for diabetic kidney disease using integrative omics data mining and causal inference

Junyu Zhang, Jie Peng, Chaolun Yu, Yu Ning, Wenhui Lin, Mingxing Ni, Qiang Xie, Chuan Yang, Huiying Liang, Miao Lin

2025, 15(8): 101265. doi: 10.1016/j.jpha.2025.101265

Abstract:
Diabetic kidney disease (DKD), whose global prevalence is increasing, lacks effective therapeutic targets to halt or reverse its progression. Therapeutic targets supported by causal genetic evidence are more likely to succeed in randomized clinical trials. In this study, we integrated large-scale plasma proteomics, genetics-driven causal inference, and experimental validation to identify prioritized targets for DKD using the UK Biobank (UKB) and FinnGen cohorts. Among 2844 diabetic patients (528 with DKD), we identified 37 targets significantly associated with incident DKD, supported by both observational and causal evidence. Of these, 22% (8/37) are currently under investigation for DKD or other diseases. Our prospective study confirmed that higher levels of three prioritized targets, namely insulin-like growth factor binding protein 4 (IGFBP4), family with sequence similarity 3 member C (FAM3C), and prostaglandin D2 synthase (PTGDS), were associated with a 4.35-, 3.51-, and 3.57-fold increased likelihood of developing DKD, respectively. In addition, population-level protein-altering variant (PAV) analysis and in vitro experiments cross-validated FAM3C and IGFBP4 as potential new target candidates for DKD acting through the classic NLR family pyrin domain containing 3 (NLRP3)-caspase-1-gasdermin D (GSDMD) pyroptotic axis. Our results demonstrate that integrating omics data mining with causal inference may be a promising strategy for prioritizing therapeutic targets.

GPT2-ICC: A data-driven approach for accurate ion channel identification using pre-trained large language models

Zihan Zhou, Yang Yu, Chengji Yang, Leyan Cao, Shaoying Zhang, Junnan Li, Yingnan Zhang, Huayun Han, Guoliang Shi, Qiansen Zhang, Juwen Shen, Huaiyu Yang

2025, 15(8): 101302. doi: 10.1016/j.jpha.2025.101302

Abstract:
Current experimental and computational methods have limitations in accurately and efficiently classifying ion channels within vast protein spaces. Here, we developed a deep learning algorithm, GPT2 Ion Channel Classifier (GPT2-ICC), which effectively distinguishes ion channels in a test set containing approximately 239 times more non-ion-channel proteins than ion channels. GPT2-ICC integrates representation learning with a large language model (LLM)-based classifier, enabling highly accurate identification of potential ion channels. Several potential ion channels were predicted from the unannotated human proteome, further demonstrating GPT2-ICC’s generalization ability. This study marks a significant advancement in artificial intelligence-driven ion channel research, highlighting the adaptability and effectiveness of combining representation learning with LLMs to address the challenges of imbalanced protein sequence data. Moreover, it provides a valuable computational tool for uncovering previously uncharacterized ion channels.
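At roughly 239 negatives per positive, plain accuracy says little about a classifier. The arithmetic below uses hypothetical counts (100 ion channels, 23,900 other proteins) and a trivial model that labels everything "non-ion-channel"; the metrics shown are illustrative and not necessarily those used to evaluate GPT2-ICC:

```python
def imbalance_metrics(tp, fp, tn, fn):
    """Accuracy, recall, and balanced accuracy from a binary confusion matrix."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    recall = tp / (tp + fn) if tp + fn else 0.0  # sensitivity on ion channels
    specificity = tn / (tn + fp) if tn + fp else 0.0  # on non-ion-channel proteins
    balanced_accuracy = (recall + specificity) / 2
    return accuracy, recall, balanced_accuracy


# Trivial classifier: every protein predicted "non-ion-channel".
acc, rec, bal = imbalance_metrics(tp=0, fp=0, tn=23900, fn=100)
print(f"accuracy={acc:.4f} recall={rec:.2f} balanced={bal:.2f}")
```

The trivial model scores about 99.6% accuracy while finding no ion channels at all; recall and balanced accuracy (0.5 here, i.e. chance level) expose the failure.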

A multimodal contrastive learning framework for predicting P-glycoprotein substrates and inhibitors

Yixue Zhang, Jialu Wu, Yu Kang, Tingjun Hou

2025, 15(8): 101313. doi: 10.1016/j.jpha.2025.101313

Abstract:
P-glycoprotein (P-gp) is a transmembrane protein widely involved in the absorption, distribution, metabolism, excretion, and toxicity (ADMET) of drugs within the human body. Accurate prediction of P-gp inhibitors and substrates is crucial for drug discovery and toxicological assessment. However, existing models rely on limited molecular information, leading to suboptimal performance in predicting P-gp inhibitors and substrates. To overcome this challenge, we compiled an extensive dataset from public databases and the literature, consisting of 5,943 P-gp inhibitors and 4,018 substrates, notable for its large size, high quality, and structural uniqueness. In addition, we curated two external test sets to validate the model's generalization capability. Subsequently, we developed a multimodal graph contrastive learning (GCL) model for the prediction of P-gp inhibitors and substrates (MC-PGP). This framework integrates three types of features, derived from Simplified Molecular Input Line Entry System (SMILES) sequences, molecular fingerprints, and molecular graphs, using an attention-based fusion strategy to generate a unified molecular representation. Furthermore, we employed a GCL approach to enhance structural representations by aligning local and global structures. Extensive experimental results highlight the superior performance of MC-PGP, which improves the area under the receiver operating characteristic curve (AUC-ROC) by 9.82% and 10.62% on the external P-gp inhibitor and external P-gp substrate datasets, respectively, compared with 12 state-of-the-art methods. Moreover, the interpretability analysis of all three molecular feature types offers comprehensive and complementary insights, demonstrating that MC-PGP effectively identifies key functional groups involved in P-gp interactions. These chemically intuitive insights provide valuable guidance for the design and optimization of drug candidates.
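Aligning local and global structures is typically done with a contrastive objective. The abstract does not specify MC-PGP's loss, so the sketch below assumes InfoNCE, a common choice: each molecule's local embedding is pulled toward its own global embedding and away from other molecules' global embeddings in the batch. Vectors and the temperature value are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(local_views, global_views, temperature=0.1):
    """Mean InfoNCE loss over a batch of molecules.

    Pair (local_views[i], global_views[i]) is the positive; every other
    global view in the batch acts as a negative for molecule i.
    """
    n = len(local_views)
    total = 0.0
    for i in range(n):
        logits = [cosine(local_views[i], g) / temperature for g in global_views]
        m = max(logits)  # log-sum-exp shift for numerical stability
        log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_denominator - logits[i]  # -log softmax of the positive pair
    return total / n


# Hypothetical 2-D embeddings for two molecules.
local = [[1.0, 0.0], [0.0, 1.0]]
print(info_nce(local, [[1.0, 0.0], [0.0, 1.0]]))  # aligned: small loss
print(info_nce(local, [[0.0, 1.0], [1.0, 0.0]]))  # mismatched: large loss
```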

DTLCDR: A target-based multimodal fusion deep learning framework for cancer drug response prediction

Jie Yu, Cheng Shi, Yiran Zhou, Ningfeng Liu, Xiaolin Zong, Zhenming Liu, Liangren Zhang

2025, 15(8): 101315. doi: 10.1016/j.jpha.2025.101315

Abstract:
Accurate prediction of drug responses in cancer cell lines (CCLs) and transferable prediction of clinical drug responses using CCLs are two major tasks in personalized medicine. Despite rapid advances in computational methods for preclinical and clinical cancer drug response (CDR) prediction, generalization to new drugs that are unseen in the training set remains challenging. Herein, we propose a multimodal fusion deep learning (DL) model called drug-target and single-cell language based CDR (DTLCDR) to predict preclinical and clinical CDRs. The model integrates chemical descriptors, molecular graph representations, predicted protein target profiles of drugs, and cell line expression profiles with general knowledge from single cells. Among these features, a well-trained drug-target interaction (DTI) prediction model is used to generate target profiles of drugs, and a pretrained single-cell language model is integrated to provide general genomic knowledge. Comparison experiments on the cell line drug sensitivity dataset demonstrated that DTLCDR exhibited improved generalizability and robustness in predicting unseen drugs compared with previous state-of-the-art baseline methods. Further ablation studies verified the effectiveness of each component of our model, highlighting the significant contribution of target information to generalizability. Subsequently, the ability of DTLCDR to predict novel molecules was validated through in vitro cell experiments, demonstrating its potential for real-world applications. Moreover, DTLCDR was transferred to clinical datasets, demonstrating satisfactory performance on clinical data regardless of whether the drugs were included in the cell line dataset. Overall, our results suggest that DTLCDR is a promising tool for personalized drug discovery.

ACtriplet: An improved deep learning model for activity cliffs prediction by integrating triplet loss and pre-training

Xinxin Yu, Yimeng Wang, Long Chen, Weihua Li, Yun Tang, Guixia Liu

2025, 15(8): 101317. doi: 10.1016/j.jpha.2025.101317

Abstract:
Activity cliffs (ACs) are generally defined as pairs of similar compounds that differ by only a minor structural modification but exhibit a large difference in binding affinity for a given target. ACs offer crucial insights that help medicinal chemists optimize molecular structures. However, they are also a major source of prediction error in structure-activity relationship (SAR) models. Several studies have shown that deep neural networks based on molecular images or graphs still need further improvement in predicting the potency of ACs. In this paper, we integrated the triplet loss used in face recognition with a pre-training strategy to develop ACtriplet, a prediction model tailored to ACs. In extensive comparisons with multiple baseline models on 30 benchmark datasets, ACtriplet performed significantly better than deep learning (DL) models without pre-training. In addition, we explored the effect of pre-training on data representation. Finally, a case study demonstrated that the model's interpretability module could reasonably explain its predictions. Because the amount of available data cannot be increased rapidly, this framework makes better use of existing data and could help realize the potential of DL in early-stage drug discovery and optimization.
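The triplet loss borrowed from face recognition compares an anchor with a positive and a negative example. Below is a minimal sketch of the FaceNet-style form on squared Euclidean distances; how ACtriplet actually builds its triplets is not described in the abstract, so the compound roles and embeddings are hypothetical.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two embeddings."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss on squared distances.

    The loss is zero once the negative is farther from the anchor than the
    positive by at least `margin`; otherwise it penalizes the violation.
    """
    return max(
        0.0,
        squared_distance(anchor, positive) - squared_distance(anchor, negative) + margin,
    )


# Hypothetical 2-D embeddings: an anchor compound, a compound of similar potency,
# and its activity-cliff partner (similar structure, large potency gap).
anchor, similar_potency, cliff_partner = [0.0, 0.0], [0.0, 1.0], [3.0, 4.0]
print(triplet_loss(anchor, similar_potency, cliff_partner))
```

Here the cliff partner is already far enough from the anchor, so the loss is zero; swapping the positive and negative roles yields a loss of 25.0, which training would push down by reshaping the embedding space.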

Optimizing blood-brain barrier permeability in KRAS inhibitors: A structure-constrained molecular generation approach

Xia Sheng, Yike Gui, Jie Yu, Yitian Wang, Zhenghao Li, Xiaoya Zhang, Yuxin Xing, Yuqing Wang, Zhaojun Li, Mingyue Zheng, Liquan Yang, Xutong Li

2025, 15(8): 101337. doi: 10.1016/j.jpha.2025.101337

Abstract:
Kirsten rat sarcoma viral oncogene homolog (KRAS) protein inhibitors are a promising class of therapeutics, but research on molecules that effectively penetrate the blood-brain barrier (BBB), a property crucial for treating central nervous system (CNS) malignancies, remains limited. Although molecular generation models have recently advanced drug discovery, they often overlook the complexity of biological and chemical factors, leaving room for improvement. In this study, we present a structure-constrained molecular generation workflow designed to optimize lead compounds for both drug efficacy and drug absorption properties. Our approach uses a variational autoencoder (VAE) generative model integrated with reinforcement learning for multi-objective optimization. This method specifically aims to enhance BBB permeability (BBBp) while maintaining the high-affinity substructures of KRAS inhibitors. To support this, we incorporate a specialized KRAS BBB predictor based on active learning and an affinity predictor employing comparative learning models. Additionally, we introduce two novel metrics, the knowledge-integrated reproduction score (KIRS) and the composite diversity score (CDS), to assess structural performance and biological relevance. Retrospective validation with the KRAS inhibitors AMG510 and MRTX849 demonstrates the framework’s effectiveness in optimizing BBBp and highlights its potential for real-world drug development applications. This study provides a robust framework for accelerating the structural enhancement of lead compounds, advancing the drug development process across diverse targets.

Discovery of selective HDAC6 inhibitors driven by artificial intelligence and molecular dynamics simulation approaches

Xingang Liu, Hao Yang, Xinyu Liu, Minjie Mou, Jie Liu, Wenying Yan, Tianle Niu, Ziyang Zhang, He Shi, Xiangdong Su, Xuedong Li, Yang Zhang, Qingzhong Jia

2025, 15(8): 101338. doi: 10.1016/j.jpha.2025.101338

Abstract:
Increasing evidence shows that histone deacetylase 6 (HDAC6) dysfunction is directly associated with the onset and progression of various diseases, especially cancers, making the development of HDAC6-targeted anti-tumor agents a research hotspot. In this study, artificial intelligence (AI) technology and molecular simulation strategies were fully integrated to construct an efficient and precise drug screening pipeline that combined a voting strategy based on compound-protein interaction (CPI) prediction models, cascade molecular docking, and molecular dynamics (MD) simulations. The biological potential of the screened compounds was further evaluated through enzymatic and cellular activity assays. Among the identified compounds, Cmpd.18 exhibited more potent HDAC6 enzyme inhibitory activity (IC₅₀ = 5.41 nM) than tubastatin A (TubA) (IC₅₀ = 15.11 nM), along with a favorable subtype selectivity profile (selectivity index ≈ 117.23 over HDAC1), which was further verified by Western blot analysis. Additionally, Cmpd.18 induced G2/M phase arrest and promoted apoptosis in HCT-116 cells, exerting desirable antiproliferative activity (IC₅₀ = 2.59 μM). Furthermore, based on a long-term MD simulation trajectory, the key residues facilitating Cmpd.18's binding were identified by free energy decomposition analysis, thereby elucidating its binding mechanism. Moreover, representative conformation analysis indicated that Cmpd.18 could stably bind to the active pocket in an effective conformation, demonstrating the potential of the 2-(2-phenoxyethyl)pyridazin-3(2H)-one scaffold for in-depth research.

Quantifying compatibility mechanisms in traditional Chinese medicine with interpretable graph neural networks

Jingqi Zeng, Xiaobin Jia

2025, 15(8): 101342. doi: 10.1016/j.jpha.2025.101342

Abstract:
Traditional Chinese medicine (TCM) features complex compatibility mechanisms involving multi-component, multi-target, and multi-pathway interactions. This study presents an interpretable graph artificial intelligence (GraphAI) framework to quantify such mechanisms in Chinese herbal formulas (CHFs). A multidimensional TCM knowledge graph (TCM-MKG; https://zenodo.org/records/13763953) was constructed, integrating seven standardized modules: TCM terminology, Chinese patent medicines (CPMs), Chinese herbal pieces (CHPs), pharmacognostic origins (POs), chemical compounds, biological targets, and diseases. A neighbor-diffusion strategy was used to address the sparsity of compound-target associations, increasing target coverage from 12.0% to 98.7%. Graph neural networks (GNNs) with attention mechanisms were applied to 6,080 CHFs, modeled as graphs with CHPs as nodes. To embed domain-specific semantics, virtual nodes representing medicinal properties (therapeutic nature, flavor, and meridian tropism) were introduced, enabling interpretable modeling of inter-CHP relationships. The model quantitatively captured classical compatibility roles such as “monarch-minister-assistant-guide” and uncovered TCM etiological types derived from diagnostic and efficacy patterns. Model validation using 215 CHFs used for coronavirus disease 2019 (COVID-19) management highlighted Radix Astragali-Rhizoma Phragmitis as a high-attention herb pair. Mass spectrometry (MS) and target prediction identified three active compounds, namely methylinissolin-3-O-glucoside, corydalin, and pingbeinine, which converge on pathways such as neuroactive ligand-receptor interaction, xenobiotic response, and neuronal function, supporting their neuroimmune and detoxification potential. Given their high safety and dietary compatibility, this herb pair may offer therapeutic value for managing long COVID.
All data and code are openly available (https://github.com/ZENGJingqi/GraphAI-for-TCM), providing a scalable and interpretable platform for TCM mechanism research and discovery of bioactive herbal constituents.

An inductive learning-based method for predicting drug-gene interactions using a multi-relational drug-disease-gene graph

Jian He, Yanling Wu, Linxi Yuan, Jiangguo Qiu, Menglong Li, Xuemei Pu, Yanzhi Guo

2025, 15(8): 101347. doi: 10.1016/j.jpha.2025.101347

Abstract:
Computational analysis can detect drug-gene interactions (DGIs) accurately and cost-effectively. However, research has focused on transductive learning models, which show promising performance for unknown DGIs (both drugs and genes are present in the training model), while paying little attention to unseen DGIs (both drugs and genes are absent from the training model). In view of this, this study proposes, for the first time, an inductive learning-based model for the precise identification of unseen DGIs. By integrating disease nodes to alleviate data sparsity, we constructed a multi-relational drug-disease-gene (DDG) graph that effectively fuses data on the intra- and inter-relationships among drugs, diseases, and genes. After extracting graph features with graph embedding algorithms, we retrieved the attributes of individual gene and drug nodes and integrated graph features and node attributes into a hybrid feature representation. Machine learning (ML) models built on these features enabled transductive predictions of unknown DGIs. To realize inductive learning, we transformed the known node vectors derived from the DDG graph into representations of unseen nodes, using node similarities as weights, thereby enabling inductive predictions for unseen DGIs. The final model outperformed existing models, with significant improvement in predicting both external unknown and unseen DGIs. The practical feasibility of our model was further confirmed through a case study and molecular docking. In summary, this study establishes an efficient data-driven modeling approach that shows promise as a tool for accelerating drug discovery and repurposing.
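The inductive step, representing an unseen node by weighting known node vectors with similarities, can be sketched as a normalized weighted average. This is a minimal illustration under assumptions: the optional `top_k` cut-off, normalization by total similarity, and the toy drugs and embeddings are hypothetical rather than the authors' exact scheme.

```python
def infer_unseen_embedding(similarities, known_embeddings, top_k=None):
    """Represent an unseen node as a similarity-weighted average of known nodes.

    similarities: {known_node_id: non-negative similarity to the unseen node}.
    known_embeddings: {known_node_id: embedding vector from the training graph}.
    top_k: optionally keep only the k most similar known nodes.
    """
    ranked = sorted(similarities.items(), key=lambda kv: kv[1], reverse=True)
    if top_k is not None:
        ranked = ranked[:top_k]
    total = sum(w for _, w in ranked)
    if total == 0:
        raise ValueError("unseen node has zero similarity to all known nodes")
    dim = len(next(iter(known_embeddings.values())))
    return [
        sum(w * known_embeddings[node][j] for node, w in ranked) / total
        for j in range(dim)
    ]


# Hypothetical: a new drug compared with two training drugs.
sims = {"drugA": 1.0, "drugB": 3.0}
emb = {"drugA": [0.0, 0.0], "drugB": [4.0, 8.0]}
print(infer_unseen_embedding(sims, emb))
```

The resulting vector lives in the same space as the training embeddings, so a model trained transductively can score the unseen drug without retraining.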

Repurposing drugs for the human dopamine transporter through WHALES descriptors-based virtual screening and bioactivity evaluation

Ding Luo, Zhou Sha, Junli Mao, Jialing Liu, Yue Zhou, Haibo Wu, Weiwei Xue

2025, 15(8): 101368. doi: 10.1016/j.jpha.2025.101368

Abstract:
Computational approaches, encompassing both physics-based and machine learning (ML) methodologies, have gained substantial traction in drug repurposing efforts targeting specific therapeutic entities. The human dopamine (DA) transporter (hDAT) is the primary therapeutic target of numerous psychiatric medications. However, traditional hDAT-targeting drugs, which interact with the primary binding site, encounter significant limitations, including addictive potential and stimulant effects. In this study, we propose an integrated workflow combining virtual screening based on weighted holistic atom localization and entity shape (WHALES) descriptors with in vitro experimental validation to repurpose novel hDAT-targeting drugs. Initially, WHALES descriptors facilitated a similarity search, employing four benztropine-like atypical inhibitors known to bind hDAT's allosteric site as templates. Consequently, from a compound library of 4,921 marketed and clinically tested drugs, we identified 27 candidate atypical inhibitors. Subsequently, ADMETlab was employed to predict the pharmacokinetic and toxicological properties of these candidates, while induced-fit docking (IFD) was performed to estimate their binding affinities. Six compounds were selected for in vitro assessments of neurotransmitter reuptake inhibitory activities. Among these, three exhibited significant inhibitory potency, with half maximal inhibitory concentration (IC₅₀) values of 0.753 μM, 0.542 μM, and 1.210 μM, respectively. Finally, molecular dynamics (MD) simulations and end-point binding free energy analyses were conducted to elucidate and confirm the inhibitory mechanisms of the repurposed drugs against hDAT in its inward-open conformation. 
In conclusion, our study not only identifies promising active compounds as potential atypical inhibitors for novel therapeutic drug development targeting hDAT but also validates the effectiveness of our integrated computational and experimental workflow for drug repurposing.
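The template-based similarity search can be sketched as ranking library compounds by descriptor distance to their nearest template. A minimal sketch under assumptions: real WHALES descriptors are computed from molecular structure by an algorithm not reproduced here, and Euclidean distance with a minimum over templates is an assumed scoring rule; the descriptor vectors are placeholders. `math.dist` requires Python 3.8 or later.

```python
import math


def rank_by_template_distance(library, templates, k):
    """Rank library compounds by distance to their closest template.

    library: {compound_name: descriptor vector}; templates: list of vectors.
    A smaller distance means more similar; returns the names of the k closest.
    """
    scored = []
    for name, descriptor in library.items():
        best = min(math.dist(descriptor, t) for t in templates)  # nearest template
        scored.append((best, name))
    scored.sort()
    return [name for _, name in scored[:k]]


# Placeholder 2-D "descriptors" for three library drugs and one template inhibitor.
library = {"x": [0.0, 0.0], "y": [5.0, 5.0], "z": [1.0, 0.0]}
print(rank_by_template_distance(library, [[0.0, 0.0]], k=2))
```

With four templates and a 4,921-compound library, the same loop would yield a shortlist from which candidates are passed on to ADMET prediction and docking.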

ToxBERT: An explainable AI framework for enhancing prediction of adverse drug reactions and structural insights

Yujie He, Xiang Lv, Wulin Long, Shengqiu Zhai, Menglong Li, Zhining Wen

2025, 15(8): 101387. doi: 10.1016/j.jpha.2025.101387

Abstract:
Accurate prediction of drug-induced adverse drug reactions (ADRs) is crucial for drug safety evaluation, as it directly affects public health and safety. While various models have shown promising results in predicting ADRs, their accuracy still needs improvement. Additionally, many existing models lack interpretability when linking molecular structures to specific ADRs and frequently rely on manually selected molecular fingerprints, which can introduce bias. To address these challenges, we propose ToxBERT, an efficient transformer encoder model that applies attention and masking mechanisms to simplified molecular input line entry system (SMILES) representations. Our results demonstrate that ToxBERT achieved area under the receiver operating characteristic curve (AUROC) scores of 0.839, 0.759, and 0.664 for predicting drug-induced QT prolongation (DIQT), rhabdomyolysis, and liver injury, respectively, outperforming previous studies. Furthermore, ToxBERT can identify drug substructures that are closely associated with specific ADRs. These findings indicate that ToxBERT is valuable not only for understanding the mechanisms underlying specific drug-induced ADRs but also for mitigating potential ADRs in the drug discovery pipeline.
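The masking mechanism on SMILES can be illustrated with BERT-style masked-token corruption. This is a sketch under assumptions: the abstract does not give ToxBERT's tokenizer or masking ratio, so the simplified regex tokenizer is hypothetical, 15% is BERT's default rate, and BERT's 80/10/10 replacement scheme is omitted.

```python
import random
import re

# Simplified SMILES tokenizer (assumption): bracket atoms, two-letter halogens,
# two-digit ring closures, and otherwise single characters.
SMILES_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|.")
MASK = "[MASK]"


def tokenize(smiles):
    """Split a SMILES string into tokens using the simplified pattern."""
    return SMILES_TOKEN.findall(smiles)


def mask_tokens(tokens, mask_rate=0.15, seed=None):
    """Replace a random subset of tokens with MASK for masked-token pre-training.

    Returns the corrupted sequence and {position: original token} targets
    that the encoder would be trained to recover.
    """
    rng = random.Random(seed)
    corrupted, targets = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_rate:
            corrupted.append(MASK)
            targets[i] = tok
        else:
            corrupted.append(tok)
    return corrupted, targets


tokens = tokenize("CC(=O)Oc1ccccc1C(=O)O")  # aspirin
corrupted, targets = mask_tokens(tokens, seed=0)
print(corrupted, targets)
```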

HyPepTox-Fuse: An interpretable hybrid framework for accurate peptide toxicity prediction fusing protein language model-based embeddings with conventional descriptors

Duong Thanh Tran, Nhat Truong Pham, Nguyen Doan Hieu Nguyen, Leyi Wei, Balachandran Manavalan

2025, 15(8): 101410. doi: 10.1016/j.jpha.2025.101410

Abstract:
Peptide-based therapeutics hold great promise for the treatment of various diseases; however, their clinical application is often hindered by toxicity challenges. The accurate prediction of peptide toxicity is crucial for designing safe peptide-based therapeutics. While traditional experimental approaches are time-consuming and expensive, computational methods have emerged as viable alternatives, including similarity-based and machine learning (ML)-/deep learning (DL)-based methods. However, existing methods often struggle with robustness and generalizability. To address these challenges, we propose HyPepTox-Fuse, a novel framework that fuses protein language model (PLM)-based embeddings with conventional descriptors. HyPepTox-Fuse integrates ensemble PLM-based embeddings to achieve richer peptide representations by leveraging a cross-modal multi-head attention mechanism and Transformer architecture. A robust feature ranking and selection pipeline further refines conventional descriptors, thus enhancing prediction performance. Our framework outperforms state-of-the-art methods in cross-validation and independent evaluations, offering a scalable and reliable tool for peptide toxicity prediction. Moreover, we conducted a case study to validate the robustness and generalizability of HyPepTox-Fuse, highlighting its effectiveness in enhancing model performance. Furthermore, the HyPepTox-Fuse server is freely accessible at https://balalab-skku.org/HyPepTox-Fuse/ and the source code is publicly available at https://github.com/cbbl-skku-org/HyPepTox-Fuse/. The study thus presents an intuitive platform for predicting peptide toxicity and supports reproducibility through openly available datasets.
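Cross-modal attention lets one feature source attend over another. A minimal single-head sketch of scaled dot-product cross-attention; learned query/key/value projections are omitted, and which modality supplies the queries in HyPepTox-Fuse is not stated in the abstract, so the roles and vectors here are hypothetical. A multi-head layer would run several such heads on separately projected inputs and concatenate the results.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)  # shift for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Single-head scaled dot-product attention across two feature modalities.

    Each query vector (from one embedding source) attends over key/value
    vectors from another source and returns a weighted mix of the values.
    """
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        outputs.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return outputs


# Hypothetical: one descriptor-derived query attending over two PLM residue vectors.
print(cross_attention([[1.0, 1.0]], [[0.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [2.0, 4.0]]))
```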

The integration of machine learning into traditional Chinese medicine

Yanfeng Hong, Sisi Zhu, Yuhong Liu, Chao Tian, Hongquan Xu, Gongxing Chen, Lin Tao, Tian Xie

2025, 15(8): 101157. doi: 10.1016/j.jpha.2024.101157

Abstract:
Traditional Chinese medicine (TCM) is an ancient medical system that is distinctive and effective in treating cancer, depression, coronavirus disease 2019 (COVID-19), and other diseases. However, the relatively abstract diagnostic methods of TCM lack objective measurement, and its complex mechanisms of action are difficult to comprehend, which hinders the application and internationalization of TCM. Recently, breakthroughs have been made in applying methods such as network pharmacology and virtual screening to TCM research, and the rise of machine learning (ML) has significantly strengthened the integration of these methods with TCM. This article introduces representative methodological cases in quality control, mechanism research, diagnosis, and treatment processes of TCM, revealing the potential applications of ML technology in TCM. Furthermore, the challenges faced by ML in TCM applications are summarized, and future directions are discussed.

The future of pharmaceuticals: Artificial intelligence in drug discovery and development

Chen Fu, Qiuchen Chen

2025, 15(8): 101248. doi: 10.1016/j.jpha.2025.101248

Abstract:
Artificial intelligence (AI) is revolutionizing traditional drug discovery and development models by seamlessly integrating data, computational power, and algorithms. This synergy enhances the efficiency, accuracy, and success rates of drug research, shortens development timelines, and reduces costs. Coupled with machine learning (ML) and deep learning (DL), AI has demonstrated significant advancements across various domains, including drug characterization, target discovery and validation, small molecule drug design, and the acceleration of clinical trials. Through molecular generation techniques, AI facilitates the creation of novel drug molecules, predicting their properties and activities, while virtual screening (VS) optimizes drug candidates. Additionally, AI enhances clinical trial efficiency by predicting outcomes, designing trials, and enabling drug repositioning. However, AI's application in drug development faces challenges, including the need for robust data-sharing mechanisms and the establishment of more comprehensive intellectual property protections for algorithms. AI-driven pharmaceutical companies must also integrate biological sciences and algorithms effectively, ensuring the successful fusion of wet and dry laboratory experiments. Despite these challenges, the potential of AI in drug development remains undeniable. As AI technology evolves and these barriers are addressed, AI-driven therapeutics are poised for a broader and more impactful future in the pharmaceutical industry.

Artificial intelligence and computational methods in human metabolism research: A comprehensive survey

Manzhan Zhang, Yuxin Wan, Jing Wang, Shiliang Li, Honglin Li

2025, 15(8): 101437. doi: 10.1016/j.jpha.2025.101437

Abstract:
Understanding the metabolism of endogenous and exogenous substances in the human body is essential for elucidating disease mechanisms and evaluating the safety and efficacy of drug candidates during the drug development process. Recent advancements in artificial intelligence (AI), particularly in machine learning (ML) and deep learning (DL) techniques, have introduced innovative approaches to metabolism research, enabling more accurate predictions and insights. This paper emphasizes computational and AI-driven methodologies, highlighting how ML enhances predictive modeling for human metabolism at the molecular level and facilitates integration into genome-scale metabolic models (GEMs) at the omics level. Challenges still remain, including data heterogeneity and model interpretability. This work aims to provide valuable insights and references for researchers in drug discovery and development, ultimately contributing to the advancement of precision medicine.

Spatial metabolomics combined with machine learning in colon cancer diagnosis research

Ling Weng, Huanhuan Wang, Chunxiang Zhai, Qi Wang, Yanyan Guo, Ziyi Zhong, Chenying Ma, Jing Wang

2025, 15(8): 101367. doi: 10.1016/j.jpha.2025.101367

Abstract:

Artificial intelligence empowering the full spectrum of drug discovery

Tingting Fu, Kuo Zhang, Tingjun Hou, Caisheng Wu, Feng Zhu

2025, 15(8): 101438. doi: 10.1016/j.jpha.2025.101438

Abstract:

2025 Vol. 15, No. 8