Adaptive multi-view learning method for enhanced drug repurposing using chemical-induced transcriptional profiles, knowledge graphs, and large language models
a. Chongqing Key Laboratory of Big Data for Bio Intelligence, Chongqing University of Posts and Telecommunications, Chongqing, 400065, China;
b. The Fifth People's Hospital of Chongqing, Chongqing, 400062, China
Funds:
This work was supported by the National Natural Science Foundation of China (Grant No. 62101087), the China Postdoctoral Science Foundation (Grant No. 2021MD703942), the Chongqing Postdoctoral Research Project Special Funding (Grant No. 2021XM2016), the Science Foundation of Chongqing Municipal Commission of Education (Grant No. KJQN202100642), and the Chongqing Natural Science Foundation (Grant No. cstc2021jcyj-msxmX0834).
Drug repurposing offers a promising alternative to traditional drug development, substantially reducing costs and timelines by identifying new therapeutic uses for existing drugs. However, current approaches often rely on limited data sources and simplistic hypotheses, which restricts their ability to capture the multifaceted nature of biological systems. This study introduces adaptive multi-view learning (AMVL), a novel method that integrates chemical-induced transcriptional profiles (CTPs), knowledge graph (KG) embeddings, and large language model (LLM) representations to enhance drug repurposing predictions. AMVL incorporates an innovative similarity matrix expansion strategy and combines multi-view learning, matrix factorization, and ensemble optimization to integrate heterogeneous multisource data. Comprehensive evaluations on three benchmark datasets (Fdataset, Cdataset, and Ydataset) and the large-scale iDrug dataset demonstrate that AMVL outperforms state-of-the-art methods in predicting drug-disease associations across multiple metrics. Literature-based validation further confirmed the model's predictive capability, with seven of the top ten predictions corroborated by post-2011 evidence. To promote transparency and reproducibility, all data and code used in this study have been open-sourced, including resources for CTP-, KG-, and LLM-based similarity calculations, as well as the complete AMVL algorithm and benchmarking procedures. By unifying diverse data modalities, AMVL offers a robust and scalable solution for accelerating drug discovery, advancing translational medicine, and integrating multi-omics data. We hope this work inspires further innovation in multisource data integration and supports the development of more precise and efficient drug discovery strategies.
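To make the general idea of multi-view integration concrete, the following Python code is a minimal, purely illustrative sketch. It is not the AMVL algorithm, which additionally relies on similarity matrix expansion, matrix factorization, and ensemble optimization. The sketch fuses two hypothetical drug-drug similarity views (labelled here as CTP- and KG-derived) with a fixed weighted average, then scores drug-disease pairs with a simple neighbor-based rule. The function names `fuse_views` and `score_associations`, the weights, and the toy matrices are all assumptions introduced for illustration.

```python
from typing import List

Matrix = List[List[float]]


def fuse_views(views: List[Matrix], weights: List[float]) -> Matrix:
    """Hypothetical helper: combine same-shaped drug-drug similarity matrices by a weighted average."""
    if not views or len(views) != len(weights):
        raise ValueError("expected one non-negative weight per view")
    total = sum(weights)
    n = len(views[0])
    return [
        [sum(w * view[i][j] for view, w in zip(views, weights)) / total for j in range(n)]
        for i in range(n)
    ]


def score_associations(sim: Matrix, assoc: Matrix) -> Matrix:
    """Hypothetical helper: score each drug-disease pair by the similarity-weighted labels of the other drugs."""
    n_drugs, n_diseases = len(assoc), len(assoc[0])
    scores = []
    for i in range(n_drugs):
        # Leave out the drug itself so its own known labels do not leak into its score.
        neighbors = [(j, sim[i][j]) for j in range(n_drugs) if j != i]
        norm = sum(s for _, s in neighbors) or 1.0
        scores.append(
            [sum(s * assoc[j][k] for j, s in neighbors) / norm for k in range(n_diseases)]
        )
    return scores


# Toy inputs (hypothetical values): 3 drugs, 2 diseases, two similarity views.
ctp_view = [[1.0, 0.8, 0.1], [0.8, 1.0, 0.2], [0.1, 0.2, 1.0]]
kg_view = [[1.0, 0.6, 0.3], [0.6, 1.0, 0.1], [0.3, 0.1, 1.0]]
known = [[1, 0], [0, 0], [0, 1]]  # drug 0 treats disease 0; drug 2 treats disease 1

fused = fuse_views([ctp_view, kg_view], weights=[0.5, 0.5])
scores = score_associations(fused, known)
print(scores[1])  # candidate indications for drug 1, which has no known associations
```

In this toy example, drug 1 is much more similar to drug 0 than to drug 2 in both views, so it receives a higher score for disease 0 than for disease 1. Fixed view weights are the simplest fusion rule; learning the weights from data is a natural refinement that this sketch does not attempt.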