a. Department of Pharmacy, Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, 310058, China;
b. College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, China
Funds:
This work was financially supported by the National Natural Science Foundation of China (Grant No.: 82404511) and the Postdoctoral Fellowship Program of CPSF (Grant No.: GZC20232345).
Drug development encompasses multiple processes, in which protein subcellular localization plays an essential role: it supports target identification, treatment development, and the design of drug delivery systems. This study presents LocPro, a deep learning framework for predicting protein subcellular localization. LocPro is distinguished by (a) combining protein representations from the pre-trained large language model (LLM) ESM2 and the expert-driven tool PROFEAT, (b) implementing a hybrid deep neural network architecture that integrates CNN, FC, and BiLSTM blocks, and (c) providing a multi-label framework that predicts protein subcellular localization at multiple levels of granularity. In addition, a dataset was curated and split using a homology-based strategy for training and validation. Comparative analyses show that LocPro outperforms existing methods in sequence-based multi-label protein subcellular localization prediction. Case studies on the subcellular localization of drug targets further demonstrate the practical utility of the framework. Overall, LocPro serves as a valuable complement to existing protein localization prediction tools.
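To make the multi-label, multi-granularity idea concrete, the following is a minimal Python sketch under stated assumptions, not LocPro's implementation. It does not reproduce the CNN/FC/BiLSTM network: a single linear scoring layer (the hypothetical `linear_logits`) stands in for it. The fused input is assumed to be a plain concatenation of an ESM2-style embedding with PROFEAT-style descriptors, and the compartment names, the fine-to-coarse mapping (`FINE_TO_COARSE`), and the 0.5 decision threshold are illustrative choices.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical label hierarchy (assumption): fine-grained compartments mapped
# to coarse ones. These are NOT LocPro's actual labels.
FINE_TO_COARSE: Dict[str, str] = {
    "nucleus": "nucleus",
    "nucleolus": "nucleus",
    "cytosol": "cytoplasm",
    "mitochondrion": "cytoplasm",
    "plasma membrane": "membrane",
}


def fuse_features(llm_embedding: Sequence[float],
                  descriptors: Sequence[float]) -> List[float]:
    """Concatenate a language-model embedding with hand-crafted descriptors
    (assumed fusion strategy: simple concatenation)."""
    return list(llm_embedding) + list(descriptors)


def linear_logits(features: Sequence[float],
                  weights: Dict[str, Sequence[float]],
                  biases: Dict[str, float]) -> Dict[str, float]:
    """Score each compartment independently with a linear layer.
    Hypothetical stand-in for the deep network, used only for illustration."""
    logits = {}
    for label, w in weights.items():
        if len(w) != len(features):
            raise ValueError(f"weight length mismatch for {label!r}")
        logits[label] = sum(wi * fi for wi, fi in zip(w, features)) + biases.get(label, 0.0)
    return logits


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def decode_multilabel(logits: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Multi-label decoding: one independent sigmoid per compartment, so a
    protein may receive zero, one, or several locations."""
    return sorted(label for label, z in logits.items() if sigmoid(z) >= threshold)


def coarsen(fine_labels: List[str]) -> List[str]:
    """Derive coarse-granularity labels from fine-granularity predictions."""
    return sorted({FINE_TO_COARSE[label] for label in fine_labels})


if __name__ == "__main__":
    fused = fuse_features([0.2, -0.1], [1.0])  # toy 3-dim feature vector
    logits = {"nucleus": 2.0, "nucleolus": 0.3, "cytosol": -1.5,
              "mitochondrion": 0.1, "plasma membrane": -3.0}
    fine = decode_multilabel(logits)
    print(fused, fine, coarsen(fine))
```

Because each compartment is scored with its own sigmoid rather than a softmax over all compartments, a protein can be assigned several locations at once (here nucleolus, nucleus, and mitochondrion), which is what separates multi-label from multi-class prediction; the coarse level is then read off the fine predictions through the assumed hierarchy.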