Volume 15 Issue 8
Sep. 2025
Zihan Zhou, Yang Yu, Chengji Yang, Leyan Cao, Shaoying Zhang, Junnan Li, Yingnan Zhang, Huayun Han, Guoliang Shi, Qiansen Zhang, Juwen Shen, Huaiyu Yang. GPT2-ICC: A data-driven approach for accurate ion channel identification using pre-trained large language models[J]. Journal of Pharmaceutical Analysis, 2025, 15(8): 101302. doi: 10.1016/j.jpha.2025.101302

GPT2-ICC: A data-driven approach for accurate ion channel identification using pre-trained large language models

doi: 10.1016/j.jpha.2025.101302
Funds:

This work is funded by grants from the National Key Research and Development Program of China (Grant Nos.: 2022YFE0205600 and 2022YFC3400504), the National Natural Science Foundation of China (Grant Nos.: 82373792 and 82273857), the Fundamental Research Funds for the Central Universities, China, and the East China Normal University Medicine and Health Joint Fund, China (Grant No.: 2022JKXYD07001). We are also thankful for the support of the ECNU Multifunctional Platform for Innovation (001).

  • Received Date: Oct. 30, 2024
  • Accepted Date: Apr. 04, 2025
  • Rev Recd Date: Mar. 10, 2025
  • Publish Date: Apr. 09, 2025
  • Current experimental and computational methods have limitations in accurately and efficiently classifying ion channels within vast protein spaces. Here we have developed a deep learning algorithm, GPT2 Ion Channel Classifier (GPT2-ICC), which effectively distinguishes ion channels within a test set containing approximately 239 times more non-ion-channel proteins. GPT2-ICC integrates representation learning with a large language model (LLM)-based classifier, enabling highly accurate identification of potential ion channels. Several potential ion channels were predicted from the unannotated human proteome, further demonstrating GPT2-ICC’s generalization ability. This study marks a significant advancement in artificial-intelligence-driven ion channel research, highlighting the adaptability and effectiveness of combining representation learning with LLMs to address the challenges of imbalanced protein sequence data. Moreover, it provides a valuable computational tool for uncovering previously uncharacterized ion channels.
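To make the imbalance mentioned in the abstract concrete, the following minimal sketch shows why plain accuracy is uninformative when negatives outnumber positives roughly 239 to 1. Only the 239:1 ratio comes from the abstract. The function name `imbalance_metrics`, the positive count of 100, and the recall and false-positive-rate values are hypothetical illustration inputs, not results reported for GPT2-ICC.

```python
def imbalance_metrics(n_pos, ratio, recall, false_positive_rate):
    """Confusion-matrix summary for a binary classifier on an imbalanced set.

    n_pos: number of positive (ion channel) examples (hypothetical).
    ratio: negatives per positive (239 is the ratio quoted in the abstract).
    recall, false_positive_rate: assumed classifier behaviour (hypothetical).
    """
    n_neg = n_pos * ratio
    tp = recall * n_pos                 # ion channels correctly flagged
    fp = false_positive_rate * n_neg    # non-channels wrongly flagged
    tn = n_neg - fp                     # non-channels correctly rejected
    accuracy = (tp + tn) / (n_pos + n_neg)
    # Precision is undefined when nothing is flagged; report 0.0 in that case.
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall}


# A trivial baseline that labels every protein as "not an ion channel".
baseline = imbalance_metrics(n_pos=100, ratio=239, recall=0.0,
                             false_positive_rate=0.0)

# A hypothetical classifier: finds 90% of channels, mislabels 1% of negatives.
model = imbalance_metrics(n_pos=100, ratio=239, recall=0.9,
                          false_positive_rate=0.01)

for name, metrics in (("baseline", baseline), ("model", model)):
    print(name, {k: round(v, 4) for k, v in metrics.items()})
```

Under these assumed inputs the trivial baseline reaches an accuracy of 239/240 (about 99.6%) while finding no ion channels at all, and the hypothetical classifier scores slightly lower accuracy with a precision of only about 27% despite a 1% false-positive rate. This is why precision and recall are the more informative measures for this kind of task.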