Citation: Zihan Zhou, Yang Yu, Chengji Yang, Leyan Cao, Shaoying Zhang, Junnan Li, Yingnan Zhang, Huayun Han, Guoliang Shi, Qiansen Zhang, Juwen Shen, Huaiyu Yang. GPT2-ICC: A data-driven approach for accurate ion channel identification using pre-trained large language models[J]. Journal of Pharmaceutical Analysis, 2025, 15(8): 101302. doi: 10.1016/j.jpha.2025.101302

[1] B. Hille, C.M. Armstrong, R. MacKinnon, Ion channels: From idea to reality, Nat. Med. 5 (1999) 1105-1109.

[2] H. Wulff, P. Christophersen, Recent developments in ion channel pharmacology, Channels (Austin) 9 (2015), 335.

[3] M.L. Garcia, G.J. Kaczorowski, Ion channels as therapeutic drug targets, in: D.J. Abraham, M. Myers (Eds.), Burger’s Medicinal Chemistry and Drug Discovery, eighth ed., Wiley, 2021, pp. 1-28.

[4] S.K. Bagal, A.D. Brown, P.J. Cox, et al., Ion channels as therapeutic targets: A drug discovery perspective, J. Med. Chem. 56 (2013) 593-624.

[5] J. Huang, X. Pan, N. Yan, Structural biology and molecular pharmacology of voltage-gated ion channels, Nat. Rev. Mol. Cell Biol. 25 (2024) 904-925.

[6] D.A. Doyle, J. Morais Cabral, R.A. Pfuetzner, et al., The structure of the potassium channel: Molecular basis of K+ conduction and selectivity, Science 280 (1998) 69-77.

[7] UniProt Consortium, UniProt: The universal protein knowledgebase in 2023, Nucleic Acids Res. 51 (2023) D523-D531.

[8] C. Chen, C. Cang, S. Fenske, et al., Patch-clamp technique to characterize ion channels in enlarged individual endolysosomes, Nat. Protoc. 12 (2017) 1639-1658.

[9] K. Toth, Diversity of ion channels, J. Physiol. 599 (2021) 2603-2604.

[10] H. Ishikawa, G.N. Barber, STING is an endoplasmic reticulum adaptor that facilitates innate immune signalling, Nature 455 (2008) 674-678.

[11] X. Gui, H. Yang, T. Li, et al., Autophagy induction via STING trafficking is a primordial function of the cGAS pathway, Nature 567 (2019) 262-266.

[12] M.M. Gaidt, T.S. Ebert, D. Chauhan, et al., The DNA inflammasome in human myeloid cells is initiated by a STING-cell death program upstream of NLRP3, Cell 171 (2017) 1110-1124.e18.

[13] B. Liu, R.J. Carlson, I.S. Pires, et al., Human STING is a proton channel, Science 381 (2023) 508-514.

[14] S.W. Taju, Y.Y. Ou, DeepIon: Deep learning approach for classifying ion transporters and ion channels from membrane proteins, J. Comput. Chem. 40 (2019) 1521-1529.

[15] H. Ghazikhani, G. Butler, Exploiting protein language models for the precise classification of ion channels and ion transporters, Proteins 92 (2024) 998-1055.

[16] Y.-W. Zhao, Z.-D. Su, W. Yang, et al., IonchanPred 2.0: A tool to predict ion channels and their types, Int. J. Mol. Sci. 18 (2017), 1838.

[17] K. Han, M. Wang, L. Zhang, et al., Predicting ion channels genes and their types with machine learning techniques, Front. Genet. 10 (2019), 399.

[18] E. Asgari, M.R.K. Mofrad, Continuous distributed representation of biological sequences for deep proteomics and genomics, PLoS One 10 (2015), e0141287.

[19] S. Saha, J. Zack, B. Singh, et al., VGIchan: Prediction and classification of voltage-gated ion channels, Genomics Proteomics Bioinformatics 4 (2006) 253-258.

[20] W.-X. Liu, E.-Z. Deng, W. Chen, et al., Identifying the subfamilies of voltage-gated potassium channels using feature selection technique, Int. J. Mol. Sci. 15 (2014) 12940-12951.

[21] J. Gao, W. Cui, Y. Sheng, et al., PSIONplus: Accurate sequence-based predictor of ion channels and their types, PLoS One 11 (2016), e0152964.

[22] Y. Bengio, A. Courville, P. Vincent, Representation learning: A review and new perspectives, IEEE Trans. Pattern Anal. Mach. Intell. 35 (2013) 1798-1828.

[23] A.J. Riesselman, J.B. Ingraham, D.S. Marks, Deep generative models of genetic variation capture the effects of mutations, Nat. Methods 15 (2018) 816-822.

[24] E.C. Alley, G. Khimulya, S. Biswas, et al., Unified rational protein engineering with sequence-based deep representation learning, Nat. Methods 16 (2019) 1315-1322.

[25] J.E. Shin, A.J. Riesselman, A.W. Kollasch, et al., Protein design and variant prediction using autoregressive generative models, Nat. Commun. 12 (2021), 2403.

[26] M. Heinzinger, A. Elnaggar, Y. Wang, et al., Modeling aspects of the language of life through transfer-learning protein sequences, BMC Bioinformatics 20 (2019), 723.

[27] V. Gligorijevic, P.D. Renfrew, T. Kosciolek, et al., Structure-based protein function prediction using graph convolutional networks, Nat. Commun. 12 (2021), 3168.

[28] Y. Hwang, A.L. Cornman, E.H. Kellogg, et al., Genomic language model predicts protein co-regulation and function, Nat. Commun. 15 (2024), 2880.

[29] Y.J. Jang, Q.-Q. Qin, S.-Y. Huang, et al., Accurate prediction of protein function using statistics-informed graph networks, Nat. Commun. 15 (2024), 6601.

[30] Y. Song, Q. Yuan, S. Chen, et al., Accurately predicting enzyme functions through geometric graph learning on ESMFold-predicted structures, Nat. Commun. 15 (2024), 8180.

[31] Y. Zhang, B. Kang, B. Hooi, et al., Deep long-tailed learning: A survey, IEEE Trans. Pattern Anal. Mach. Intell. 45 (2023) 10795-10816.

[32] T. Zhou, P. Niu, X. Wang, et al., One fits all: Power general time series analysis by pretrained LM, arXiv. 2023. https://doi.org/10.48550/arXiv.2302.11939.

[33] H. Yang, Y. Zhang, J. Xu, et al., Unveiling the generalization power of fine-tuned large language models, arXiv. 2023.

[34] M. Steinegger, J. Söding, MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets, Nat. Biotechnol. 35 (2017) 1026-1028.

[35] Z. Lin, H. Akin, R. Rao, et al., Evolutionary-scale prediction of atomic-level protein structure with a language model, Science 379 (2023) 1123-1130.

[36] A.L. Mitchell, A. Almeida, M. Beracochea, et al., MGnify: The microbiome analysis resource in 2020, Nucleic Acids Res. 48 (2020) D570-D578.

[37] L. Liu, H. Jiang, P. He, et al., On the variance of the adaptive learning rate and beyond, arXiv. 2019. https://arxiv.org/abs/1908.03265.

[38] A. Radford, J. Wu, R. Child, et al., Language models are unsupervised multitask learners, https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf. (Accessed February 2019).

[39] T. Kim, J. Kim, Y. Tae, et al., Reversible instance normalization for accurate time-series forecasting against distribution shift, in: International Conference on Learning Representations (ICLR), online, April 24-29, 2022.

[40] A. Vaswani, N. Shazeer, N. Parmar, et al., Attention is all you need, arXiv. 2017. https://doi.org/10.48550/arXiv.1706.03762.

[41] H. Touvron, L. Martin, K.R. Stone, et al., Llama 2: Open foundation and fine-tuned chat models, arXiv. 2023. https://arxiv.org/abs/2307.09288.

[42] J. Devlin, M.-W. Chang, K. Lee, et al., BERT: Pre-training of deep bidirectional transformers for language understanding, arXiv. 2019. https://arxiv.org/abs/1810.04805.

[43] Y. LeCun, B. Boser, J.S. Denker, et al., Backpropagation applied to handwritten zip code recognition, Neural Comput. 1 (1989) 541-551.

[44] M.A. Hearst, S.T. Dumais, E. Osuna, et al., Support vector machines, IEEE Intell. Syst. Appl. 13 (1998) 18-28.

[45] F. Pedregosa, G. Varoquaux, A. Gramfort, et al., Scikit-learn: Machine learning in Python, J. Mach. Learn. Res. 12 (2011) 2825-2830.

[46] T. Cover, P. Hart, Nearest neighbor pattern classification, IEEE Trans. Inf. Theory 13 (1967) 21-27.

[47] S.F. Altschul, T.L. Madden, A.A. Schäffer, et al., Gapped BLAST and PSI-BLAST: A new generation of protein database search programs, Nucleic Acids Res. 25 (1997) 3389-3402.

[48] X. Hou, Y. He, P. Fang, et al., Using artificial intelligence to document the hidden RNA virosphere, Cell 187 (2024) 6929-6942.e16.