A Class of New Kernels Based on High-Scored Pairs of k-Peptides for SVMs and Its Application for Prediction of Protein Subcellular Localization

  • Zhengdeng Lei
  • Yang Dai
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3680)


A class of new kernels has been developed for vectors derived from a coding scheme of the k-peptide composition for protein sequences. Each kernel defines the biological similarity for two mapped k-peptide coding vectors. The mapping transforms a k-peptide coding vector into a new vector based on a matrix formed by high BLOSUM scores associated with pairs of k-peptides. In conjunction with the use of support vector machines, the effectiveness of the new kernels is evaluated against the conventional coding scheme of k-peptide (k ≤ 3) for the prediction of subcellular localizations of proteins in Gram-negative bacteria. It is demonstrated that the new method outperforms all the other methods in a 5-fold cross-validation.
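The abstract gives only the outline of the construction, so the following is a minimal sketch of the general idea rather than the authors' implementation. It assumes a toy three-letter alphabet with hypothetical substitution scores (not real BLOSUM entries), a hypothetical threshold for what counts as a "high" score, and a plain inner product between the mapped vectors; the names `composition`, `high_score_matrix`, and `kernel` are illustrative only.

```python
from itertools import product

import numpy as np

# Toy alphabet and substitution scores. These values are hypothetical
# placeholders, NOT entries of a real BLOSUM matrix.
ALPHABET = "AIL"
SCORE = {
    ("A", "A"): 4, ("I", "I"): 4, ("L", "L"): 4,
    ("A", "I"): -1, ("A", "L"): -1, ("I", "L"): 2,
}


def pair_score(a, b):
    """Symmetric lookup of the substitution score for two residues."""
    return SCORE.get((a, b), SCORE.get((b, a)))


def kpeptides(k):
    """All k-peptides over the toy alphabet, in a fixed order."""
    return ["".join(p) for p in product(ALPHABET, repeat=k)]


def composition(seq, k):
    """k-peptide composition vector: normalized counts of each k-peptide.

    The sequence must contain only letters from ALPHABET.
    """
    index = {p: i for i, p in enumerate(kpeptides(k))}
    v = np.zeros(len(index))
    for i in range(len(seq) - k + 1):
        v[index[seq[i:i + k]]] += 1
    total = v.sum()
    return v / total if total > 0 else v


def high_score_matrix(k, threshold):
    """Matrix over pairs of k-peptides, keeping only high summed scores.

    Entry (i, j) is the position-wise sum of substitution scores for
    k-peptides i and j if that sum reaches `threshold` (an assumed
    cutoff), and 0 otherwise.
    """
    peps = kpeptides(k)
    M = np.zeros((len(peps), len(peps)))
    for i, p in enumerate(peps):
        for j, q in enumerate(peps):
            s = sum(pair_score(a, b) for a, b in zip(p, q))
            if s >= threshold:
                M[i, j] = s
    return M


def kernel(x, y, M):
    """Inner product of the two mapped vectors M @ x and M @ y."""
    return float((M @ x) @ (M @ y))


if __name__ == "__main__":
    M = high_score_matrix(k=2, threshold=6)
    x = composition("AILLIA", 2)
    y = composition("LLIIAA", 2)
    print(kernel(x, y, M))
```

Because the kernel here is an ordinary inner product taken after a fixed linear map, it is symmetric and positive semidefinite by construction, so a Gram matrix built from it could be handed to any SVM implementation that accepts precomputed kernels.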


Keywords: Protein subcellular localization; BLOSUM matrix; kernel; support vector machine; Gram-negative bacteria





Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Zhengdeng Lei
  • Yang Dai

Department of Bioengineering (MC063), University of Illinois at Chicago, Chicago, USA
