Aligning Multiple Protein Sequences by Hybrid Clonal Selection Algorithm with Insert-Remove-Gaps and BlockShuffling Operators

  • V. Cutello
  • D. Lee
  • G. Nicosia
  • M. Pavone
  • I. Prizzi
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4163)


Multiple sequence alignment (MSA) is one of the most important tasks in biological sequence analysis. This paper will primarily focus on on protein alignments, but most of the discussion and methodology also applies to DNA alignments. A novel hybrid clonal selection algorihm, called an aligner, is presented. It searches for a set of alignments amongst the population of candidate alignments by optimizing the classical weighted sum of pairs objective function. Benchmarks from BaliBASE library (v.1.0 and v.2.0) are used to validate the algorithm. Experimental results of BaliBASE v.1.0 benchmarks show that the proposed algorithm is superior to PRRP, ClustalX, SAGA, DIALIGN, PIMA, MULTIALIGN, and PILEUP8. On BaliBASE v.2.0 benchmarks the algorithm shows interesting results in terms of SP score with respect to established and leading methods, i.e. ClustalW, T-Coffee, MUSCLE, PRALINE, ProbCons, and Spem.


bioinformatics multiple sequence alignment protein sequences immune algorithms clonal selection algorithms hypermutation operator 


Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.


  1. 1.
    Eidhammer, I., Jonassen, I., Taylor, W.R.: Protein Bioinformatics. Wiley, Chichester (2004)Google Scholar
  2. 2.
    Durbin, R., Eddy, S., Krogh, A., Mitchison, G.: Biological sequence analysis. Cambridge University Press, Cambridge (2004)Google Scholar
  3. 3.
    Thompson, J.D., Plewniak, F., Ripp, R., Thierry, J.C., Poch, O.: Towards a Reliable Objective Function for Multiple Sequence Alignments. J. Mol. Biol. 301, 937–951 (2001)CrossRefGoogle Scholar
  4. 4.
    Altschul, S.F., Lipman, D.J.: Trees stars and multiple biological sequence alignment. SIAM Journal on Applied Mathematics 49, 197–209 (1989)zbMATHCrossRefMathSciNetGoogle Scholar
  5. 5.
    Altschul, S.F., Carroll, R.J., Lipman, D.J.: Weights for data related by a tree. Journal on Molecular Biology 207, 647–653 (1989)CrossRefGoogle Scholar
  6. 6.
    Bonizzoni, P., Della Vedova, G.: The Complexity of Multiple Sequence Alignment with SP-score that is a Metric. Theoretical Computer Science 259(1), 63–79 (2001)zbMATHCrossRefMathSciNetGoogle Scholar
  7. 7.
    Wang, L., Jiang, T.: On the complexity of multiple sequence alignment. Journal of Computational Biology 1, 337–348 (1994)CrossRefGoogle Scholar
  8. 8.
    Garey, M.R., Johnson, D.S.: Computers and Intractability: A Guide to the Theory of NP-Completeness. Freeman, New York (1979)zbMATHGoogle Scholar
  9. 9.
    Gupta, S.K., Kececioglu, J.D., Schaffer, A.: Improving the practical space and time efficiency of the shortest-paths approach to sum-of-pairs multiple sequence alignment. Journal of Computational Biology 2, 459–472 (1995)CrossRefGoogle Scholar
  10. 10.
    Corpet, F.: Multiple sequence alignment with hierarchical clustering. Nucleic Acids Research 16, 10881–10890 (1988)CrossRefGoogle Scholar
  11. 11.
    Wisconsin Package v.8; Genetics Computer Group, Madison, WI,
  12. 12.
    Thompson, J.D., Gibson, T.J., Plewniak, F., Jeanmougin, F., Higgins, D.G.: The ClustalX windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Research 24, 4876–4882 (1997)CrossRefGoogle Scholar
  13. 13.
    Thompson, J.D., Higgins, D.G., Gibson, T.J.: CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Research 22, 4673–4680 (1994)CrossRefGoogle Scholar
  14. 14.
    Notredame, C., Higgins, D.G., Heringa, J.: T-Coffee: a novel method for fast and accurate Multiple Sequence Alignment. Journal Molecular Biology 302, 205–217 (2000)CrossRefGoogle Scholar
  15. 15.
    Zhou, H., Zhou, Y.: SPEM: Improving multiple sequence alignment with sequence profiles and predicted secondary structures. Bioinformatics 21, 3615–3621 (2005)CrossRefGoogle Scholar
  16. 16.
    Do, C.B., Mahabhashyam, M.S.P., Brudno, M., Batzoglou, S.: ProbCons: Probabilistic consistency-based multiple sequence alignment. Genome Research 15, 330–340 (2005)CrossRefGoogle Scholar
  17. 17.
    Smith, R.F., Smith, T.F.: Pattern-induced multi-sequence alignment (PIMA) algorithm employing secondary structure-dependent gap penalties for use in comparative protein modelling. Protein Engineering 5, 35–41 (1992)CrossRefGoogle Scholar
  18. 18.
    Carrillo, H., Lipman, D.J.: The Multiple Sequence Alignment Problem in Biology. SIAM Journal on Applied Mathematics 48, 1073–1082 (1988)zbMATHCrossRefMathSciNetGoogle Scholar
  19. 19.
    Stoye, J., Moulton, V., Dress, A.W.: DCA: an efficient implementation of the divide-and conquer approach to simultaneous multiple sequence alignment. Bioinformatics 13(6), 625–626 (1997)CrossRefGoogle Scholar
  20. 20.
    Morgenstern, B., Frech, K., Dress, A., Werner, T.: DIALIGN: Finding local similarities by multiple sequence alignment. Bioinformatics 14, 290–294 (1998)CrossRefGoogle Scholar
  21. 21.
    Morgenstern, B., Frech, K., Dress, A., Werner, T.: DIALIGN 2: improvement of the segment-to-segment approach to multiple sequence alignment. Bioinformatics 15, 211–218 (1999)CrossRefGoogle Scholar
  22. 22.
    Gotoh, O.: Further improvement in methods of group-to-group sequence alignment with generalized profile operations. Bioinformatics 10(4), 379–387 (1994)CrossRefMathSciNetGoogle Scholar
  23. 23.
    Eddy, S.R.: Multiple alignment using hidden Markov models. In: 3rd International Conference on Intelligent Systems for Molecular Biology, vol. 3, pp. 114–120 (1995)Google Scholar
  24. 24.
    Edgar, R.C.: MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Research 32, 1792–1797 (2004)CrossRefGoogle Scholar
  25. 25.
    Notredame, C., Higgins, D.G.: SAGA: sequence alignment by genetic algorithm. Nucleic Acids Research 24, 1515–1539 (1996)CrossRefGoogle Scholar
  26. 26.
    Notredame, C.: COFFEE: an objective function for multiple sequence alignments. Bioinformatics 14, 407–422 (1998)CrossRefGoogle Scholar
  27. 27.
    Simossis, V.A., Heringa, J.: PRALINE: a multiple sequence alignment toolbox that integrates homology-extended and secondary structure information. Nucleic Acids Research 33, 289–294 (2005)CrossRefGoogle Scholar
  28. 28.
    Shyu, C., Sheneman, L., Foster, J.A.: Multiple Sequence Alignment with Evolutionary Computation. Genetic Programming and Evolvable Machines 5, 121–144 (2004)CrossRefGoogle Scholar
  29. 29.
    Nguyen, H.D., Yoshihara, I., Yamamori, K., Yasunaga, M.: Aligning Multiple Protein Sequences by Parallel Hybrid Geneti Algorithm. Genome Informatics 13, 123–132 (2002)Google Scholar
  30. 30.
    Cutello, V., Nicosia, G.: An Immunological Approach to Combinatorial Optimization Problems. In: Garijo, F.J., Riquelme, J.-C., Toro, M. (eds.) IBERAMIA 2002. LNCS, vol. 2527, pp. 361–370. Springer, Heidelberg (2002)CrossRefGoogle Scholar
  31. 31.
    Nicosia, G.: Immune Algorithms for Optimization and Protein Structure Prediction. Ph.D. Dissertation, Department of Mathematics and Computer Science, University of Catania, Italy (2004)Google Scholar
  32. 32.
    Cutello, V., Narzisi, G., Nicosia, G., Pavone, M.: Clonal selection algorithms: A comparative case study using effective mutation potentials. In: Jacob, C., Pilat, M.L., Bentley, P.J., Timmis, J.I. (eds.) ICARIS 2005. LNCS, vol. 3627, pp. 13–28. Springer, Heidelberg (2005)CrossRefGoogle Scholar
  33. 33.
    Cutello, V., Nicosia, G., Pavone, M., Timmis, J.: An Immune Algorithm for Protein Structure Prediction on Lattice Models. IEEE Transaction on Evolutionary Computation (to appear)Google Scholar
  34. 34.
    Cutello, V., Nicosia, G., Pavone, M.: Exploring the capability of immune algorithms: A characterization of hypermutation operators. In: Nicosia, G., Cutello, V., Bentley, P.J., Timmis, J. (eds.) ICARIS 2004. LNCS, vol. 3239, pp. 263–276. Springer, Heidelberg (2004)CrossRefGoogle Scholar
  35. 35.
    Taylor, W.R.: A flexible method to align a large number of sequences. J. Mol. Evol. 28, 161–169 (1988)CrossRefGoogle Scholar
  36. 36.
    Thompson, J.D., Plewniak, F., Poch, O.: BAliBASE: a benchmark alignment database for the evaluation of multiple alignment programs. Bioinformatics 15, 87–88 (1999)CrossRefGoogle Scholar
  37. 37.
    Bahr, A., Thompson, J.D., Thierry, J.C., Poch, O.: BAliBASE (Benchmark Alignment dataBASE): Enhancements for Repeats, Transmembrane Sequences and Circular Permuations. Nucleic Acids Research 29(1), 232–326 (2001)CrossRefGoogle Scholar

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • V. Cutello
    • 1
  • D. Lee
    • 2
  • G. Nicosia
    • 1
  • M. Pavone
    • 2
  • I. Prizzi
    • 3
  1. 1.Department of Mathematics and Computer ScienceUniversity of CataniaCataniaItaly
  2. 2.IBM-KAIST Bio-Computing Research Center, Department of BioSystemsKAISTDaejeonRepublic of Korea
  3. 3.Diogenes Research CenterCataniaItaly

Personalised recommendations