The Incompatible Desiderata of Gene Cluster Properties

  • Rose Hoberman
  • Dannie Durand
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3678)


There is widespread interest in comparative genomics in determining if historically and/or functionally related genes are spatially clustered in the genome, and whether the same sets of genes reappear in clusters in two or more genomes. We formalize and analyze the desirable properties of gene clusters and cluster definitions. Through detailed analysis of two commonly applied types of cluster, r-windows and max-gap, we investigate the extent to which a single definition can embody all of these properties simultaneously. We show that many of the most important properties are difficult to satisfy within the same definition. We also examine whether one commonly assumed property, which we call nestedness, is satisfied by the structures present in real genomic data.
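As a rough illustration of the two cluster types named above, the sketch below works on a single genome in which some genes are "marked" as being of interest. The abstract does not give the definitions, so the following are assumptions made for this example only: a max-gap cluster is a maximal run of marked genes in which adjacent members have at most `g` unmarked genes between them, and an r-window is any stretch of `r` consecutive gene positions, scored by the number of marked genes it contains. The function names `max_gap_clusters` and `best_r_window` are hypothetical and do not come from the paper.

```python
from typing import List


def max_gap_clusters(positions: List[int], g: int) -> List[List[int]]:
    """Split marked gene positions into maximal runs whose adjacent gaps are <= g.

    Assumption: the gap between two marked genes is the number of
    unmarked positions strictly between them.
    """
    clusters: List[List[int]] = []
    for p in sorted(positions):
        if clusters and p - clusters[-1][-1] - 1 <= g:
            clusters[-1].append(p)   # close enough: extend the current run
        else:
            clusters.append([p])     # gap too large: start a new run
    return clusters


def best_r_window(positions: List[int], r: int) -> int:
    """Largest number of marked genes that fit in any window of r positions."""
    pos = sorted(positions)
    best, left = 0, 0
    for right in range(len(pos)):
        # Shrink from the left until the span covers at most r positions.
        while pos[right] - pos[left] + 1 > r:
            left += 1
        best = max(best, right - left + 1)
    return best


marked = [1, 3, 4, 9, 10, 12, 20]
print(max_gap_clusters(marked, g=1))
print(best_r_window(marked, r=4))
```

Under these assumed definitions, the max-gap criterion constrains only the spacing between neighbouring genes, so a cluster can grow arbitrarily long, whereas the window criterion bounds total length but says nothing about spacing inside the window. That difference is one simple way to see why a single definition may not satisfy every desirable property at once.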


Keywords: Gene Cluster, Greedy Algorithm, Gene Order, Homologous Region, Cluster Property
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Rose Hoberman (1)
  • Dannie Durand (2)
  1. Computer Science Department, Carnegie Mellon University, Pittsburgh, USA
  2. Departments of Biological Sciences and Computer Science, Carnegie Mellon University, Pittsburgh, USA
