Individual Gene Cluster Statistics in Noisy Maps

  • Narayanan Raghupathy
  • Dannie Durand
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3678)


Identification of homologous chromosomal regions is important for understanding the processes that shape genome evolution, such as genome rearrangements and large-scale duplication events. If these chromosomal regions have diverged significantly, statistical tests are needed to determine whether observed similarities in gene content are due to history or chance. Currently available methods are typically designed for genomic data and are appropriate for whole-genome analyses. Statistical methods for estimating significance when only a single pair of regions is under consideration are also needed. We present a new statistical method, based on generating functions, for estimating the significance of orthologous gene clusters under the null hypothesis of random gene order. Our statistic is suitable for noisy comparative maps, in which a one-to-one homology mapping cannot be established. It is also designed for testing the significance of an individual gene cluster in isolation, in situations where whole-genome data are not available. We implement our statistic in Mathematica and demonstrate its utility by applying it to the MHC homologous regions in human and fly.
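
As a rough illustration of what testing a single cluster against a null hypothesis of random gene order involves, the Python sketch below computes a hypergeometric tail probability. It is not the generating-function method presented in the paper: it assumes a one-to-one homology mapping (the simplification that noisy maps rule out) and a single fixed window, and the function name and the parameters n, m, r and k are hypothetical.

```python
from math import comb


def cluster_p_value(n: int, m: int, r: int, k: int) -> float:
    """Tail probability for a simplified, one-to-one gene cluster test.

    Assumed setting (not taken from the paper):
      n -- number of genes in the genome
      m -- number of genes of interest (e.g. homologs of a reference region)
      r -- size, in genes, of one fixed window
      k -- number of genes of interest observed inside that window

    Under random gene order the m genes occupy a uniformly random subset of
    the n positions, so the count inside the window is hypergeometric.
    Returns P(count >= k).
    """
    if not (0 <= m <= n and 0 <= r <= n):
        raise ValueError("require 0 <= m <= n and 0 <= r <= n")
    total = comb(n, r)            # all ways to fill the window
    upper = min(m, r)             # the count cannot exceed either bound
    tail = sum(
        comb(m, i) * comb(n - m, r - i)   # i genes of interest, r - i others
        for i in range(max(k, 0), upper + 1)
    )
    return tail / total
```

A small p-value from such a test would suggest that the shared gene content is unlikely to arise by chance alone. Many-to-one homology and the absence of whole-genome data, which the abstract identifies as the motivating cases, are outside the scope of this sketch.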


Gene Family; Gene Cluster; Gene Order; Genome Duplication; Window Packing
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Narayanan Raghupathy (1)
  • Dannie Durand (2)
  1. Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, USA
  2. Departments of Biological Sciences and Computer Science, Carnegie Mellon University, Pittsburgh, USA
