Haplotype Association Analysis

  • Michael P. Epstein
  • Lydia C. Kwee


Haplotypes serve many useful roles in the design and implementation of genetic studies of complex traits. In this chapter, we focus on the use of haplotypes as variables of interest for detecting association between a genomic region and a complex trait. Such haplotype analyses are appealing because, in certain instances, they can be more powerful for association mapping than traditional methods based on the analysis of individual SNPs. At the same time, haplotype analyses are more complicated to implement than single-SNP analyses, because sample genetic data often consist of unphased genotypes, which can leave the haplotypes carried by each subject ambiguous. However, statisticians have developed many innovative methods for haplotype analysis that accommodate such haplotype ambiguity using existing missing-data algorithms. In this chapter, we describe a variety of such statistical methods for haplotype mapping, which are applicable to genetic datasets collected under traditional population-based and family-based study designs. We further describe publicly available software packages for implementing these haplotype approaches. Finally, we illustrate many of these statistical methods and related software packages using unphased genotype data from the Finland-United States Investigation of NIDDM Genetics (FUSION) study.
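The abstract notes that haplotype ambiguity arising from unphased genotypes can be handled with missing-data algorithms. As an illustration only, the sketch below applies the expectation-maximization (EM) algorithm to estimate haplotype frequencies from unphased genotypes. It assumes biallelic SNPs coded as 0/1/2 copies of allele 1, Hardy-Weinberg equilibrium, and no missing genotypes. The function names `compatible_pairs` and `em_haplotype_freqs` are hypothetical, and this generic estimator is not presented as any specific method or software package described in the chapter.

```python
from collections import defaultdict
from itertools import product


def compatible_pairs(genotype):
    """List the unordered haplotype pairs consistent with one unphased genotype.

    `genotype` gives the number of copies (0, 1 or 2) of allele 1 at each SNP.
    Homozygous sites are fixed; each heterozygous site can sit on either
    haplotype, so k heterozygous sites yield 2**(k - 1) distinct pairs.
    """
    het = [i for i, g in enumerate(genotype) if g == 1]
    base = [1 if g == 2 else 0 for g in genotype]
    if not het:
        h = tuple(base)
        return [(h, h)]
    pairs = []
    # Put allele 1 of the first heterozygous site on h1 so each unordered
    # pair is listed exactly once.
    for bits in product((0, 1), repeat=len(het) - 1):
        h1, h2 = list(base), list(base)
        for site, allele in zip(het, (1,) + bits):
            h1[site] = allele
            h2[site] = 1 - allele
        pairs.append((tuple(h1), tuple(h2)))
    return pairs


def em_haplotype_freqs(genotypes, max_iter=500, tol=1e-10):
    """Estimate haplotype frequencies from unphased genotypes by EM,
    assuming Hardy-Weinberg equilibrium and no missing genotypes."""
    pairs_per_subject = [compatible_pairs(g) for g in genotypes]
    haps = sorted({h for pairs in pairs_per_subject for pair in pairs for h in pair})
    freq = {h: 1.0 / len(haps) for h in haps}  # uniform starting values
    n = len(genotypes)
    for _ in range(max_iter):
        counts = defaultdict(float)
        for pairs in pairs_per_subject:
            # E-step: the posterior probability of each compatible pair is
            # proportional to its HWE probability (2*p1*p2 if h1 != h2, else p1**2).
            weights = [(2.0 if h1 != h2 else 1.0) * freq[h1] * freq[h2]
                       for h1, h2 in pairs]
            total = sum(weights)
            for (h1, h2), w in zip(pairs, weights):
                counts[h1] += w / total
                counts[h2] += w / total
        # M-step: new frequencies are expected haplotype counts divided by 2n.
        new_freq = {h: counts[h] / (2 * n) for h in haps}
        delta = max(abs(new_freq[h] - freq[h]) for h in haps)
        freq = new_freq
        if delta < tol:
            break
    return freq


if __name__ == "__main__":
    # Toy data: two SNPs; the two double heterozygotes [1, 1] are phase-ambiguous.
    data = [[2, 2]] * 10 + [[0, 0]] * 10 + [[1, 1]] * 2
    for hap, p in sorted(em_haplotype_freqs(data).items()):
        print(hap, round(p, 4))
```

In the toy data, the homozygotes make the 11/00 phase far more plausible than 10/01 for the two double heterozygotes, so EM shifts nearly all of their weight onto haplotypes (1, 1) and (0, 0). Association methods can then work with these posterior pair probabilities rather than a single best-guess phase for each subject.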







We thank the FUSION study investigators for allowing us to present results from the analysis of FUSION data. We also thank Dr. Glen Satten for his comments on a previous version of this chapter. This work was supported by National Institutes of Health grants HG003618 and GM074909.



Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  1. Department of Human Genetics, Emory University School of Medicine, Atlanta, USA
  2. Department of Biostatistics, Emory University, Atlanta, USA
