Model Parameter Estimation

  • William Turin
Part of the Information Technology: Transmission, Processing and Storage book series (PSTE)


In this chapter, we develop methods for approximating a stochastic process with an HMM and, in particular, for fitting HMMs to experimental data. We present the iterative expectation-maximization (EM) algorithm for process approximation, which generalizes the statistical EM algorithm and, in particular, the Baum-Welch algorithm (BWA) for approximating the process with an HMM. The generalized algorithm can also be applied to curve fitting and to finding the maximum of a nonnegative function of several variables. Because the EM algorithm is iterative and converges slowly, it is important to select a good initial model. We describe several methods for choosing the initial HMM, and we also address speeding up the algorithm's convergence.
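As a concrete illustration of the re-estimation procedure the chapter develops, the following is a minimal sketch of one Baum-Welch (EM) iteration for a discrete-output HMM. It is not the chapter's own derivation, only the standard scaled forward-backward recursions with the usual re-estimation formulas; the function name and array layout are the author's choices for this sketch.

```python
import numpy as np

def baum_welch_step(A, B, pi, obs):
    """One Baum-Welch (EM) re-estimation step for a discrete-output HMM.

    A  : (N, N) state-transition matrix
    B  : (N, M) emission matrix, B[i, k] = P(symbol k | state i)
    pi : (N,) initial state distribution
    obs: length-T sequence of observed symbol indices
    Returns the updated (A, B, pi) and the log-likelihood of obs
    under the *input* model.
    """
    N, T = A.shape[0], len(obs)
    obs = np.asarray(obs)

    # Forward pass with per-step scaling to avoid numerical underflow.
    alpha = np.zeros((T, N))
    scale = np.zeros(T)
    alpha[0] = pi * B[:, obs[0]]
    scale[0] = alpha[0].sum()
    alpha[0] /= scale[0]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
        scale[t] = alpha[t].sum()
        alpha[t] /= scale[t]

    # Backward pass, reusing the same scaling factors.
    beta = np.zeros((T, N))
    beta[-1] = 1.0 / scale[-1]
    for t in range(T - 2, -1, -1):
        beta[t] = (A @ (B[:, obs[t + 1]] * beta[t + 1])) / scale[t]

    # E-step: state posteriors and expected transition counts.
    gamma = alpha * beta * scale[:, None]          # P(state at t | obs)
    xi = np.zeros((N, N))                          # summed over t
    for t in range(T - 1):
        xi += alpha[t][:, None] * A * (B[:, obs[t + 1]] * beta[t + 1])[None, :]

    # M-step: re-estimate the model parameters.
    A_new = xi / gamma[:-1].sum(axis=0)[:, None]
    B_new = np.zeros_like(B)
    for k in range(B.shape[1]):
        B_new[:, k] = gamma[obs == k].sum(axis=0)
    B_new /= gamma.sum(axis=0)[:, None]
    pi_new = gamma[0]
    return A_new, B_new, pi_new, np.log(scale).sum()
```

Iterating this step produces a non-decreasing sequence of log-likelihoods, which is the EM convergence property discussed in the chapter; because that convergence is slow, the quality of the initial (A, B, pi) matters.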


Keywords: Markov Chain Model · Parameter Estimation · Observation Sequence · Interval Distribution · Forward Algorithm





Copyright information

© Springer Science+Business Media New York 2004

Authors and Affiliations

  • William Turin
  1. AT&T Labs—Research, Florham Park, New Jersey, USA
