Word Length and Frequency Distributions in Different Text Genres

  • Gordana Antić
  • Ernst Stadlober
  • Peter Grzybek
  • Emmerich Kelih
Conference paper
Part of the Studies in Classification, Data Analysis, and Knowledge Organization book series (STUDIES CLASS)

Abstract

In this paper we study the word length frequency distributions of a systematic selection of 80 Slovenian texts (private letters, journalistic texts, poems and cooking recipes). The adequacy of four two-parameter Poisson models is assessed according to their goodness-of-fit properties, and the corresponding ranges of the model parameters are checked for their suitability to discriminate between the given text types. We find that the Singh-Poisson distribution seems to be the best choice for both problems: first, it is an appropriate model for three of the text types (private letters, journalistic texts and poems); and second, the parameter space of the model can be split into regions representing all four text types.
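The abstract does not state the model formula or the fit criterion, so the following is only a minimal sketch of what fitting a Singh-Poisson model to word length counts might look like. It assumes the commonly cited 1-displaced form of the distribution, with P(1) = 1 - alpha + alpha*exp(-a) and P(x) = alpha * a^(x-1) * exp(-a) / (x-1)! for x >= 2, and it assumes a Pearson chi-square statistic together with the ratio chi-square/N as the goodness-of-fit measure. The function names, parameter values and counts are hypothetical and are not taken from the paper.

```python
import math


def singh_poisson_pmf(x, a, alpha):
    """Probability of word length x (in syllables, x >= 1).

    Assumed form: a 1-displaced Singh-Poisson distribution, i.e. a Poisson
    distribution with parameter a whose first class is re-weighted by alpha.
    """
    if x < 1:
        return 0.0
    k = x - 1  # displacement: length 1 corresponds to Poisson class 0
    poisson = math.exp(-a) * a ** k / math.factorial(k)
    if k == 0:
        return 1.0 - alpha + alpha * poisson
    return alpha * poisson


def chi_square_fit(observed, a, alpha):
    """Return (chi-square, chi-square / N) for observed counts {length: count}.

    The last observed class absorbs the remaining tail probability so that
    the expected frequencies sum to N.
    """
    n = sum(observed.values())
    max_len = max(observed)
    stat = 0.0
    cumulative = 0.0
    for x in range(1, max_len + 1):
        if x < max_len:
            p = singh_poisson_pmf(x, a, alpha)
            cumulative += p
        else:
            p = 1.0 - cumulative  # lump the tail into the last class
        expected = n * p
        stat += (observed.get(x, 0) - expected) ** 2 / expected
    return stat, stat / n


# Hypothetical toy counts (NOT data from the paper): word length -> frequency.
toy_counts = {1: 310, 2: 340, 3: 220, 4: 95, 5: 30, 6: 5}
stat, c = chi_square_fit(toy_counts, a=1.2, alpha=0.8)
print(f"chi-square = {stat:.2f}, chi-square / N = {c:.4f}")
```

In a sketch like this, the fitted pair (a, alpha) for each text would be the point plotted in the parameter space when checking whether text types occupy separate regions.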

Copyright information

© Springer Berlin · Heidelberg 2006

Authors and Affiliations

  • Gordana Antić (1)
  • Ernst Stadlober (1)
  • Peter Grzybek (2)
  • Emmerich Kelih (2)

  1. Department of Statistics, Graz University of Technology, Graz, Austria
  2. Department for Slavic Studies, Graz University, Graz, Austria
