Adaptation of Data and Models for Probabilistic Parsing of Portuguese

  • Benjamin Wing
  • Jason Baldridge
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3960)


We present the first results for recovering word-word dependencies from a probabilistic parser for Portuguese that is trained on, and evaluated against, human-annotated syntactic analyses. We use the Floresta Sintá(c)tica treebank with the Bikel multilingual parsing engine and evaluate performance using both PARSEVAL and unlabeled dependency measures. We explore several configurations, varying both how the parser is parameterized and how the trees used to train it are enhanced. Our best configuration achieves 80.6% dependency accuracy on unseen test material, well above adjacency baselines and on par with previously reported results for unlabeled dependencies.
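The unlabeled dependency measure and the adjacency baselines mentioned above can be illustrated with a short Python sketch. It assumes each sentence's dependencies are encoded as a list of head positions (1-based, with 0 for the root), and it assumes the adjacency baselines attach every word to its immediate right or left neighbour. The encoding, the function names, and the baseline definitions are illustrative assumptions, not the authors' evaluation code, and details such as punctuation handling are not modelled.

```python
from typing import Callable, List, Sequence

# Assumed encoding (illustrative, not from the paper): heads[i] is the
# 1-based position of the head of word i + 1, and 0 marks the root.
Heads = List[int]


def unlabeled_accuracy(gold: Sequence[Sequence[int]],
                       pred: Sequence[Sequence[int]]) -> float:
    """Share of words, pooled over all sentences, whose predicted head
    equals the gold head (dependency labels are ignored)."""
    if len(gold) != len(pred):
        raise ValueError("gold and predicted corpora differ in sentence count")
    correct = total = 0
    for g_sent, p_sent in zip(gold, pred):
        if len(g_sent) != len(p_sent):
            raise ValueError("a gold and predicted sentence differ in length")
        correct += sum(1 for g, p in zip(g_sent, p_sent) if g == p)
        total += len(g_sent)
    return correct / total if total else 0.0


def right_adjacency(n: int) -> Heads:
    """Hypothetical baseline: each word attaches to the next word;
    the last word attaches to the root."""
    if n == 0:
        return []
    return [i + 2 for i in range(n - 1)] + [0]


def left_adjacency(n: int) -> Heads:
    """Hypothetical baseline: each word attaches to the previous word;
    the first word attaches to the root."""
    return list(range(n))


def adjacency_baseline(gold: Sequence[Sequence[int]],
                       make_heads: Callable[[int], Heads]) -> List[Heads]:
    """Build baseline predictions with the same sentence lengths as gold."""
    return [make_heads(len(sent)) for sent in gold]


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    gold = [[2, 0, 2], [0, 1]]
    parser_output = [[2, 0, 1], [0, 1]]
    print("parser:", unlabeled_accuracy(gold, parser_output))
    print("right-adjacency:",
          unlabeled_accuracy(gold, adjacency_baseline(gold, right_adjacency)))
    print("left-adjacency:",
          unlabeled_accuracy(gold, adjacency_baseline(gold, left_adjacency)))
```

On this invented toy data the parser output gets 4 of 5 heads right (0.8), the right-adjacency baseline 1 of 5 (0.2), and the left-adjacency baseline 3 of 5 (0.6). Pooling correct heads over all words, rather than averaging per-sentence scores, is a design choice assumed here.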





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Benjamin Wing (1)
  • Jason Baldridge (1)
  1. Department of Linguistics, University of Texas at Austin, Austin, USA
