A Set of NP-Extraction Rules for Portuguese: Defining, Learning and Pruning

  • Claudia Oliveira
  • Maria Claudia Freitas
  • Violeta Quental
  • Cícero Nogueira dos Santos
  • Renato Paes Leme
  • Lucas Souza
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3960)


This paper presents a set of rules for extracting noun phrases from Portuguese texts. We describe how this set was obtained incrementally, starting from a machine-learned set of transformation rules that was then manually reviewed. The noun phrases extracted by these transformations were given as input to a second learner, which synthesized rules for breaking up complex noun phrases into simpler ones. We evaluate the results of applying these processes to a Brazilian Portuguese corpus.
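A minimal sketch of the two stages described above, under loudly stated assumptions: a Brill-style ordered list of transformation rules that rewrites IOB chunk tags (`B`, `I`, `O`) over part-of-speech tags, followed by a rule that breaks a complex noun phrase into simpler ones. The rule template, the tag set (`ART`, `N`, `PREP+ART`, `ADJ`, `V`), the five rules, the split criterion, and the helper names `Rule`, `apply_rules`, `extract_chunks` and `split_complex` are all hypothetical illustrations, not the rules learned or reviewed in this work.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Tuple

Token = Tuple[str, str]  # (word, part-of-speech tag)


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule template: rewrite from_tag as to_tag when the token's
    POS equals `pos` and, if given, the previous chunk tag is in prev_tags."""
    from_tag: str
    to_tag: str
    pos: str
    prev_tags: Optional[FrozenSet[str]] = None  # None: no constraint on the left neighbour


def apply_rules(pos_tags: List[str], rules: List[Rule]) -> List[str]:
    """Start from an all-'O' baseline and apply the rules in their learned order.

    Each rule scans left to right, and a change is visible at once to the
    next position (one of several possible application conventions).
    """
    tags = ["O"] * len(pos_tags)
    for rule in rules:
        for i, pos in enumerate(pos_tags):
            prev = tags[i - 1] if i > 0 else None
            if tags[i] != rule.from_tag or pos != rule.pos:
                continue
            if rule.prev_tags is not None and prev not in rule.prev_tags:
                continue
            tags[i] = rule.to_tag
    return tags


def extract_chunks(tokens: List[str], pos_tags: List[str],
                   chunk_tags: List[str]) -> List[List[Token]]:
    """Group tokens into noun phrases: B opens a chunk, I extends it, O closes it."""
    chunks: List[List[Token]] = []
    current: List[Token] = []
    for word, pos, tag in zip(tokens, pos_tags, chunk_tags):
        if tag == "B" or (tag == "I" and not current):
            if current:
                chunks.append(current)
            current = [(word, pos)]
        elif tag == "I":
            current.append((word, pos))
        else:
            if current:
                chunks.append(current)
            current = []
    if current:
        chunks.append(current)
    return chunks


def split_complex(chunk: List[Token],
                  split_pos: FrozenSet[str] = frozenset({"PREP", "PREP+ART"})) -> List[List[Token]]:
    """Hypothetical split rule: break a chunk at prepositions and drop them."""
    parts: List[List[Token]] = []
    current: List[Token] = []
    for word, pos in chunk:
        if pos in split_pos:
            if current:
                parts.append(current)
            current = []
        else:
            current.append((word, pos))
    if current:
        parts.append(current)
    return parts


# 'da' is the contraction of the preposition 'de' and the article 'a'.
TOKENS = ["A", "diretora", "da", "empresa", "anunciou", "novos", "investimentos"]
POS_TAGS = ["ART", "N", "PREP+ART", "N", "V", "ADJ", "N"]
RULES = [  # hypothetical rules, in the order they would be applied
    Rule("O", "B", "ART"),
    Rule("O", "B", "ADJ", frozenset({"O"})),
    Rule("O", "I", "N", frozenset({"B", "I"})),
    Rule("O", "I", "PREP+ART", frozenset({"I"})),
    Rule("O", "I", "N", frozenset({"I"})),
]

if __name__ == "__main__":
    tags = apply_rules(POS_TAGS, RULES)
    for chunk in extract_chunks(TOKENS, POS_TAGS, tags):
        simpler = [" ".join(w for w, _ in part) for part in split_complex(chunk)]
        print(" ".join(w for w, _ in chunk), "->", simpler)
```

Run as a script, it prints `A diretora da empresa -> ['A diretora', 'empresa']` and `novos investimentos -> ['novos investimentos']`. Rule order matters in this style of learner: the last rule only tags `empresa` as `I` because the fourth rule has already attached `da` to the phrase.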







Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Claudia Oliveira (1)
  • Maria Claudia Freitas (2)
  • Violeta Quental (2)
  • Cícero Nogueira dos Santos (3)
  • Renato Paes Leme (1)
  • Lucas Souza (2)

  1. Departamento de Engenharia de Sistemas, Instituto Militar de Engenharia, Rio de Janeiro, Brazil
  2. Departamento de Letras, Pontifícia Universidade Católica, Rio de Janeiro, Brazil
  3. Departamento de Informática, Pontifícia Universidade Católica, Rio de Janeiro, Brazil
