Automatic Extraction of Keywords for the Portuguese Language

  • Maria Abadia Lacerda Dias
  • Marcelo de Gomensoro Malheiros
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3960)


This paper outlines the adaptation of an algorithm for automatic keyword extraction to the Portuguese language. Keywords make it possible to summarize the contents of documents in a compact form, and may also be used as an efficient measure of similarity between texts. This work focuses on extracting keywords from theses in several fields of knowledge. To identify the keywords, the KEA algorithm was used together with a stemming technique specific to Portuguese and a manually created list of stopwords. The results are shown to be good enough for practical use and comparable to what has been achieved for the English language.
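As a rough illustration of the pipeline the abstract describes, the sketch below generates stemmed candidate phrases that neither begin nor end with a stopword, then computes the two features KEA is generally described as using: a TF×IDF score and the relative position of the first occurrence. Everything in it is an assumption made for illustration: the stopword list is a tiny sample rather than the authors' manually created list, `toy_stem` is a crude plural stripper standing in for a real Portuguese stemmer, the IDF is a smoothed variant, and the names `candidates` and `features` are hypothetical. KEA's trained classifier, which combines the two features, is omitted.

```python
import math
import re

# Tiny illustrative stopword list (assumption: the paper's manual list is larger).
STOPWORDS = {"a", "o", "as", "os", "de", "da", "do", "das", "dos",
             "e", "em", "para", "por", "um", "uma", "que", "com"}


def toy_stem(word):
    """Crude plural stripping; a stand-in for a real Portuguese stemmer."""
    if len(word) > 4 and word.endswith("s"):
        return word[:-1]
    return word


def tokenize(text):
    """Lowercase the text and keep runs of letters (accented letters included)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def candidates(text, max_len=3):
    """Return (counts, first_pos, n_tokens) for stemmed candidate phrases.

    A candidate is a run of 1..max_len tokens that neither starts nor ends
    with a stopword; stopwords are still allowed in the middle.
    """
    tokens = tokenize(text)
    counts, first_pos = {}, {}
    for i in range(len(tokens)):
        for n in range(1, max_len + 1):
            gram = tokens[i:i + n]
            if len(gram) < n:  # ran past the end of the text
                break
            if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                continue
            key = " ".join(toy_stem(w) for w in gram)
            counts[key] = counts.get(key, 0) + 1
            first_pos.setdefault(key, i)
    return counts, first_pos, len(tokens)


def features(doc, corpus):
    """Map each candidate of `doc` to (tf_idf, relative_first_occurrence)."""
    counts, first_pos, n_tokens = candidates(doc)
    corpus_counts = [candidates(d)[0] for d in corpus]
    feats = {}
    for phrase, c in counts.items():
        df = sum(1 for cc in corpus_counts if phrase in cc)
        tf = c / n_tokens
        idf = math.log((1 + len(corpus)) / (1 + df))  # smoothed IDF (assumption)
        feats[phrase] = (tf * idf, first_pos[phrase] / n_tokens)
    return feats


if __name__ == "__main__":
    doc = "Extração automática de palavras-chave para teses"
    corpus = [doc, "Teses de engenharia"]
    ranked = sorted(features(doc, corpus).items(), key=lambda kv: -kv[1][0])
    for phrase, (score, pos) in ranked:
        print(f"{phrase}: tfidf={score:.3f} first={pos:.2f}")
```

Stemming before counting lets inflected variants such as "tese" and "teses" accumulate into a single candidate, which is why a language-specific stemmer matters when moving the approach from English to Portuguese.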





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Maria Abadia Lacerda Dias (1)
  • Marcelo de Gomensoro Malheiros (2)
  1. UNICAMP – State University of Campinas, Campinas, Brazil
  2. UNIVATES University Center, Lajeado, Brazil
