Syntactical Annotation of COMPARA: Workflow and First Results

  • Susana Inácio
  • Diana Santos
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3960)


In this paper we present the annotation of COMPARA, currently the largest parallel corpus that includes Portuguese. We describe the motivation, explain how the corpus is being annotated, give a glimpse of the results so far, and mention some studies based on it.


Keywords: Machine Translation; Word Sense Disambiguation; Language Engineering; Parallel Corpus; Syntactical Information
These keywords were added by machine and not by the authors.





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Susana Inácio (1)
  • Diana Santos (1)
  1. Linguateca, SINTEF ICT, Portugal
