Using Natural Alignment to Extract Translation Equivalents

  • Pablo Gamallo Otero
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3960)


Most methods for extracting bilingual lexicons from parallel corpora learn word correspondences from relatively small aligned segments, called sentences. They therefore require a corpus aligned at the sentence level. Such an alignment may need further manual correction if the parallel corpus contains insertions, deletions, or fuzzy sentence boundaries. This paper shows that bilingual lexicons can be extracted without aligning parallel texts at the sentence level. We describe a method for learning word translations from a very roughly aligned corpus, namely a corpus with quite long segments separated by “natural boundaries”. The results obtained with this method are very close to those obtained with sentence alignment. Experiments were performed on English-Portuguese and English-Spanish parallel texts.
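
As a rough illustration of how word translations can be learned from aligned segments of any length, the sketch below counts how often a source word and a target word occur in the same pair of segments and scores each candidate with the Dice coefficient. The abstract does not name the association measure or the segmentation procedure, so the choice of Dice, the function name `dice_lexicon`, the `min_score` threshold, and the toy English-Portuguese segments are all assumptions made for illustration, not the paper's method.

```python
from collections import Counter
from itertools import product


def dice_lexicon(segment_pairs, min_score=0.5):
    """Return {source_word: (best_target_word, dice_score)}.

    segment_pairs is a list of (source_tokens, target_tokens) tuples.
    A segment may be a sentence or a much longer stretch of text;
    only co-occurrence within the same aligned pair is counted.
    Hypothetical helper: the scoring measure is an assumption.
    """
    src_freq = Counter()   # number of segments containing each source word
    tgt_freq = Counter()   # number of segments containing each target word
    pair_freq = Counter()  # number of segments containing both words

    for src_tokens, tgt_tokens in segment_pairs:
        src_types = set(src_tokens)
        tgt_types = set(tgt_tokens)
        src_freq.update(src_types)
        tgt_freq.update(tgt_types)
        pair_freq.update(product(src_types, tgt_types))

    lexicon = {}
    for (src, tgt), joint in pair_freq.items():
        # Dice coefficient: 2 * |A and B| / (|A| + |B|)
        score = 2 * joint / (src_freq[src] + tgt_freq[tgt])
        best_so_far = lexicon.get(src, ("", 0.0))[1]
        if score >= min_score and score > best_so_far:
            lexicon[src] = (tgt, score)
    return lexicon


# Toy aligned segments (invented data, for illustration only).
segments = [
    (["the", "house"], ["a", "casa"]),
    (["the", "dog"], ["o", "cão"]),
    (["white", "house"], ["casa", "branca"]),
]
lexicon = dice_lexicon(segments)
```

With these toy segments, "house" and "casa" co-occur in both segments that contain either word, so "casa" is the top-scoring candidate for "house". Longer segments add more spurious co-occurring pairs, which is why a threshold such as `min_score` is used here to filter weak candidates.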


Machine Translation; Word Type; Computational Linguistics; Sentence Level; Parallel Corpus
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Pablo Gamallo Otero
  1. Departamento de Língua Espanhola, Faculdade de Filologia, Universidade de Santiago de Compostela, Galiza, Spain
