
Named Entity Recognition for the Indonesian Language: Combining Contextual, Morphological and Part-of-Speech Features into a Knowledge Engineering Approach

  • Indra Budi
  • Stéphane Bressan
  • Gatot Wahyudi
  • Zainal A. Hasibuan
  • Bobby A. A. Nazief
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3735)

Abstract

We present a novel named entity recognition approach for the Indonesian language, which we call InNER (Indonesian Named Entity Recognition). InNER is based on a set of rules that capture the contextual, morphological, and part-of-speech knowledge needed to recognize named entities in Indonesian texts. The InNER strategy is one of knowledge engineering: the domain- and language-specific rules are designed by expert knowledge engineers. Having shown in our previous work that mined association rules can effectively recognize named entities and outperform maximum entropy methods, we needed to evaluate how much the rule-based approach can improve when expert-crafted knowledge is used. The results are conclusive: the InNER method yields recall and precision of up to 63.43% and 71.84%, respectively. Thus, it significantly outperforms not only maximum entropy methods but also the association-rule-based method we had previously designed.
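
As an illustration of what rules combining contextual, morphological, and part-of-speech evidence might look like, here is a minimal sketch in Python. It is not the InNER rule set, which the abstract does not list: the trigger word lists, the tag names (`NNP`, `PREP`, and so on), and the functions `recognize` and `precision_recall` are all assumptions made for this example. The sketch also shows how precision and recall, the two measures reported above, can be computed against a hand-labelled answer key.

```python
# Hypothetical trigger words (contextual features); not taken from the paper.
PERSON_TITLES = {"Bapak", "Ibu", "Presiden"}          # a title precedes a person name
ORG_PREFIXES = {"PT", "CV", "Universitas"}            # the prefix is part of the name
LOCATION_PREPOSITIONS = {"di", "ke", "dari"}          # "in/at", "to", "from"


def recognize(tagged):
    """Apply hand-written rules to a list of (token, pos_tag) pairs.

    A trigger word opens a candidate entity; the entity extends over the
    following tokens that are capitalized (morphological feature) and
    tagged as proper nouns (part-of-speech feature, assumed tag "NNP").
    """
    entities = []
    i = 0
    while i < len(tagged):
        token, pos = tagged[i]
        if token in PERSON_TITLES:
            label, keep_trigger = "PERSON", False
        elif token in ORG_PREFIXES:
            label, keep_trigger = "ORGANIZATION", True
        elif pos == "PREP" and token.lower() in LOCATION_PREPOSITIONS:
            label, keep_trigger = "LOCATION", False
        else:
            i += 1
            continue
        # Extend over capitalized proper nouns that follow the trigger.
        j = i + 1
        while j < len(tagged) and tagged[j][1] == "NNP" and tagged[j][0][:1].isupper():
            j += 1
        if j > i + 1:
            span = tagged[i:j] if keep_trigger else tagged[i + 1:j]
            entities.append((" ".join(t for t, _ in span), label))
            i = j
        else:
            i += 1
    return entities


def precision_recall(predicted, gold):
    """Precision = correct / predicted, recall = correct / gold (exact match)."""
    predicted, gold = set(predicted), set(gold)
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Made-up example sentence with made-up part-of-speech tags.
sentence = [
    ("Presiden", "NN"), ("Megawati", "NNP"), ("berkunjung", "VB"),
    ("ke", "PREP"), ("Surabaya", "NNP"), ("bersama", "PREP"),
    ("direktur", "NN"), ("PT", "NNP"), ("Telkom", "NNP"), ("Indonesia", "NNP"),
]
found = recognize(sentence)
print(found)
```

In this sketch each rule fires only when all three kinds of evidence agree, which tends to favour precision over recall; loosening any one condition would trade in the other direction.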

Keywords

Association Rule, Contextual Feature, Mine Association Rule, Maximum Entropy Method, Entity Recognition
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Indra Budi (1)
  • Stéphane Bressan (2)
  • Gatot Wahyudi (1)
  • Zainal A. Hasibuan (1)
  • Bobby A. A. Nazief (1)

  1. Faculty of Computer Science, University of Indonesia
  2. School of Computing, National University of Singapore
