Exploring Predicate-Argument Relations for Named Entity Recognition in the Molecular Biology Domain

  • Tuangthong Wattarujeekrit
  • Nigel Collier
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3735)


In this paper, the semantic relationships between a predicate and its arguments, expressed as semantic roles, are employed to improve lexical-based named entity recognition (NER) in the molecular biology domain. The semantic roles were realized as several sets of syntactic features used by a machine learning model, in order to explore which representation lets this knowledge benefit NER the most. The empirical results show that the best feature set consists of the predicate’s surface form, the predicate’s lemma, voice, and a combined feature of the subject-object head’s lemma and transitive-intransitive sense. The performance improvement from using these features indicates the advantage of predicate-argument semantic knowledge for NER. There is still room to enhance NER with this semantic knowledge (e.g., by employing other semantic roles besides agent and theme, and by extending the rules for efficient identification of an argument’s boundary).
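As a rough illustration of the feature set named in the abstract, the following Python sketch assembles the four features for one predicate. It is not the authors' implementation: the `PredicateContext` container, the field names, the feature keys, and the `|`-joined encoding of the joined (united) subject-object/transitivity feature are all assumptions made for the example, and the abstract does not say how that feature was actually encoded.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class PredicateContext:
    """Hypothetical container for one predicate and its parsed arguments."""
    surface: str                        # predicate as written, e.g. "activates"
    lemma: str                          # predicate lemma, e.g. "activate"
    voice: str                          # "active" or "passive"
    subject_head_lemma: Optional[str]   # lemma of the subject's head word, if found
    object_head_lemma: Optional[str]    # lemma of the object's head word, if found
    transitive: bool                    # True if the predicate is used transitively


def predicate_features(ctx: PredicateContext) -> Dict[str, str]:
    """Return the four features as a flat string-valued dictionary.

    The last feature joins both head lemmas with the transitivity label
    into a single value (an assumed encoding, chosen for illustration).
    """
    subj = ctx.subject_head_lemma or "NONE"
    obj = ctx.object_head_lemma or "NONE"
    sense = "transitive" if ctx.transitive else "intransitive"
    return {
        "pred_surface": ctx.surface,
        "pred_lemma": ctx.lemma,
        "voice": ctx.voice,
        "args_transitivity": f"{subj}|{obj}|{sense}",
    }


if __name__ == "__main__":
    # Made-up example: "The protein activates the gene."
    example = PredicateContext(
        surface="activates",
        lemma="activate",
        voice="active",
        subject_head_lemma="protein",
        object_head_lemma="gene",
        transitive=True,
    )
    for name, value in predicate_features(example).items():
        print(f"{name}={value}")
```

A dictionary of string-valued features like this could be passed to any classifier that accepts categorical features; which learner and which encoding the paper used is not stated in the abstract.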


Surface Object; Semantic Relationship; Semantic Role; Name Entity Recognition; Passive Voice
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Tuangthong Wattarujeekrit (1)
  • Nigel Collier (1)
  1. National Institute of Informatics, Tokyo, Japan
